Benchmarks v2 #60

Open: podocarp wants to merge 5 commits into main from jxd/benchmarks

Conversation

@podocarp (Contributor) commented Feb 2, 2026

Summary:

  • All of our integration/e2e tests can now be run as benchmarks.
  • There is a new script, ./run-all-benchmarks.sh, that behaves like ./run-all-e2e-tests.sh.
  • You can either call the script with the -f arg to specify which instrumentation to run, or go into the e2e folder under the instrumentation and run BENCHMARKS=1 ./run.sh.
  • This PR hijacks the make_request function to check for the env var; if it's set, make_request forks out to benchmarking code (among other things). A sketch of this fork follows the list.
  • Tests are unbearably long. The default config is 10s duration and 3s warmup per endpoint. I've found the warmup to be essential for starting up system caches and the like; without it we get nonsensical results, like the SDK being faster enabled than disabled.
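
A minimal sketch of that fork, assuming a callable that performs one request. BENCHMARKS and BENCHMARK_DURATION are the env vars this PR reads; the warmup knob and helper shape here are hypothetical:

```python
import os
import time

BENCHMARKS = os.environ.get("BENCHMARKS") == "1"
DURATION_S = float(os.environ.get("BENCHMARK_DURATION", "10"))  # 10s default
WARMUP_S = 3.0  # 3s default warmup; the real knob's name may differ

def make_request(send):
    """send() performs one request. In benchmark mode, time it in a loop."""
    if not BENCHMARKS:
        return send()
    # Warmup: prime connection pools, caches, JITs, etc. Without this,
    # early measurements are noise (see the note on warmup above).
    deadline = time.monotonic() + WARMUP_S
    while time.monotonic() < deadline:
        send()
    # Timed loop: count completed requests over the configured duration.
    iterations = 0
    start = time.monotonic()
    while time.monotonic() - start < DURATION_S:
        send()
        iterations += 1
    elapsed = time.monotonic() - start
    print(f"{iterations / elapsed:.2f}/s over {iterations} iterations")
```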

The tradeoff vs. structured benchmarks is that there's no standardized test suite, so the results can't be interpreted easily without context on the instrumentation and how each test is written. But it should make this very extensible in the future -- just add more tests and the benchmarks will automatically follow.

Note: (~) indicates low iteration counts (unreliable). A negative diff means slower with the SDK enabled.


aiohttp

| Endpoint | Baseline | With SDK | Diff |
| --- | --- | --- | --- |
| GET /health | 2548.47/s | 753.89/s | -70.4% |
| GET /api/get-json | 13.05/s | 12.16/s | -6.8% |
| GET /api/get-with-params | 13.02/s | 12.28/s | -5.7% |
| GET /api/get-with-headers | 13.73/s | 11.34/s | -17.4% |
| GET /api/chain | 7.50/s | 6.92/s | -7.7% |
| GET /api/parallel | 9.20/s | 8.71/s | -5.3% |
| GET /test/streaming | 12.47/s | 11.57/s | -7.2% |
| GET /test/timeout | 13.40/s | 12.08/s | -9.9% |
| PUT /api/put-json | 2.95/s | 2.99/s | +1.4% |

django

| Endpoint | Baseline | With SDK | Diff |
| --- | --- | --- | --- |
| GET /health | 3302.22/s | 812.16/s | -75.4% |
| GET /api/csrf-form | 3134.84/s | 676.84/s | -78.4% |
| GET /api/post/1 | 10.44/s | 9.97/s | -4.5% |
| GET /api/user/test123 | 2.54/s | 2.51/s | -1.2% (~) |
| GET /api/weather | 1.02/s | 0.96/s | -5.9% (~) |
| POST /api/post | 2.63/s | 2.56/s | -2.7% (~) |
| DELETE /api/post/1/delete | 2.93/s | 2.71/s | -7.5% (~) |

fastapi

| Endpoint | Baseline | With SDK | Diff |
| --- | --- | --- | --- |
| GET /health | 3654.35/s | 807.29/s | -77.9% |
| GET /api/post/1 | 7.74/s | 10.61/s | +37.1% |
| GET /api/activity | 2.47/s | 2.35/s | -4.9% (~) |
| GET /api/user/test123 | 2.56/s | 2.03/s | -20.7% (~) |
| GET /api/weather | 0.99/s | 0.90/s | -9.1% (~) |
| POST /api/post | 2.89/s | 2.59/s | -10.4% (~) |
| DELETE /api/post/1 | 2.63/s | 2.87/s | +9.1% (~) |

flask

| Endpoint | Baseline | With SDK | Diff |
| --- | --- | --- | --- |
| GET /health | 2551.63/s | 693.08/s | -72.8% |
| GET /api/post/1 | 10.79/s | 10.00/s | -7.3% |
| GET /api/user/test123 | 2.46/s | 2.42/s | -1.6% (~) |
| GET /api/weather-activity | 0.57/s | 0.59/s | +3.5% (~) |
| POST /api/post | 2.37/s | 2.42/s | +2.1% (~) |
| POST /api/user | 2.25/s | 2.37/s | +5.3% (~) |
| DELETE /api/post/1 | 2.92/s | 2.61/s | -10.6% (~) |

grpc

| Endpoint | Baseline | With SDK | Diff |
| --- | --- | --- | --- |
| GET /health | 2553.36/s | 733.84/s | -71.3% |
| GET /api/greet?name=TestUser | 1871.13/s | 444.69/s | -76.2% |
| GET /api/greet?name=AnotherUser | 1875.43/s | 444.96/s | -76.3% |
| POST /api/greet-with-info | 1804.09/s | 402.96/s | -77.7% |
| GET /api/greet-chain | 1410.86/s | 206.30/s | -85.4% |
| GET /api/greet-with-call | 1889.44/s | 442.44/s | -76.6% |
| GET /test/future-call | 1632.54/s | 416.79/s | -74.5% |
| GET /test/stream-unary | 1287.08/s | 283.99/s | -77.9% |
| GET /test/stream-stream | 5.97/s | 5.55/s | -7.0% |

httpx

| Endpoint | Baseline | With SDK | Diff |
| --- | --- | --- | --- |
| GET /health | 2590.27/s | 746.12/s | -71.2% |
| GET /api/sync/get-json | 10.73/s | 7.95/s | -25.9% |
| GET /api/sync/get-with-params | 9.45/s | 8.37/s | -11.4% |
| GET /api/sync/get-with-headers | 8.89/s | 8.05/s | -9.4% |
| GET /api/sync/chain | 4.38/s | 5.36/s | +22.4% |
| GET /api/async/get-json | 8.68/s | 7.89/s | -9.1% |
| GET /api/async/get-with-params | 8.45/s | 7.82/s | -7.5% |
| GET /api/async/chain | 6.62/s | 4.52/s | -31.7% |
| GET /api/async/parallel | 7.19/s | 7.78/s | +8.2% |
| GET /test/async-send | 8.97/s | 9.56/s | +6.6% |
| GET /test/async-stream | 8.69/s | 9.96/s | +14.6% |
| GET /test/streaming | 9.14/s | 8.31/s | -9.1% |
| GET /test/toplevel-stream | 9.20/s | 10.74/s | +16.7% |

psycopg (psycopg3)

| Endpoint | Baseline | With SDK | Diff |
| --- | --- | --- | --- |
| GET /health | 2576.48/s | 739.02/s | -71.3% |
| GET /db/query | 733.02/s | 230.57/s | -68.5% |
| GET /test/cursor-stream | 690.21/s | 282.39/s | -59.1% |
| POST /db/transaction | 657.52/s | 270.67/s | -58.8% |

psycopg2

| Endpoint | Baseline | With SDK | Diff |
| --- | --- | --- | --- |
| GET /health | 2551.87/s | 725.39/s | -71.6% |
| GET /db/register-jsonb | 914.30/s | 532.20/s | -41.8% |
| GET /db/query | 678.14/s | 242.88/s | -64.2% |
| POST /db/insert | 650.75/s | 467.80/s | -28.1% |

redis

| Endpoint | Baseline | With SDK | Diff |
| --- | --- | --- | --- |
| GET /health | 2569.44/s | 748.55/s | -70.9% |
| POST /redis/set | 2220.36/s | 537.95/s | -75.8% |
| GET /redis/get/test_key | 2350.11/s | 579.62/s | -75.3% |
| GET /redis/get/test_key_expiry | 2351.98/s | 584.51/s | -75.1% |
| POST /redis/incr/counter | 2354.10/s | 586.23/s | -75.1% |
| GET /redis/keys/* | 2364.17/s | 554.42/s | -76.5% |
| DELETE /redis/delete/test_key | 2346.24/s | 584.40/s | -75.1% |
| GET /test/mget-mset | 2137.24/s | 327.05/s | -84.7% |
| GET /test/pipeline-basic | 2147.54/s | 388.45/s | -81.9% |
| GET /test/transaction-watch | 2004.84/s | 262.95/s | -86.9% |
| GET /test/async-pipeline | 1040.48/s | 252.89/s | -75.7% |
| GET /test/binary-data | 1023.66/s | 323.36/s | -68.4% |

requests

| Endpoint | Baseline | With SDK | Diff |
| --- | --- | --- | --- |
| GET /health | 2525.20/s | 739.19/s | -70.7% |
| GET /api/get-json | 9.45/s | 4.34/s | -54.1% |
| GET /api/get-with-params | 9.96/s | 4.35/s | -56.3% |
| GET /api/chain | 2.91/s | 1.31/s | -55.0% (~) |
| GET /api/parallel | 7.63/s | 3.93/s | -48.5% |
| GET /api/with-timeout | 8.57/s | 5.00/s | -41.7% |
| GET /test/session-send-direct | 7.68/s | 5.74/s | -25.3% |

urllib

| Endpoint | Baseline | With SDK | Diff |
| --- | --- | --- | --- |
| GET /health | 2559.23/s | 742.94/s | -71.0% |
| GET /api/get-json | 8.55/s | 6.14/s | -28.2% |
| GET /api/get-with-params | 7.39/s | 4.59/s | -37.9% |
| GET /api/get-with-request-object | 6.57/s | 6.56/s | -0.2% |
| GET /api/custom-opener | 7.22/s | 4.99/s | -30.9% |
| GET /api/with-timeout | 8.30/s | 5.97/s | -28.1% |
| GET /api/parallel | 6.24/s | 5.84/s | -6.4% |
| GET /test/getheader | 7.57/s | 3.96/s | -47.7% |
| GET /test/getcode | 7.04/s | 3.79/s | -46.2% |
| GET /test/head-request | 7.92/s | 4.77/s | -39.8% |
| GET /test/no-context-manager | 6.53/s | 4.19/s | -35.8% |
| GET /test/ssl-context | 5.90/s | 4.17/s | -29.3% |

urllib3

| Endpoint | Baseline | With SDK | Diff |
| --- | --- | --- | --- |
| GET /health | 2551.05/s | 759.58/s | -70.2% |
| GET /api/poolmanager/get-json | 17.03/s | 16.75/s | -1.6% |
| GET /api/poolmanager/get-with-params | 18.87/s | 15.93/s | -15.6% |
| GET /api/poolmanager/get-with-headers | 22.45/s | 17.08/s | -23.9% |
| GET /api/poolmanager/chain | 7.19/s | 5.88/s | -18.2% |
| GET /api/connectionpool/get-json | 4.79/s | 4.66/s | -2.7% |
| GET /test/timeout | 18.07/s | 17.38/s | -3.8% |
| GET /test/retries | 18.55/s | 18.11/s | -2.4% |
| GET /test/new-poolmanager | 5.69/s | 4.75/s | -16.5% |
| GET /test/multiple-requests | 8.93/s | 6.95/s | -22.2% |
| GET /test/requests-lib | 5.92/s | 5.52/s | -6.8% |
| POST /api/poolmanager/post-form | 2.90/s | 3.13/s | +7.9% |
| DELETE /api/poolmanager/delete | 2.84/s | 3.19/s | +12.3% (~) |

@cubic-dev-ai (bot) left a comment

2 issues found across 8 files



<file name="benchmarks2/benchmark.py">

<violation number="1" location="benchmarks2/benchmark.py:41">
P2: Baseline parsing reads the "ops/s" label instead of the numeric ops/s value, so `parse_results` will fail on valid benchmark output. Use `parts[4]` for the numeric ops/s token.</violation>

<violation number="2" location="benchmarks2/benchmark.py:113">
P3: The error return code from `main()` is ignored, so the script exits successfully even when the server is unreachable. Propagate the exit status from `main()`.</violation>
</file>
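
For reference, a hedged sketch of what the two fixes could look like; the token layout and function shapes are inferred from the comments above, not from the actual file:

```python
import sys

def parse_results(line: str) -> float:
    # Violation 1: the numeric ops/s value is the token at index 4;
    # reading the adjacent "ops/s" label instead makes float() blow up.
    # (Token positions taken from the review comment, not verified.)
    parts = line.split()
    return float(parts[4])

def main() -> int:
    ...  # run the benchmarks; return non-zero if the server is unreachable
    return 0

if __name__ == "__main__":
    # Violation 2: propagate main()'s status instead of discarding it,
    # so a failed run exits non-zero.
    sys.exit(main())
```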


@podocarp added the Tusk - Generate Tests label on Feb 3, 2026
@tusk-dev (bot) commented Feb 3, 2026

@podocarp does not have an active Tusk seat. Activate it before triggering test generation.

@tusk-dev (bot) removed the Tusk - Generate Tests label on Feb 3, 2026
@cubic-dev-ai (bot) left a comment

1 issue found across 40 files (changes from recent commits).



<file name="drift/instrumentation/e2e_common/test_utils.py">

<violation number="1" location="drift/instrumentation/e2e_common/test_utils.py:89">
P2: Guard against zero iterations before computing per-op stats; BENCHMARK_DURATION=0 (or a too-short duration) leaves iterations at 0 and causes a ZeroDivisionError here.</violation>
</file>
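
A minimal sketch of the suggested guard, with hypothetical names:

```python
def per_op_stats(elapsed: float, iterations: int) -> dict:
    # Guard against a timed loop that never completed an iteration,
    # e.g. BENCHMARK_DURATION=0 or a duration shorter than one request.
    if iterations == 0:
        return {"error": "no iterations completed within the benchmark duration"}
    return {
        "ops_per_sec": iterations / elapsed,
        "mean_latency_s": elapsed / iterations,
    }
```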


@sohil-kshirsagar (Contributor) left a comment

nice, ty for this. can we update docs in this repo to reflect how to use this? maybe a new doc on benchmarks specifically is best, and if there are any docs on how to write e2e tests, make sure to indicate how they must be written to support benchmarks.

looks like some lint/type failures as well

@podocarp (Contributor, Author) commented Feb 6, 2026

Added BENCHMARKS.md. Writing benchmarks is really easy: since we modify the shared helper functions to do the benchmarking, any endpoints/test suites you add in the same format as the rest are auto-discovered as new benchmark cases (illustrated below).
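
A purely hypothetical illustration of that auto-discovery; the real suites use whatever case format the existing e2e tests for each instrumentation already define:

```python
# If a suite lists its cases in a shared structure that the helpers
# iterate over, a new entry in the same shape is exercised by both the
# e2e run and the benchmark run with no extra wiring.
ENDPOINTS = [
    ("GET", "/api/get-json"),
    ("POST", "/api/post"),
    ("GET", "/api/new-endpoint"),  # adding a case here is all it takes
]
```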

@cubic-dev-ai (bot) left a comment

1 issue found across 1 file (changes from recent commits).



<file name="BENCHMARKS.md">

<violation number="1" location="BENCHMARKS.md:12">
P2: The benchmarking guide points to `./run-all-e2e-tests.sh`, which only runs E2E/stack tests and doesn't enable benchmark mode. This will mislead users who follow the benchmarking instructions. Use the benchmark runner script instead.</violation>
</file>



Diff context (BENCHMARKS.md, line 12):

```
The simplest way to get started it simply
./run-all-e2e-tests.sh
```

@cubic-dev-ai (bot) commented Feb 6, 2026

P2: The benchmarking guide points to ./run-all-e2e-tests.sh, which only runs E2E/stack tests and doesn't enable benchmark mode. This will mislead users who follow the benchmarking instructions. Use the benchmark runner script instead.


@jy-tan (Contributor) commented Feb 6, 2026

@podocarp what do you think about running and publishing benchmarks only from stack tests? it doesn't hurt for them to be in individual instrumentations should we need to optimize something, but stack tests would be more meaningful and slightly more representative of actual apps than testing say just redis in isolation.

stack tests then fulfill 3 purposes:

  1. ensure correctness in instrumentation interactions
  2. realistic benchmarks
  3. (future) serve as stack-based demo apps, effectively replacing the current python demo repo

3 participants