Conversation
2 issues found across 8 files
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="benchmarks2/benchmark.py">
<violation number="1" location="benchmarks2/benchmark.py:41">
P2: Baseline parsing reads the "ops/s" label instead of the numeric ops/s value, so `parse_results` will fail on valid benchmark output. Use `parts[4]` for the numeric ops/s token.</violation>
<violation number="2" location="benchmarks2/benchmark.py:113">
P3: The error return code from `main()` is ignored, so the script exits successfully even when the server is unreachable. Propagate the exit status from `main()`.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
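For context, a minimal sketch of what the two fixes might look like. The result-line layout, the stdin-driven `main()`, and the output format are assumptions; only `parse_results`, `main`, and the `parts[4]` index come from the review above.

```python
import sys


def parse_results(output: str) -> dict[str, float]:
    """Map case name -> numeric ops/s from raw benchmark output."""
    results = {}
    for line in output.splitlines():
        parts = line.split()
        # Assumed layout: "<case> ... <value> ops/s". Take the numeric token
        # just before the "ops/s" label (parts[4] in the layout the review
        # describes), never the label itself.
        if len(parts) >= 2 and parts[-1] == "ops/s":
            results[parts[0]] = float(parts[-2])
    return results


def main() -> int:
    output = sys.stdin.read()
    if not output.strip():
        print("no benchmark output (server unreachable?)", file=sys.stderr)
        return 1
    for case, ops in parse_results(output).items():
        print(f"{case}: {ops:.1f} ops/s")
    return 0


if __name__ == "__main__":
    # Propagate main()'s status instead of discarding it, so the script
    # exits non-zero on failure.
    sys.exit(main())
```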
@podocarp does not have an active Tusk seat. Activate it before triggering test generation.
1 issue found across 40 files (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="drift/instrumentation/e2e_common/test_utils.py">
<violation number="1" location="drift/instrumentation/e2e_common/test_utils.py:89">
P2: Guard against zero iterations before computing per-op stats; BENCHMARK_DURATION=0 (or a too-short duration) leaves iterations at 0 and causes a ZeroDivisionError here.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
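A short sketch of the guard being requested, assuming the per-op stats are derived from a duration/iterations pair in `test_utils.py`; the helper name below is hypothetical, not the actual function.

```python
def per_op_stats(total_seconds: float, iterations: int):
    """Return (seconds per op, ops per second), or None when nothing ran."""
    if iterations == 0 or total_seconds == 0:
        # BENCHMARK_DURATION=0 or a too-short run can leave the loop at zero
        # iterations; skip the division instead of raising ZeroDivisionError.
        return None
    return total_seconds / iterations, iterations / total_seconds
```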
sohil-kshirsagar
left a comment
nice, ty for this. can we update docs in this repo to reflect how to use this? maybe a new doc on benchmarks specifically is best, and if there are any docs on how to write e2e tests, make sure to indicate how they must be written to support benchmarks.
looks like some lint/type failures as well
Added BENCHMARKS.md. Writing benchmarks is really easy: since we are modifying the helper functions to do our benchmarking, as long as you add endpoints/test suites in the same format as the rest, it will auto-discover new test cases.
1 issue found across 1 file (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="BENCHMARKS.md">
<violation number="1" location="BENCHMARKS.md:12">
P2: The benchmarking guide points to `./run-all-e2e-tests.sh`, which only runs E2E/stack tests and doesn't enable benchmark mode. This will mislead users who follow the benchmarking instructions. Use the benchmark runner script instead.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
> The simplest way to get started it simply
> ```
> ./run-all-e2e-tests.sh
> ```
P2: The benchmarking guide points to ./run-all-e2e-tests.sh, which only runs E2E/stack tests and doesn't enable benchmark mode. This will mislead users who follow the benchmarking instructions. Use the benchmark runner script instead.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At BENCHMARKS.md, line 12:
<comment>The benchmarking guide points to `./run-all-e2e-tests.sh`, which only runs E2E/stack tests and doesn't enable benchmark mode. This will mislead users who follow the benchmarking instructions. Use the benchmark runner script instead.</comment>
<file context>
@@ -0,0 +1,47 @@
+
+The simplest way to get started it simply
+```
+./run-all-e2e-tests.sh
+```
+or to run a single (or a few) instrumentations,
</file context>
@podocarp what do you think about running and publishing benchmarks only from stack tests? it doesn't hurt for them to be in individual instrumentations should we need to optimize something, but stack tests would be more meaningful and slightly more representative of actual apps than testing, say, just redis in isolation. stack tests then fulfill 3 purposes:
Summary:
- Run with `BENCHMARKS=1 ./run.sh`.
- Modified the `make_request` function to check for the env var, and if it's set it will fork out to benchmarking code (among other things).

The problem with this vs. structured benchmarks is that there's no standardized test suite, so we can't interpret the results easily without context on the instrumentation and how the test is written. But it should make it very extensible in the future -- just add more tests and the benchmarks will automatically follow.
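A rough illustration of the `make_request` pattern described above, not the actual implementation; the client interface, the `BENCHMARK_DURATION` default, and the inline ops/s print are assumptions.

```python
import os
import time


def make_request(client, endpoint, **kwargs):
    # Illustrative only: the real helper in the e2e test utils almost
    # certainly differs in signature, client interface, and bookkeeping.
    if os.environ.get("BENCHMARKS") != "1":
        return client.get(endpoint, **kwargs)

    # Benchmark mode: repeat the same request for a fixed wall-clock
    # duration and count iterations so ops/s can be derived afterwards.
    duration = float(os.environ.get("BENCHMARK_DURATION", "5"))
    deadline = time.monotonic() + duration
    iterations = 0
    response = None
    while time.monotonic() < deadline:
        response = client.get(endpoint, **kwargs)
        iterations += 1
    if iterations and duration:
        print(f"{endpoint}: {iterations / duration:.1f} ops/s")
    return response
```

Note that the zero-iteration guard flagged in the `test_utils.py` review above matters exactly here: a too-short `BENCHMARK_DURATION` can leave `iterations` at 0.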
Note: (~) indicates low iterations (unreliable). Negative diff = slower with SDK.

aiohttp
django
fastapi
flask
grpc
httpx
psycopg (psycopg3)
psycopg2
redis
requests
urllib
urllib3