feat: accuracy issuer inherits perf concurrency in online mode#357
Conversation
When the performance phase runs the CONCURRENCY load pattern (online), the accuracy phase now mirrors that same fixed concurrency instead of always bursting at MAX_THROUGHPUT, so evaluation exercises the endpoint the same way as the performance run. All other patterns are unchanged: POISSON and offline MAX_THROUGHPUT perf phases keep the accuracy phase at MAX_THROUGHPUT, since inheriting POISSON would silently rate-limit evaluation to the perf QPS (no accuracy QPS-budgeting yet). The gate is purely load_pattern.type == CONCURRENCY, which the schema already constrains to online mode. Also logs the accuracy issuer's chosen load mode (pattern + target_concurrency) per accuracy dataset. Adds unit tests for the concurrency-inheritance, POISSON-stays-max-throughput, offline-stays-max-throughput, and logging cases. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅ |
There was a problem hiding this comment.
Code Review
This pull request updates the benchmark execution logic so that the accuracy phase mirrors the fixed concurrency of the performance phase when running in CONCURRENCY mode, while keeping MAX_THROUGHPUT for other load patterns. It also adds corresponding unit tests and logging. Feedback was provided to remove an unnecessary LoadPattern | None type annotation on acc_load_pattern to avoid redundant type widening and let type inference deduce the non-nullable LoadPattern type.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| perf_lp = ctx.rt_settings.load_pattern | ||
| acc_load_pattern: LoadPattern | None | ||
| if perf_lp is not None and perf_lp.type == LoadPatternType.CONCURRENCY: | ||
| acc_load_pattern = LoadPattern( | ||
| type=LoadPatternType.CONCURRENCY, | ||
| target_concurrency=perf_lp.target_concurrency, | ||
| ) | ||
| else: | ||
| acc_load_pattern = LoadPattern(type=LoadPatternType.MAX_THROUGHPUT) |
There was a problem hiding this comment.
The explicit type annotation acc_load_pattern: LoadPattern | None unnecessarily widens the type of acc_load_pattern to include None, even though it is guaranteed to be initialized as a LoadPattern in both branches of the if-else block. This can lead to unnecessary type-narrowing checks or static analysis warnings (e.g., from mypy or pyright) when accessing attributes like acc_load_pattern.type later in the function.\n\nWe can safely remove the explicit type annotation and let type inference deduce the correct non-nullable LoadPattern type.
perf_lp = ctx.rt_settings.load_pattern\n if perf_lp is not None and perf_lp.type == LoadPatternType.CONCURRENCY:\n acc_load_pattern = LoadPattern(\n type=LoadPatternType.CONCURRENCY,\n target_concurrency=perf_lp.target_concurrency,\n )\n else:\n acc_load_pattern = LoadPattern(type=LoadPatternType.MAX_THROUGHPUT)
What
When the performance phase runs the CONCURRENCY load pattern (online mode), the accuracy phase now mirrors that same fixed concurrency instead of always bursting at
MAX_THROUGHPUT. This makes accuracy evaluation exercise the endpoint the same way the performance run does.Behavior
concurrency(online)max_throughputconcurrency(sametarget_concurrency)poisson(online)max_throughputmax_throughput(unchanged)max_throughput(offline)max_throughputmax_throughput(unchanged)POISSONand offlineMAX_THROUGHPUTdeliberately keep the accuracy phase atMAX_THROUGHPUT— inheritingPOISSONwould silently rate-limit evaluation to the perf QPS, and there's no accuracy QPS-budgeting yet. The gate is purelyload_pattern.type == CONCURRENCY, which the schema already constrains to online mode (schema.py), so no separate test-type check is needed.The accuracy issuer also now logs its chosen load mode per accuracy dataset, e.g.:
Scope
_build_phases()incommands/benchmark/execute.py(runs once at setup — no hot-path impact).Tests
Added to the existing
TestBuildPhasessuite intests/unit/commands/test_benchmark.py:test_accuracy_phase_inherits_perf_concurrency— perfconcurrency(7)→ accuracyconcurrency,target_concurrency == 7test_accuracy_phase_max_throughput_when_perf_poisson— perfpoisson→ accuracy staysmax_throughputtest_accuracy_phase_max_throughput_when_perf_offline— offline → accuracy staysmax_throughputtest_accuracy_issuer_logs_load_mode— asserts the issuer logs its modeVerification:
tests/unit/commands+tests/unit/load_generator→ 266 passed;pre-commit(ruff, ruff-format, mypy, license) clean.🤖 Generated with Claude Code