feat: accuracy issuer inherits perf concurrency in online mode by arekay-nv · Pull Request #357 · mlcommons/endpoints

arekay-nv · 2026-06-14T22:07:43Z

What

When the performance phase runs the CONCURRENCY load pattern (online mode), the accuracy phase now mirrors that same fixed concurrency instead of always bursting at MAX_THROUGHPUT. This makes accuracy evaluation exercise the endpoint the same way the performance run does.

Behavior

| Perf load pattern | Accuracy phase (before) | Accuracy phase (after) |
| --- | --- | --- |
| `concurrency` (online) | `max_throughput` | `concurrency` (same `target_concurrency`) |
| `poisson` (online) | `max_throughput` | `max_throughput` (unchanged) |
| `max_throughput` (offline) | `max_throughput` | `max_throughput` (unchanged) |

POISSON and offline MAX_THROUGHPUT deliberately keep the accuracy phase at MAX_THROUGHPUT — inheriting POISSON would silently rate-limit evaluation to the perf QPS, and there's no accuracy QPS-budgeting yet. The gate is purely load_pattern.type == CONCURRENCY, which the schema already constrains to online mode (schema.py), so no separate test-type check is needed.
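A minimal, self-contained sketch of the gate described above. The `LoadPatternType` and `LoadPattern` classes here are simplified stand-ins written for illustration (the real ones live in the project and likely carry more fields), and `accuracy_load_pattern` is a hypothetical helper name, not a function from the codebase.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LoadPatternType(Enum):
    # Stand-in enum; the values mirror the names used in the Behavior table.
    CONCURRENCY = "concurrency"
    POISSON = "poisson"
    MAX_THROUGHPUT = "max_throughput"


@dataclass(frozen=True)
class LoadPattern:
    # Stand-in for the project's LoadPattern; only the fields used here.
    type: LoadPatternType
    target_concurrency: Optional[int] = None


def accuracy_load_pattern(perf_lp: Optional[LoadPattern]) -> LoadPattern:
    """Choose the accuracy-phase pattern from the perf-phase pattern.

    Hypothetical helper: the only gate is the perf pattern type.
    """
    if perf_lp is not None and perf_lp.type == LoadPatternType.CONCURRENCY:
        # Mirror the perf run's fixed concurrency.
        return LoadPattern(
            type=LoadPatternType.CONCURRENCY,
            target_concurrency=perf_lp.target_concurrency,
        )
    # POISSON, MAX_THROUGHPUT, or no pattern at all: keep bursting.
    return LoadPattern(type=LoadPatternType.MAX_THROUGHPUT)
```

Because the check looks only at the pattern type, a POISSON perf phase never leaks its rate into evaluation, which matches the "unchanged" rows of the table.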

The accuracy issuer also now logs its chosen load mode per accuracy dataset, e.g.:

Accuracy issuer 'aime' load mode: concurrency (target_concurrency=64)
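For reference, a small sketch of how a line in that format could be produced with the standard `logging` module. The function name `describe_load_mode`, the logger name, and the wording used when no `target_concurrency` is set are assumptions for illustration; only the concurrency form of the message is taken from the example above.

```python
import logging
from typing import Optional

# Hypothetical logger name; the project's actual logger is not shown in this PR page.
logger = logging.getLogger("accuracy_issuer_sketch")


def describe_load_mode(
    dataset_name: str,
    pattern: str,
    target_concurrency: Optional[int] = None,
) -> str:
    """Build the per-dataset load-mode message (hypothetical helper)."""
    # Only append the concurrency detail when one is set (assumed behavior).
    detail = (
        f" (target_concurrency={target_concurrency})"
        if target_concurrency is not None
        else ""
    )
    # !r renders the dataset name in single quotes, as in the example line.
    return f"Accuracy issuer {dataset_name!r} load mode: {pattern}{detail}"


def log_load_mode(
    dataset_name: str,
    pattern: str,
    target_concurrency: Optional[int] = None,
) -> None:
    # Emit one INFO line per accuracy dataset.
    logger.info(describe_load_mode(dataset_name, pattern, target_concurrency))
```

Calling `describe_load_mode("aime", "concurrency", 64)` yields the example line shown above.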

Scope

Localized to _build_phases() in commands/benchmark/execute.py (runs once at setup — no hot-path impact).
No schema/CLI/template changes (automatic behavior, no new flag).

Tests

Added to the existing TestBuildPhases suite in tests/unit/commands/test_benchmark.py:

test_accuracy_phase_inherits_perf_concurrency — perf concurrency(7) → accuracy concurrency, target_concurrency == 7
test_accuracy_phase_max_throughput_when_perf_poisson — perf poisson → accuracy stays max_throughput
test_accuracy_phase_max_throughput_when_perf_offline — offline → accuracy stays max_throughput
test_accuracy_issuer_logs_load_mode — asserts the issuer logs its mode
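The first three cases above can be pictured as one table-driven check. This is only a sketch: `accuracy_mode` is a hypothetical stand-in for the decision made inside `_build_phases()`, whose real signature is not shown here, and the actual tests in `TestBuildPhases` exercise the project's own code.

```python
from typing import List, Optional, Tuple

Mode = Tuple[str, Optional[int]]


def accuracy_mode(perf_type: str, target_concurrency: Optional[int] = None) -> Mode:
    """Hypothetical stand-in: return (accuracy pattern, target_concurrency)."""
    if perf_type == "concurrency":
        return ("concurrency", target_concurrency)
    return ("max_throughput", None)


# (perf pattern, perf target_concurrency, expected accuracy mode)
CASES: List[Tuple[str, Optional[int], Mode]] = [
    ("concurrency", 7, ("concurrency", 7)),              # inherits perf concurrency
    ("poisson", None, ("max_throughput", None)),         # stays max_throughput
    ("max_throughput", None, ("max_throughput", None)),  # offline, unchanged
]


def test_accuracy_mode_matches_behavior_table() -> None:
    for perf_type, concurrency, expected in CASES:
        # The message names the failing perf pattern.
        assert accuracy_mode(perf_type, concurrency) == expected, perf_type
```

The function needs no imports beyond `typing`, so it can be collected by pytest or called directly.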

Verification: tests/unit/commands + tests/unit/load_generator → 266 passed; pre-commit (ruff, ruff-format, mypy, license) clean.

🤖 Generated with Claude Code

When the performance phase runs the CONCURRENCY load pattern (online), the accuracy phase now mirrors that same fixed concurrency instead of always bursting at MAX_THROUGHPUT, so evaluation exercises the endpoint the same way as the performance run.

All other patterns are unchanged: POISSON and offline MAX_THROUGHPUT perf phases keep the accuracy phase at MAX_THROUGHPUT, since inheriting POISSON would silently rate-limit evaluation to the perf QPS (no accuracy QPS-budgeting yet). The gate is purely load_pattern.type == CONCURRENCY, which the schema already constrains to online mode.

Also logs the accuracy issuer's chosen load mode (pattern + target_concurrency) per accuracy dataset.

Adds unit tests for the concurrency-inheritance, POISSON-stays-max-throughput, offline-stays-max-throughput, and logging cases.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-14T22:07:54Z

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

gemini-code-assist

Code Review

This pull request updates the benchmark execution logic so that the accuracy phase mirrors the fixed concurrency of the performance phase when running in CONCURRENCY mode, while keeping MAX_THROUGHPUT for other load patterns. It also adds corresponding unit tests and logging. Feedback was provided to remove an unnecessary LoadPattern | None type annotation on acc_load_pattern to avoid redundant type widening and let type inference deduce the non-nullable LoadPattern type.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-14T22:08:56Z

+        perf_lp = ctx.rt_settings.load_pattern
+        acc_load_pattern: LoadPattern | None
+        if perf_lp is not None and perf_lp.type == LoadPatternType.CONCURRENCY:
+            acc_load_pattern = LoadPattern(
+                type=LoadPatternType.CONCURRENCY,
+                target_concurrency=perf_lp.target_concurrency,
+            )
+        else:
+            acc_load_pattern = LoadPattern(type=LoadPatternType.MAX_THROUGHPUT)


The explicit type annotation acc_load_pattern: LoadPattern | None unnecessarily widens the type of acc_load_pattern to include None, even though it is guaranteed to be initialized as a LoadPattern in both branches of the if-else block. This can lead to unnecessary type-narrowing checks or static analysis warnings (e.g., from mypy or pyright) when accessing attributes like acc_load_pattern.type later in the function.

We can safely remove the explicit type annotation and let type inference deduce the correct non-nullable LoadPattern type.

        perf_lp = ctx.rt_settings.load_pattern
        if perf_lp is not None and perf_lp.type == LoadPatternType.CONCURRENCY:
            acc_load_pattern = LoadPattern(
                type=LoadPatternType.CONCURRENCY,
                target_concurrency=perf_lp.target_concurrency,
            )
        else:
            acc_load_pattern = LoadPattern(type=LoadPatternType.MAX_THROUGHPUT)

arekay-nv requested a review from a team as a code owner June 14, 2026 22:07

github-actions Bot requested a review from nvzhihanj June 14, 2026 22:08

gemini-code-assist Bot reviewed Jun 14, 2026


nvzhihanj reviewed Jun 15, 2026


Comment thread src/inference_endpoint/commands/benchmark/execute.py

nvzhihanj approved these changes Jun 15, 2026


nvzhihanj merged commit d80b13c into release/v0.5 Jun 15, 2026
8 checks passed

github-actions Bot locked and limited conversation to collaborators Jun 15, 2026

nvzhihanj merged 1 commit into release/v0.5 from arekay/accuracy-online-concurrency

2 participants
