feat: accuracy issuer inherits perf concurrency in online mode #357

Merged
nvzhihanj merged 1 commit into release/v0.5 from arekay/accuracy-online-concurrency
Jun 15, 2026

@arekay-nv

Collaborator

What

When the performance phase runs the CONCURRENCY load pattern (online mode), the accuracy phase now mirrors that same fixed concurrency instead of always bursting at MAX_THROUGHPUT. This makes accuracy evaluation exercise the endpoint the same way the performance run does.

Behavior

| Perf load pattern | Accuracy phase (before) | Accuracy phase (after) |
| --- | --- | --- |
| concurrency (online) | max_throughput | concurrency (same target_concurrency) |
| poisson (online) | max_throughput | max_throughput (unchanged) |
| max_throughput (offline) | max_throughput | max_throughput (unchanged) |

POISSON and offline MAX_THROUGHPUT deliberately keep the accuracy phase at MAX_THROUGHPUT — inheriting POISSON would silently rate-limit evaluation to the perf QPS, and there's no accuracy QPS-budgeting yet. The gate is purely load_pattern.type == CONCURRENCY, which the schema already constrains to online mode (schema.py), so no separate test-type check is needed.
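The gate described above can be sketched in isolation. This is a minimal illustration, not the project's code: `LoadPatternType` and `LoadPattern` here are simplified stand-ins carrying only the fields named in this PR, and `accuracy_load_pattern` is a hypothetical helper name (the actual logic lives inline in `_build_phases()`).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LoadPatternType(Enum):
    # Stand-in enum; member names follow the PR text, values are assumed.
    MAX_THROUGHPUT = "max_throughput"
    POISSON = "poisson"
    CONCURRENCY = "concurrency"


@dataclass(frozen=True)
class LoadPattern:
    # Stand-in model with only the two fields this PR mentions.
    type: LoadPatternType
    target_concurrency: Optional[int] = None


def accuracy_load_pattern(perf_lp: Optional[LoadPattern]) -> LoadPattern:
    """Hypothetical helper: pick the accuracy-phase pattern from the perf one."""
    # Only CONCURRENCY is inherited; POISSON and MAX_THROUGHPUT (and a
    # missing pattern) all fall through to MAX_THROUGHPUT.
    if perf_lp is not None and perf_lp.type == LoadPatternType.CONCURRENCY:
        return LoadPattern(
            type=LoadPatternType.CONCURRENCY,
            target_concurrency=perf_lp.target_concurrency,
        )
    return LoadPattern(type=LoadPatternType.MAX_THROUGHPUT)
```

Because the check looks only at the pattern type, a POISSON perf phase never leaks its QPS target into the accuracy phase.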

The accuracy issuer also now logs its chosen load mode per accuracy dataset, e.g.:

Accuracy issuer 'aime' load mode: concurrency (target_concurrency=64)
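A log line of this shape could be produced roughly as follows. This is only a sketch: `format_load_mode` and the logger name are hypothetical, and omitting the parenthetical when no `target_concurrency` is set is an assumption, since the PR shows only the concurrency case.

```python
import logging
from typing import Optional

# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("accuracy_issuer_sketch")


def format_load_mode(
    dataset: str, pattern: str, target_concurrency: Optional[int] = None
) -> str:
    """Build the per-dataset load-mode message (format assumed from the PR example)."""
    message = f"Accuracy issuer '{dataset}' load mode: {pattern}"
    # Assumption: the concurrency detail is appended only when it is set.
    if target_concurrency is not None:
        message += f" (target_concurrency={target_concurrency})"
    return message


logger.info(format_load_mode("aime", "concurrency", 64))
```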

Scope

  • Localized to _build_phases() in commands/benchmark/execute.py (runs once at setup — no hot-path impact).
  • No schema/CLI/template changes (automatic behavior, no new flag).

Tests

Added to the existing TestBuildPhases suite in tests/unit/commands/test_benchmark.py:

  • test_accuracy_phase_inherits_perf_concurrency — perf concurrency(7) → accuracy concurrency, target_concurrency == 7
  • test_accuracy_phase_max_throughput_when_perf_poisson — perf poisson → accuracy stays max_throughput
  • test_accuracy_phase_max_throughput_when_perf_offline — offline → accuracy stays max_throughput
  • test_accuracy_issuer_logs_load_mode — asserts the issuer logs its mode
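The first three cases can be pictured as one table-driven check. The real tests exercise `_build_phases()` with a benchmark context; the sketch below swaps in a hypothetical pure function (`choose_accuracy_pattern`) and a stand-in `Pattern` tuple so that it runs on its own.

```python
import unittest
from typing import NamedTuple, Optional


class Pattern(NamedTuple):
    # Hypothetical stand-in for the project's LoadPattern model.
    type: str
    target_concurrency: Optional[int] = None


def choose_accuracy_pattern(perf: Optional[Pattern]) -> Pattern:
    # Hypothetical stand-in for the selection step inside _build_phases().
    if perf is not None and perf.type == "concurrency":
        return Pattern("concurrency", perf.target_concurrency)
    return Pattern("max_throughput")


class AccuracyPatternSketch(unittest.TestCase):
    def test_cases_from_pr_description(self) -> None:
        # (perf pattern, expected accuracy pattern), one row per listed test.
        cases = [
            (Pattern("concurrency", 7), Pattern("concurrency", 7)),
            (Pattern("poisson"), Pattern("max_throughput")),
            (Pattern("max_throughput"), Pattern("max_throughput")),
        ]
        for perf, expected in cases:
            with self.subTest(perf=perf.type):
                self.assertEqual(choose_accuracy_pattern(perf), expected)
```

Saved as a module, this runs with `python -m unittest`.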

Verification: tests/unit/commands + tests/unit/load_generator: 266 passed; pre-commit (ruff, ruff-format, mypy, license) clean.

🤖 Generated with Claude Code

When the performance phase runs the CONCURRENCY load pattern (online), the
accuracy phase now mirrors that same fixed concurrency instead of always
bursting at MAX_THROUGHPUT, so evaluation exercises the endpoint the same way
as the performance run.

All other patterns are unchanged: POISSON and offline MAX_THROUGHPUT perf
phases keep the accuracy phase at MAX_THROUGHPUT, since inheriting POISSON
would silently rate-limit evaluation to the perf QPS (no accuracy QPS-budgeting
yet). The gate is purely load_pattern.type == CONCURRENCY, which the schema
already constrains to online mode.

Also logs the accuracy issuer's chosen load mode (pattern + target_concurrency)
per accuracy dataset. Adds unit tests for the concurrency-inheritance,
POISSON-stays-max-throughput, offline-stays-max-throughput, and logging cases.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@arekay-nv arekay-nv requested a review from a team as a code owner June 14, 2026 22:07
@github-actions

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@github-actions github-actions Bot requested a review from nvzhihanj June 14, 2026 22:08

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request updates the benchmark execution logic so that the accuracy phase mirrors the fixed concurrency of the performance phase when running in CONCURRENCY mode, while keeping MAX_THROUGHPUT for other load patterns. It also adds corresponding unit tests and logging. Feedback was provided to remove an unnecessary LoadPattern | None type annotation on acc_load_pattern to avoid redundant type widening and let type inference deduce the non-nullable LoadPattern type.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +473 to +481
        perf_lp = ctx.rt_settings.load_pattern
        acc_load_pattern: LoadPattern | None
        if perf_lp is not None and perf_lp.type == LoadPatternType.CONCURRENCY:
            acc_load_pattern = LoadPattern(
                type=LoadPatternType.CONCURRENCY,
                target_concurrency=perf_lp.target_concurrency,
            )
        else:
            acc_load_pattern = LoadPattern(type=LoadPatternType.MAX_THROUGHPUT)

medium

The explicit type annotation acc_load_pattern: LoadPattern | None unnecessarily widens the type of acc_load_pattern to include None, even though both branches of the if-else block assign it a LoadPattern. This can lead to unnecessary type-narrowing checks or static analysis warnings (e.g., from mypy or pyright) when accessing attributes like acc_load_pattern.type later in the function.

We can safely remove the explicit type annotation and let type inference deduce the correct non-nullable LoadPattern type.

        perf_lp = ctx.rt_settings.load_pattern
        if perf_lp is not None and perf_lp.type == LoadPatternType.CONCURRENCY:
            acc_load_pattern = LoadPattern(
                type=LoadPatternType.CONCURRENCY,
                target_concurrency=perf_lp.target_concurrency,
            )
        else:
            acc_load_pattern = LoadPattern(type=LoadPatternType.MAX_THROUGHPUT)

Comment thread src/inference_endpoint/commands/benchmark/execute.py
@nvzhihanj nvzhihanj merged commit d80b13c into release/v0.5 Jun 15, 2026
8 checks passed
@github-actions github-actions Bot locked and limited conversation to collaborators Jun 15, 2026
2 participants