feat: per-dataset max_new_tokens override #356

Open
roborluo wants to merge 2 commits into mlcommons:release/v0.5 from roborluo:dev-bofengl-per-dataset-max-new-tokens

@roborluo

What does this PR do?

When running a combined performance + accuracy benchmark in a single --mode both invocation, the two phases want opposite generation caps, but today the harness only exposes one global model_params.max_new_tokens:

  • Performance phase needs a small cap. max_new_tokens is sent to the server as the per-request max_tokens, and a disaggregated decode scheduler reserves/plans decode-KV for that declared upper bound — even though generation actually stops at EOS far sooner. A large cap (e.g. 32768) over-reserves decode KV (~3.2× vs 10240), starves admittable decode slots at high concurrency, and triggers KV-transfer-timeout storms on the context→gen path. A realistic small cap avoids this.
  • Accuracy phase needs a large cap, otherwise long reasoning outputs get truncated and scores are artificially deflated. This matches the MLPerf Inference gpt-oss-120b reference, where the performance and accuracy workloads use different token settings — see language/gpt-oss-120b → Model and Dataset download. Without a per-dataset override, you cannot satisfy both in one --mode both run.
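
The over-reservation figure above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the decode scheduler reserves KV in proportion to the declared cap, as the description states; the helper names and the 1,000,000-token budget are hypothetical and chosen only for illustration.

```python
def reservation_ratio(large_cap: int, small_cap: int) -> float:
    """How many times more decode KV the large cap reserves per request."""
    return large_cap / small_cap


def admittable_slots(kv_budget_tokens: int, max_tokens: int) -> int:
    """Requests that fit if each one reserves its full declared cap.

    Assumption: reservation equals the declared max_tokens, ignoring
    prompt tokens and block granularity.
    """
    return kv_budget_tokens // max_tokens


LARGE_CAP = 32768
SMALL_CAP = 10240
KV_BUDGET = 1_000_000  # hypothetical decode-KV budget, in tokens

ratio = reservation_ratio(LARGE_CAP, SMALL_CAP)      # 3.2
slots_large = admittable_slots(KV_BUDGET, LARGE_CAP)  # 30
slots_small = admittable_slots(KV_BUDGET, SMALL_CAP)  # 97
```

Under these assumptions the smaller cap admits roughly three times as many concurrent decode requests from the same budget, which is the effect the performance phase depends on.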

Type of change

  • Bug fix
  • New feature
  • Documentation update
  • Refactor/cleanup

Related issues

N/A

Testing

  • Tests added/updated
  • All tests pass locally
  • Manual testing completed

Checklist

  • Code follows project style
  • Pre-commit hooks pass
  • Documentation updated (if needed)

roborluo and others added 2 commits June 12, 2026 16:55
Add optional `max_new_tokens` to the Dataset config so performance and
accuracy datasets can use different per-request max_tokens within a single
`--mode both` run.

The client sends model_params.max_new_tokens as the OpenAI completions
`max_tokens`. A large global value (e.g. 32768) inflates the server-side
per-request decode KV reservation; at high concurrency this starves the
disaggregated ctx->gen KV-cache transfer and triggers KV-cache-transfer
timeout storms. This change lets a perf dataset use a small cap (avoid the
overload) while accuracy datasets keep a large cap (avoid truncating long
reasoning output). Falls back to model_params.max_new_tokens when unset;
applied per-dataset in execute.py via model_params.model_copy(update=...).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Cover Dataset.max_new_tokens: defaults to None, accepts a per-dataset
override, and rejects non-positive values. All test_schema.py tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@roborluo roborluo requested a review from a team as a code owner June 13, 2026 05:10
@github-actions

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request introduces a per-dataset max_new_tokens override capability to allow performance and accuracy datasets to use different token limits, falling back to the global model_params when unset. The feedback suggests encapsulating the override logic into a helper method get_model_params on the Dataset class to eliminate code duplication across the accuracy and performance dataset loading paths, and adding corresponding unit tests for this helper.

Comment on lines +304 to +314

    max_new_tokens: int | None = Field(
        None,
        gt=0,
        description=(
            "Per-dataset override of model_params.max_new_tokens (sent as the "
            "per-request max_tokens). Lets a performance dataset use a small cap "
            "(to avoid server-side KV over-reservation/overload at high concurrency) "
            "while accuracy datasets use a larger cap (to avoid truncating long "
            "reasoning output). Falls back to model_params.max_new_tokens when unset."
        ),
    )

Contributor

medium

To avoid duplicating the max_new_tokens override logic across different dataset loading paths, we can encapsulate this behavior as a helper method on the Dataset class itself. This improves maintainability and makes the code more robust to future changes.

Suggested change

      max_new_tokens: int | None = Field(
          None,
          gt=0,
          description=(
              "Per-dataset override of model_params.max_new_tokens (sent as the "
              "per-request max_tokens). Lets a performance dataset use a small cap "
              "(to avoid server-side KV over-reservation/overload at high concurrency) "
              "while accuracy datasets use a larger cap (to avoid truncating long "
              "reasoning output). Falls back to model_params.max_new_tokens when unset."
          ),
      )
    +
    + def get_model_params(self, global_params: ModelParams) -> ModelParams:
    +     """Get model params with per-dataset max_new_tokens override applied if set."""
    +     if self.max_new_tokens is None:
    +         return global_params
    +     return global_params.model_copy(update={"max_new_tokens": self.max_new_tokens})
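
The fallback behaviour of the suggested helper can be tried in isolation. The sketch below is not the project's code: it uses stdlib frozen dataclasses as stand-ins for the Pydantic `ModelParams` and `Dataset` models, with `dataclasses.replace` playing the role of `model_copy(update=...)` and a `__post_init__` check standing in for `gt=0`. The field sets are trimmed to what the example needs.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ModelParams:
    """Stand-in for the global model_params (assumed fields only)."""
    name: str
    max_new_tokens: int = 1024


@dataclass(frozen=True)
class Dataset:
    """Stand-in for the Dataset config with the optional override."""
    name: str
    max_new_tokens: Optional[int] = None

    def __post_init__(self) -> None:
        # Mirrors the gt=0 constraint on the real field.
        if self.max_new_tokens is not None and self.max_new_tokens <= 0:
            raise ValueError("max_new_tokens must be greater than 0")

    def get_model_params(self, global_params: ModelParams) -> ModelParams:
        """Return global params, with the per-dataset cap applied if set."""
        if self.max_new_tokens is None:
            return global_params
        # A copy is returned; the global params object is left untouched.
        return replace(global_params, max_new_tokens=self.max_new_tokens)


global_params = ModelParams(name="test", max_new_tokens=10240)
perf = Dataset(name="perf")                        # no override
acc = Dataset(name="aime25", max_new_tokens=32768)  # per-dataset override

perf_params = perf.get_model_params(global_params)
acc_params = acc.get_model_params(global_params)
```

One point worth keeping in mind for the real implementation: Pydantic's `model_copy(update=...)` does not re-validate the updated values, so the `gt=0` constraint on `Dataset.max_new_tokens` is what guarantees the copied value is valid.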

Comment on lines +287 to +294

    # Per-dataset max_new_tokens override (falls back to global model_params).
    acc_model_params = (
        config.model_params
        if acc_cfg.max_new_tokens is None
        else config.model_params.model_copy(
            update={"max_new_tokens": acc_cfg.max_new_tokens}
        )
    )

Contributor

medium

Use the new get_model_params helper method on the Dataset configuration model to simplify the override logic and eliminate duplication.

Suggested change

    - # Per-dataset max_new_tokens override (falls back to global model_params).
    - acc_model_params = (
    -     config.model_params
    -     if acc_cfg.max_new_tokens is None
    -     else config.model_params.model_copy(
    -         update={"max_new_tokens": acc_cfg.max_new_tokens}
    -     )
    - )
    + acc_model_params = acc_cfg.get_model_params(config.model_params)

Comment on lines +307 to +314

    # Per-dataset max_new_tokens override (falls back to global model_params).
    perf_model_params = (
        config.model_params
        if perf_cfg.max_new_tokens is None
        else config.model_params.model_copy(
            update={"max_new_tokens": perf_cfg.max_new_tokens}
        )
    )

Contributor

medium

Use the new get_model_params helper method on the Dataset configuration model to simplify the override logic and eliminate duplication.

Suggested change

    - # Per-dataset max_new_tokens override (falls back to global model_params).
    - perf_model_params = (
    -     config.model_params
    -     if perf_cfg.max_new_tokens is None
    -     else config.model_params.model_copy(
    -         update={"max_new_tokens": perf_cfg.max_new_tokens}
    -     )
    - )
    + perf_model_params = perf_cfg.get_model_params(config.model_params)
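
To make the end-to-end effect concrete, the sketch below shows how the resolved cap would surface in the request body for each phase of a single run. It assumes, as the commit message states, that `max_new_tokens` is sent as the OpenAI completions `max_tokens`; the function names, the dataset mapping, and the placeholder prompt are hypothetical and not taken from the harness.

```python
from typing import Dict, Optional


def resolve_max_tokens(global_cap: int, dataset_cap: Optional[int]) -> int:
    """Per-dataset cap if set, otherwise the global model_params value."""
    return global_cap if dataset_cap is None else dataset_cap


def build_completion_payload(model: str, prompt: str, max_tokens: int) -> dict:
    """Minimal completions-style request body (hypothetical helper)."""
    return {"model": model, "prompt": prompt, "max_tokens": max_tokens}


GLOBAL_CAP = 10240  # small global cap, suited to the performance phase

# Hypothetical dataset -> override mapping; None means "use the global cap".
dataset_caps: Dict[str, Optional[int]] = {"perf": None, "aime25": 32768}

payloads = {
    name: build_completion_payload(
        model="gpt-oss-120b",
        prompt="<prompt text>",
        max_tokens=resolve_max_tokens(GLOBAL_CAP, cap),
    )
    for name, cap in dataset_caps.items()
}
```

With this arrangement the performance dataset keeps the small declared upper bound while the accuracy dataset requests the larger one, both within the same invocation.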

Comment on lines +126 to +150

    @pytest.mark.unit
    def test_max_new_tokens_defaults_none(self):
        ds = Dataset(name="perf", type=DatasetType.PERFORMANCE, path="data.jsonl")
        assert ds.max_new_tokens is None

    @pytest.mark.unit
    def test_per_dataset_max_new_tokens_override(self):
        ds = Dataset(
            name="aime25",
            type=DatasetType.ACCURACY,
            path="aime25.jsonl",
            eval_method=EvalMethod.EXACT_MATCH,
            max_new_tokens=32768,
        )
        assert ds.max_new_tokens == 32768

    @pytest.mark.unit
    def test_max_new_tokens_rejects_non_positive(self):
        with pytest.raises(ValueError, match="greater than 0"):
            Dataset(
                name="perf",
                type=DatasetType.PERFORMANCE,
                path="data.jsonl",
                max_new_tokens=0,
            )

Contributor

medium

Add unit tests to verify the correctness of the new get_model_params helper method on the Dataset configuration model.

    @pytest.mark.unit
    def test_max_new_tokens_defaults_none(self):
        ds = Dataset(name="perf", type=DatasetType.PERFORMANCE, path="data.jsonl")
        assert ds.max_new_tokens is None

    @pytest.mark.unit
    def test_per_dataset_max_new_tokens_override(self):
        ds = Dataset(
            name="aime25",
            type=DatasetType.ACCURACY,
            path="aime25.jsonl",
            eval_method=EvalMethod.EXACT_MATCH,
            max_new_tokens=32768,
        )
        assert ds.max_new_tokens == 32768

    @pytest.mark.unit
    def test_max_new_tokens_rejects_non_positive(self):
        with pytest.raises(ValueError, match="greater than 0"):
            Dataset(
                name="perf",
                type=DatasetType.PERFORMANCE,
                path="data.jsonl",
                max_new_tokens=0,
            )

    @pytest.mark.unit
    def test_get_model_params_override(self):
        global_params = ModelParams(name="test", max_new_tokens=1024)
        ds_no_override = Dataset(name="perf", type=DatasetType.PERFORMANCE, path="data.jsonl")
        assert ds_no_override.get_model_params(global_params).max_new_tokens == 1024

        ds_with_override = Dataset(
            name="aime25",
            type=DatasetType.ACCURACY,
            path="aime25.jsonl",
            max_new_tokens=32768,
        )
        assert ds_with_override.get_model_params(global_params).max_new_tokens == 32768

@arekay-nv

Collaborator

@roborluo Can you look at #344, which addresses the same issue? We can consolidate the two here and merge this one. I think the other one also has the templates correctly populated, which is what is failing for you in CI.
