fix(rollout): apply sample filter in rollout manager by EazyReal · Pull Request #2014 · THUDM/slime

EazyReal · 2026-06-04T02:03:48Z

Summary

This PR fixes --rollout-sample-filter-path so it is applied consistently for all training rollout functions, including custom functions loaded through --rollout-function-path.

--rollout-sample-filter-path is documented as a rollout customization hook. It lets users mark generated Samples with remove_sample=True so those samples are excluded from loss calculation. Currently, the hook is applied inside the default SGLang rollout implementation. That works for the default path, but custom rollout functions can return valid training samples through RolloutManager without the configured sample filter being applied.

This PR moves the sample-filter application to RolloutManager._get_rollout_data, after rollout output normalization and _validate_rollout_id_annotated(data), and before flattening the nested rollout samples.

Motivation

This is intended as a narrow bug fix within the contribution scope described in CONTRIBUTING.md: it fixes existing behavior, adds focused CPU contract-test coverage, and avoids introducing a new feature, abstraction, or broad refactor.

The behavior this PR restores is:

If args.rollout_sample_filter_path is set, every training rollout function output that reaches RolloutManager has the same in-place Sample.remove_sample filter applied before conversion to train data.

Without this change, two rollout functions that both return valid RolloutFnTrainOutput data can behave differently depending only on whether they are the built-in SGLang provider or a custom provider.

Why the Manager Boundary

RolloutManager._get_rollout_data is the smallest shared boundary that sees every training rollout output before train-data conversion:

Every training rollout function, built-in or custom, already funnels through this method.
The manager already normalizes legacy/custom rollout outputs via call_rollout_fn.
Applying the filter after _validate_rollout_id_annotated(data) preserves the existing validation order: invalid compact rollout outputs fail before user filter code mutates them.
Applying the filter before flattening preserves the grouped shape expected by the existing sample-filter contract.
The default SGLang rollout behavior remains equivalent, while custom rollout functions now honor the same documented argument.

Keeping the hook at this shared boundary avoids requiring each rollout provider, or each custom rollout author, to repeat the same framework-level hook call and ordering.

What Stays Unchanged

rollout_all_samples_process_path intentionally stays in sglang_rollout.py. That hook needs all_samples, including groups dropped by dynamic filtering before the manager receives the kept rollout data. Moving it to the manager would require passing provider-local state into the manager, so this PR leaves that behavior untouched.

Alternatives Considered

Keep the filter only in sglang_rollout.py: simple, but custom rollout functions still do not receive the documented hook behavior.
Ask each rollout function to call the filter itself: avoids touching the manager, but duplicates sequencing across providers and makes custom rollout correctness depend on each implementation remembering the framework hook.
Apply the filter after flattening or during train-data conversion: easier to centralize, but it would lose the grouped rollout context used by the existing hook contract.
Move rollout_all_samples_process_path too: not equivalent, because the manager does not receive dropped/unused groups.

Validation

uv run --no-project --with black==24.3.0 black --check slime/rollout/base_types.py slime/ray/rollout.py slime/rollout/sglang_rollout.py tests/plugin_contracts/test_plugin_rollout_contracts.py
uv run --no-project --with ruff ruff check slime/rollout/base_types.py slime/ray/rollout.py slime/rollout/sglang_rollout.py tests/plugin_contracts/test_plugin_rollout_contracts.py
python tests/plugin_contracts/test_plugin_rollout_contracts.py in a Python 3.12 CPU test environment: 7 passed
python tests/plugin_contracts/test_plugin_path_loading_contracts.py in the same environment: 14 passed
python -m py_compile slime/rollout/base_types.py slime/ray/rollout.py slime/rollout/sglang_rollout.py tests/plugin_contracts/test_plugin_rollout_contracts.py
uv run --no-project --with pre-commit pre-commit run --files slime/rollout/base_types.py slime/ray/rollout.py slime/rollout/sglang_rollout.py tests/plugin_contracts/test_plugin_rollout_contracts.py
git diff --check

I did not run a GPU e2e test because the change is limited to a CPU-verifiable rollout hook contract and does not touch model execution, SGLang engine behavior, or Megatron training math.

fix(rollout): apply sample filter in rollout manager

22e7bd5

tardis-key mentioned this pull request Jun 4, 2026

📋 Daily Briefing — 2026-06-04 tardis-key/codex#12

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(rollout): apply sample filter in rollout manager#2014

fix(rollout): apply sample filter in rollout manager#2014
EazyReal wants to merge 1 commit into
THUDM:mainfrom
EazyReal:fix/rollout-sample-filter-manager

EazyReal commented Jun 4, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

EazyReal commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Motivation

Why the Manager Boundary

What Stays Unchanged

Alternatives Considered

Validation

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

EazyReal commented Jun 4, 2026 •

edited

Loading