Skip to content

fix(rollout): apply sample filter in rollout manager#2014

Open
EazyReal wants to merge 1 commit into
THUDM:mainfrom
EazyReal:fix/rollout-sample-filter-manager
Open

fix(rollout): apply sample filter in rollout manager#2014
EazyReal wants to merge 1 commit into
THUDM:mainfrom
EazyReal:fix/rollout-sample-filter-manager

Conversation

@EazyReal

@EazyReal EazyReal commented Jun 4, 2026

Copy link
Copy Markdown

Summary

This PR fixes --rollout-sample-filter-path so it is applied consistently for all training rollout functions, including custom functions loaded through --rollout-function-path.

--rollout-sample-filter-path is documented as a rollout customization hook. It lets users mark generated Samples with remove_sample=True so those samples are excluded from loss calculation. Currently, the hook is applied inside the default SGLang rollout implementation. That works for the default path, but custom rollout functions can return valid training samples through RolloutManager without the configured sample filter being applied.

This PR moves the sample-filter application to RolloutManager._get_rollout_data, after rollout output normalization and _validate_rollout_id_annotated(data), and before flattening the nested rollout samples.

Motivation

This is intended as a narrow bug fix within the contribution scope described in CONTRIBUTING.md: it fixes existing behavior, adds focused CPU contract-test coverage, and avoids introducing a new feature, abstraction, or broad refactor.

The behavior this PR restores is:

If args.rollout_sample_filter_path is set, every training rollout function output that reaches RolloutManager has the same in-place Sample.remove_sample filter applied before conversion to train data.

Without this change, two rollout functions that both return valid RolloutFnTrainOutput data can behave differently depending only on whether they are the built-in SGLang provider or a custom provider.

Why the Manager Boundary

RolloutManager._get_rollout_data is the smallest shared boundary that sees every training rollout output before train-data conversion:

  • Every training rollout function, built-in or custom, already funnels through this method.
  • The manager already normalizes legacy/custom rollout outputs via call_rollout_fn.
  • Applying the filter after _validate_rollout_id_annotated(data) preserves the existing validation order: invalid compact rollout outputs fail before user filter code mutates them.
  • Applying the filter before flattening preserves the grouped shape expected by the existing sample-filter contract.
  • The default SGLang rollout behavior remains equivalent, while custom rollout functions now honor the same documented argument.

Keeping the hook at this shared boundary avoids requiring each rollout provider, or each custom rollout author, to repeat the same framework-level hook call and ordering.

What Stays Unchanged

rollout_all_samples_process_path intentionally stays in sglang_rollout.py. That hook needs all_samples, including groups dropped by dynamic filtering before the manager receives the kept rollout data. Moving it to the manager would require passing provider-local state into the manager, so this PR leaves that behavior untouched.

Alternatives Considered

  • Keep the filter only in sglang_rollout.py: simple, but custom rollout functions still do not receive the documented hook behavior.
  • Ask each rollout function to call the filter itself: avoids touching the manager, but duplicates sequencing across providers and makes custom rollout correctness depend on each implementation remembering the framework hook.
  • Apply the filter after flattening or during train-data conversion: easier to centralize, but it would lose the grouped rollout context used by the existing hook contract.
  • Move rollout_all_samples_process_path too: not equivalent, because the manager does not receive dropped/unused groups.

Validation

  • uv run --no-project --with black==24.3.0 black --check slime/rollout/base_types.py slime/ray/rollout.py slime/rollout/sglang_rollout.py tests/plugin_contracts/test_plugin_rollout_contracts.py
  • uv run --no-project --with ruff ruff check slime/rollout/base_types.py slime/ray/rollout.py slime/rollout/sglang_rollout.py tests/plugin_contracts/test_plugin_rollout_contracts.py
  • python tests/plugin_contracts/test_plugin_rollout_contracts.py in a Python 3.12 CPU test environment: 7 passed
  • python tests/plugin_contracts/test_plugin_path_loading_contracts.py in the same environment: 14 passed
  • python -m py_compile slime/rollout/base_types.py slime/ray/rollout.py slime/rollout/sglang_rollout.py tests/plugin_contracts/test_plugin_rollout_contracts.py
  • uv run --no-project --with pre-commit pre-commit run --files slime/rollout/base_types.py slime/ray/rollout.py slime/rollout/sglang_rollout.py tests/plugin_contracts/test_plugin_rollout_contracts.py
  • git diff --check

I did not run a GPU e2e test because the change is limited to a CPU-verifiable rollout hook contract and does not touch model execution, SGLang engine behavior, or Megatron training math.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant