fix(rollout): apply sample filter in rollout manager#2014
Open
EazyReal wants to merge 1 commit into
Open
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This PR fixes
--rollout-sample-filter-pathso it is applied consistently for all training rollout functions, including custom functions loaded through--rollout-function-path.--rollout-sample-filter-pathis documented as a rollout customization hook. It lets users mark generatedSamples withremove_sample=Trueso those samples are excluded from loss calculation. Currently, the hook is applied inside the default SGLang rollout implementation. That works for the default path, but custom rollout functions can return valid training samples throughRolloutManagerwithout the configured sample filter being applied.This PR moves the sample-filter application to
RolloutManager._get_rollout_data, after rollout output normalization and_validate_rollout_id_annotated(data), and before flattening the nested rollout samples.Motivation
This is intended as a narrow bug fix within the contribution scope described in
CONTRIBUTING.md: it fixes existing behavior, adds focused CPU contract-test coverage, and avoids introducing a new feature, abstraction, or broad refactor.The behavior this PR restores is:
Without this change, two rollout functions that both return valid
RolloutFnTrainOutputdata can behave differently depending only on whether they are the built-in SGLang provider or a custom provider.Why the Manager Boundary
RolloutManager._get_rollout_datais the smallest shared boundary that sees every training rollout output before train-data conversion:call_rollout_fn._validate_rollout_id_annotated(data)preserves the existing validation order: invalid compact rollout outputs fail before user filter code mutates them.Keeping the hook at this shared boundary avoids requiring each rollout provider, or each custom rollout author, to repeat the same framework-level hook call and ordering.
What Stays Unchanged
rollout_all_samples_process_pathintentionally stays insglang_rollout.py. That hook needsall_samples, including groups dropped by dynamic filtering before the manager receives the kept rollout data. Moving it to the manager would require passing provider-local state into the manager, so this PR leaves that behavior untouched.Alternatives Considered
sglang_rollout.py: simple, but custom rollout functions still do not receive the documented hook behavior.rollout_all_samples_process_pathtoo: not equivalent, because the manager does not receive dropped/unused groups.Validation
uv run --no-project --with black==24.3.0 black --check slime/rollout/base_types.py slime/ray/rollout.py slime/rollout/sglang_rollout.py tests/plugin_contracts/test_plugin_rollout_contracts.pyuv run --no-project --with ruff ruff check slime/rollout/base_types.py slime/ray/rollout.py slime/rollout/sglang_rollout.py tests/plugin_contracts/test_plugin_rollout_contracts.pypython tests/plugin_contracts/test_plugin_rollout_contracts.pyin a Python 3.12 CPU test environment: 7 passedpython tests/plugin_contracts/test_plugin_path_loading_contracts.pyin the same environment: 14 passedpython -m py_compile slime/rollout/base_types.py slime/ray/rollout.py slime/rollout/sglang_rollout.py tests/plugin_contracts/test_plugin_rollout_contracts.pyuv run --no-project --with pre-commit pre-commit run --files slime/rollout/base_types.py slime/ray/rollout.py slime/rollout/sglang_rollout.py tests/plugin_contracts/test_plugin_rollout_contracts.pygit diff --checkI did not run a GPU e2e test because the change is limited to a CPU-verifiable rollout hook contract and does not touch model execution, SGLang engine behavior, or Megatron training math.