You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Introduces BaseJudger as a minimal judger interface (just judge()/batch_judge()/get_judger_name()), separating it from Judger which adds the composable payload contract (preprocess → judge_payload → postprocess). ComposedJudger now validates at construction time that branches inherit the full Judger contract, preventing subtle concurrent-mutation bugs. Also adds two example RL training configs and updates documentation.
ProduceBatchResult impact: not affected
RoutedExperts impact: not affected
Issues
Warning
[examples/v1/config/reasoning_rl_qwen3p5vl_mtp_ep.py:12,27-32] AgentLoopManagerConfig, TaskSpecConfig, and SamplerConfig are imported twice — once on line 12 and again on lines 27-32. The second import block only needs AsyncProduceStrategyConfig.
Nit
[xtuner/v1/rl/judger/composed.py:38] _validate_branches parameter type is dict[str, Judger] but the method exists to catch instances that are onlyBaseJudger. The type annotation should be dict[str, BaseJudger] to accurately reflect the runtime contract. (See inline comment for details.)
[xtuner/v1/rl/judger/composed.py:31] cast(dict[str, Judger], branches) is a no-op when the parameter is already typed dict[str, Judger]. (See inline comment.)
[examples/v1/config/reasoning_rl_qwen3p5vl_mtp_ep.py:29, examples/v1/config/agentic_rl_qwen3p5vl_mtp_ep_code.py:22] Shell-style comments (# export LMDEPLOY_FP32_MAMBA_SSM_DTYPE=1) in Python config files are unusual — consider moving these to a README or a shell wrapper script.
Verdict
APPROVE — the core judger hierarchy change is clean and well-motivated. The BaseJudger / Judger split is a sensible guardrail against concurrent RolloutState mutation in composed judging. Test coverage for the rejection path is included. The type annotation nits are minor and non-blocking.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
标题
修正 ComposedJudger 分支契约并补充 Judger 文档说明
描述
本 PR 明确区分了 BaseJudger 和 Judger 两层接口,并约束 ComposedJudger 只支持继承 Judger 的 branch。
背景是 ComposedJudger 内部依赖 preprocess -> judge_payload -> postprocess 的 payload 契约来组合多个分支。该设计可以避免对完整 RolloutState 做 deepcopy,减少大字段序列化开销,也降低复制复杂对象带来的潜在风险。对于只继承
BaseJudger、直接自定义 judge() 流程的实现,ComposedJudger 暂不支持作为 branch 使用,否则在不 deepcopy 的情况下可能出现多个 branch 并发修改同一个 RolloutState 的风险。
主要改动