Skip to content

Refine judge#1906

Merged
hhaAndroid merged 8 commits into
InternLM:mainfrom
hhaAndroid:refine_judge
Jun 12, 2026
Merged

Refine judge#1906
hhaAndroid merged 8 commits into
InternLM:mainfrom
hhaAndroid:refine_judge

Conversation

@hhaAndroid

@hhaAndroid hhaAndroid commented Jun 11, 2026

Copy link
Copy Markdown
Collaborator

标题

修正 ComposedJudger 分支契约并补充 Judger 文档说明

描述

本 PR 明确区分了 BaseJudger 和 Judger 两层接口,并约束 ComposedJudger 只支持继承 Judger 的 branch。

背景是 ComposedJudger 内部依赖 preprocess -> judge_payload -> postprocess 的 payload 契约来组合多个分支。该设计可以避免对完整 RolloutState 做 deepcopy,减少大字段序列化开销,也降低复制复杂对象带来的潜在风险。对于只继承
BaseJudger、直接自定义 judge() 流程的实现,ComposedJudger 暂不支持作为 branch 使用,否则在不 deepcopy 的情况下可能出现多个 branch 并发修改同一个 RolloutState 的风险。

主要改动

  • 新增 BaseJudger,作为最小 Judger 接口,仅定义 judge() / batch_judge() / get_judger_name()。
  • 让 Judger 继承 BaseJudger,并明确它代表可组合的 payload 契约。
  • 新增 judger skill
  • ComposedJudger 构造时校验 branch:
    • 继承 Judger 的 branch 允许使用。
    • 只继承 BaseJudger 的 branch 直接报错,并提示当前不支持。
  • 导出 BaseJudger。
  • 补充单测,覆盖 ComposedJudger 拒绝 BaseJudger-only branch 的场景。
  • 更新中文 Judger 文档,说明:
    • 推荐自定义 Judger 时继承 Judger 并重载 preprocess / judge_payload / postprocess。
    • 这样做可以避免 deepcopy(RolloutState) 的潜在风险和额外序列化开销。
    • BaseJudger 适合完全自定义 judge 流程,但不能作为 ComposedJudger branch。

@hhaAndroid

Copy link
Copy Markdown
Collaborator Author

@claude review

Comment thread xtuner/v1/rl/judger/composed.py
Comment thread xtuner/v1/rl/judger/composed.py
Comment thread xtuner/v1/rl/judger/native.py
@claude

claude Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Claude: ## Summary

Introduces BaseJudger as a minimal judger interface (just judge()/batch_judge()/get_judger_name()), separating it from Judger which adds the composable payload contract (preprocess → judge_payload → postprocess). ComposedJudger now validates at construction time that branches inherit the full Judger contract, preventing subtle concurrent-mutation bugs. Also adds two example RL training configs and updates documentation.

ProduceBatchResult impact: not affected
RoutedExperts impact: not affected

Issues

Warning

  • [examples/v1/config/reasoning_rl_qwen3p5vl_mtp_ep.py:12,27-32] AgentLoopManagerConfig, TaskSpecConfig, and SamplerConfig are imported twice — once on line 12 and again on lines 27-32. The second import block only needs AsyncProduceStrategyConfig.

Nit

  • [xtuner/v1/rl/judger/composed.py:38] _validate_branches parameter type is dict[str, Judger] but the method exists to catch instances that are only BaseJudger. The type annotation should be dict[str, BaseJudger] to accurately reflect the runtime contract. (See inline comment for details.)
  • [xtuner/v1/rl/judger/composed.py:31] cast(dict[str, Judger], branches) is a no-op when the parameter is already typed dict[str, Judger]. (See inline comment.)
  • [examples/v1/config/reasoning_rl_qwen3p5vl_mtp_ep.py:29, examples/v1/config/agentic_rl_qwen3p5vl_mtp_ep_code.py:22] Shell-style comments (# export LMDEPLOY_FP32_MAMBA_SSM_DTYPE=1) in Python config files are unusual — consider moving these to a README or a shell wrapper script.

Verdict

APPROVE — the core judger hierarchy change is clean and well-motivated. The BaseJudger / Judger split is a sensible guardrail against concurrent RolloutState mutation in composed judging. Test coverage for the rejection path is included. The type annotation nits are minor and non-blocking.

@hhaAndroid hhaAndroid merged commit ece81ef into InternLM:main Jun 12, 2026
7 of 8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants