[draft] Add support for multi teacher OPD and memory-efficient topk level OPD by hhnqqq · Pull Request #2033 · THUDM/slime

hhnqqq · 2026-06-08T12:44:27Z

Top-k on-policy distillation (OPD) needs a separate actor flow because, before normal Megatron training, it must prepare the old policy's top-k token indices, the teacher log-probs on those indices, and a top-k/tail loss. The generic train.py path therefore stays on the existing actor unless a dedicated entry point registers the top-k actor subclass.
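A minimal per-token sketch of a top-k/tail loss follows. It assumes the loss is a forward KL from teacher to student over a (k + 1)-way distribution: the k shared top-k tokens plus one "tail" bucket holding the remaining probability mass. The function names are hypothetical, and the PR's actual loss in slime/backends/megatron_utils/loss.py may differ (for example in KL direction, weighting, or masking).

```python
import math
from typing import Sequence


def tail_logprob(topk_logprobs: Sequence[float]) -> float:
    """Log of the probability mass outside the top-k tokens: log(1 - sum(exp(lp)))."""
    covered = sum(math.exp(lp) for lp in topk_logprobs)
    # Clamp so floating-point rounding never produces log(0) or log of a negative number.
    return math.log(max(1.0 - covered, 1e-12))


def topk_tail_kl(teacher_topk: Sequence[float], student_topk: Sequence[float]) -> float:
    """KL(teacher || student) over k shared token ids plus a single tail bucket.

    Both inputs are log-probs evaluated on the SAME top-k token ids
    (in the PR these ids come from the old policy).
    """
    if len(teacher_topk) != len(student_topk):
        raise ValueError("teacher and student must be scored on the same top-k indices")
    kl = 0.0
    for t_lp, s_lp in zip(teacher_topk, student_topk):
        kl += math.exp(t_lp) * (t_lp - s_lp)
    # The tail bucket keeps the (k + 1)-way distributions normalized.
    t_tail = tail_logprob(teacher_topk)
    s_tail = tail_logprob(student_topk)
    kl += math.exp(t_tail) * (t_tail - s_tail)
    return kl
```

Per token, this needs only the k shared indices and k log-probs per model instead of a full-vocabulary distribution, which is what makes the top-k formulation cheap in memory.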

Constraint: Teachers are assumed homogeneous with the student (same tokenizer and vocabulary), so the top-k token ids are shared.
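Because every teacher shares the student's vocabulary, all teachers can be scored on the same old-policy top-k ids, and their log-probs line up position by position. The PR does not say how multiple teachers are combined; the sketch below assumes a weighted mixture in probability space, and `mix_teacher_logprobs` and its weights are hypothetical.

```python
import math
from typing import List, Sequence


def mix_teacher_logprobs(
    per_teacher: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Combine teachers' log-probs on the same top-k token ids into one mixture.

    per_teacher[t][j] is teacher t's log-prob for the j-th shared top-k token id.
    Returns log(sum_t w_t * p_t(j)) for each j, computed stably with log-sum-exp.
    """
    if len(per_teacher) != len(weights):
        raise ValueError("need one weight per teacher")
    if abs(sum(weights) - 1.0) > 1e-6:
        raise ValueError("weights must sum to 1")
    k = len(per_teacher[0])
    if any(len(row) != k for row in per_teacher):
        raise ValueError("all teachers must be scored on the same k token ids")
    mixed = []
    for j in range(k):
        # Skip zero-weight teachers so log(0) is never taken.
        terms = [math.log(w) + row[j] for w, row in zip(weights, per_teacher) if w > 0]
        m = max(terms)
        mixed.append(m + math.log(sum(math.exp(x - m) for x in terms)))
    return mixed
```

Since the mixture is linear in probabilities, its implied tail mass is the same weighted mixture of the teachers' tail masses, so it plugs into a top-k/tail loss unchanged.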

Constraint: The top-k CP implementation follows the existing zigzag CP layout and rejects --allgather-cp.
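Under a zigzag context-parallel (CP) layout, as used for load balancing causal attention in Megatron-style CP, each sequence is cut into 2 * cp_size equal chunks and rank r keeps chunk r plus its mirror chunk 2 * cp_size - 1 - r, so every rank holds both early and late tokens. Per-token top-k ids and teacher log-probs have to be sliced the same way to line up with each rank's logits. This sketch assumes unpadded sequences divisible by 2 * cp_size; the helper names and the `use_topk_opd` switch are hypothetical, and only `--allgather-cp` comes from the PR.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def zigzag_token_indices(seq_len: int, cp_size: int, cp_rank: int) -> List[int]:
    """Token positions owned by one CP rank under the zigzag layout."""
    if not 0 <= cp_rank < cp_size:
        raise ValueError("cp_rank must be in [0, cp_size)")
    num_chunks = 2 * cp_size
    if seq_len % num_chunks != 0:
        raise ValueError("sketch assumes seq_len is divisible by 2 * cp_size")
    chunk = seq_len // num_chunks
    mirror = num_chunks - 1 - cp_rank  # partner chunk from the other end of the sequence
    first = list(range(cp_rank * chunk, (cp_rank + 1) * chunk))
    second = list(range(mirror * chunk, (mirror + 1) * chunk))
    return first + second


def shard_per_token(rows: Sequence[T], cp_size: int, cp_rank: int) -> List[T]:
    """Slice per-token data (e.g. top-k ids or teacher log-probs) to match this rank's logits."""
    return [rows[i] for i in zigzag_token_indices(len(rows), cp_size, cp_rank)]


def check_topk_opd_cp_args(use_topk_opd: bool, allgather_cp: bool) -> None:
    """Reject --allgather-cp, since the top-k path only implements the zigzag layout."""
    if use_topk_opd and allgather_cp:
        raise ValueError("top-k OPD supports only the zigzag CP layout; drop --allgather-cp")
```

With cp_size = 2 and 8 tokens, rank 0 owns positions 0, 1, 6, 7 and rank 1 owns 2, 3, 4, 5.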

Rejected: Trigger top-k actor selection implicitly from generic train.py | user requested an explicit topkopd_train.py entry for the actor subclass.

Confidence: medium

Scope-risk: moderate

Directive: Keep top-k actor selection explicit through topkopd_train.py unless generic train.py gains a documented actor plugin API.
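The explicit-entry design can be sketched as a small registration hook: the generic path builds whatever actor class is registered (the existing one by default), and only a dedicated entry such as topkopd_train.py swaps in the top-k subclass before building. All class and function names below are hypothetical stand-ins; the PR only states that the dedicated entry "registers the top-k actor subclass".

```python
from typing import Type


class MegatronActor:
    """Hypothetical stand-in for the existing actor used by generic train.py."""

    def train_step(self) -> str:
        return "standard"


class TopKOPDActor(MegatronActor):
    """Hypothetical stand-in for the top-k OPD actor subclass."""

    def train_step(self) -> str:
        # The real subclass would prepare old-policy top-k indices and teacher
        # log-probs on them before running the normal Megatron training step.
        return "topk-opd"


_ACTOR_CLS: Type[MegatronActor] = MegatronActor  # default keeps train.py unchanged


def register_actor_class(cls: Type[MegatronActor]) -> None:
    """Called only by a dedicated entry point, never implicitly from train.py."""
    global _ACTOR_CLS
    _ACTOR_CLS = cls


def build_actor() -> MegatronActor:
    return _ACTOR_CLS()
```

A dedicated entry would call `register_actor_class(TopKOPDActor)` and then reuse the shared training loop, so generic runs never pick up top-k behavior by accident.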

Tested: python3 -m py_compile on modified Python files

Tested: bash -n examples/on_policy_distillation/run-qwen3-8B-topk-opd-megatron.sh

Tested: git diff --check

Not-tested: Distributed Megatron runtime with TP/CP GPUs and real teacher checkpoints

Not-tested: --allgather-cp top-k OPD, intentionally rejected by argument validation

Top-k OPD computes its training signal from Megatron teacher top-k and tail distributions, so the example should not require a task-specific reward model just to exercise the distillation path. Add minimal zero-reward and placeholder-advantage helpers and wire the top-k example to them.

Constraint: Keep the helpers independent from self-OPD-specific reward, EMA, PRM, and mixed RL logic.

Confidence: high

Scope-risk: narrow

Tested: PYTHONPYCACHEPREFIX=/tmp/slime-pycache python3 -m py_compile examples/on_policy_distillation/topk_opd_helpers.py examples/on_policy_distillation/topkopd_train.py slime/backends/megatron_utils/topk_opd_actor.py slime/backends/megatron_utils/loss.py slime/utils/arguments.py

Tested: bash -n examples/on_policy_distillation/run-qwen3-8B-topk-opd-megatron.sh

Tested: git diff --check

Not-tested: Full distributed top-k OPD runtime with real teacher checkpoints
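A minimal sketch of what zero-reward and placeholder-advantage helpers could look like follows. The real helpers live in examples/on_policy_distillation/topk_opd_helpers.py, whose interfaces are not shown here, so the `Sample` class, the function names, and the per-token advantage shape are all assumptions. In keeping with the constraint above, nothing here touches self-OPD reward, EMA, PRM, or mixed RL logic.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    """Hypothetical minimal stand-in for a rollout sample."""

    response_length: int
    reward: Optional[float] = None


def zero_reward(samples: List[Sample]) -> List[float]:
    """Assign a constant 0.0 reward; the learning signal comes from the teachers, not an RM."""
    for s in samples:
        s.reward = 0.0
    return [0.0 for _ in samples]


def placeholder_advantages(samples: List[Sample]) -> List[List[float]]:
    """Per-token zero advantages shaped like each response, so RL plumbing still runs."""
    return [[0.0] * s.response_length for s in samples]
```

Because the top-k/tail loss does not read rewards or advantages, constant zeros are enough to satisfy the rollout pipeline without biasing training.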

hhnqqq force-pushed the topk-level-opd branch from 843ce0b to 0bf89cf on June 8, 2026 13:06

hhnqqq force-pushed the topk-level-opd branch from 396e908 to afcbb2b on June 8, 2026 13:21

hhnqqq changed the title ~~[draft] Add support for multi teacher opd and topk level opd~~ [draft] Add support for multi teacher OPD and memory-efficient topk level OPD Jun 8, 2026

hhnqqq wants to merge 2 commits into THUDM:main from hhnqqq:topk-level-opd
