Skip to content

Support OPD when teacher tokenization differs#2032

Open
hhnqqq wants to merge 1 commit into
THUDM:mainfrom
hhnqqq:add-cross-vocab-opd
Open

Support OPD when teacher tokenization differs#2032
hhnqqq wants to merge 1 commit into
THUDM:mainfrom
hhnqqq:add-cross-vocab-opd

Conversation

@hhnqqq

@hhnqqq hhnqqq commented Jun 8, 2026

Copy link
Copy Markdown

SGLang OPD previously assumed that teacher and student shared token IDs, so it could send student token IDs directly to the teacher server and trim returned logprobs by student response length. Cross-vocabulary distillation needs the teacher to render the original chat messages with its own chat template, then align teacher response logprobs back to student response positions only where token boundaries match exactly.

This adds explicit cross-vocabulary hooks, preserves raw prompt messages in dataset metadata when requested, and documents the SGLang configuration. Non-aligned response positions keep the student rollout logprob so the OPD delta is zero there.

Constraint: Official Sample.prompt may already be a student-rendered string after --apply-chat-template, so raw messages are preserved through an opt-in metadata key instead of relying on local origin_prompt fields.
Rejected: Send student token IDs to cross-vocabulary teachers | token IDs are tokenizer-local and cannot represent the same prompt across vocabularies.
Rejected: Apply teacher logprobs to every decoded-text span | many-to-one token spans do not map to a single student logprob position cleanly.
Confidence: medium
Scope-risk: moderate
Directive: Do not replace the metadata messages path with origin_prompt-style private fields; official datasets need an explicit portable way to carry raw messages.
Tested: python -m py_compile on modified Python files; manual execution of tests/test_on_policy_distillation.py test functions; git diff --check.
Not-tested: Full pytest suite unavailable because pytest is not installed; parser smoke blocked by missing sglang_router dependency; live SGLang teacher request not exercised.

@hhnqqq hhnqqq force-pushed the add-cross-vocab-opd branch from 2b01c5b to 1fb0c44 Compare June 8, 2026 11:30
SGLang OPD previously assumed that teacher and student shared token IDs, so it could send student token IDs directly to the teacher server and trim returned logprobs by student response length. Cross-vocabulary distillation needs the teacher to render the original chat messages with its own chat template, then align teacher response logprobs back to student response positions only where token boundaries match exactly.

This adds explicit cross-vocabulary hooks, preserves raw prompt messages in dataset metadata when requested, documents the SGLang configuration, and includes a conservative 8xH200 example using a Qwen3-8B student with a Qwen3.5-35B-A3B teacher. Non-aligned response positions keep the student rollout logprob so the OPD delta is zero there. The test file also includes an opt-in live SGLang teacher API check guarded by environment variables.

Constraint: Official Sample.prompt may already be a student-rendered string after --apply-chat-template, so raw messages are preserved through an opt-in metadata key instead of relying on local origin_prompt fields.
Constraint: The example reserves GPU 6,7 for the teacher and exposes only GPU 0-5 to slime/Ray so colocated rollout and Megatron engines do not contend with the teacher server.
Rejected: Send student token IDs to cross-vocabulary teachers | token IDs are tokenizer-local and cannot represent the same prompt across vocabularies.
Rejected: Apply teacher logprobs to every decoded-text span | many-to-one token spans do not map to a single student logprob position cleanly.
Confidence: medium
Scope-risk: moderate
Directive: Do not replace the metadata messages path with origin_prompt-style private fields; official datasets need an explicit portable way to carry raw messages.
Tested: bash -n examples/on_policy_distillation/run-qwen3-8B-qwen3.5-35B-A3B-cross-vocab-opd.sh; python -m py_compile on modified Python files; manual production-function checks for render/dataset/alignment/postprocess; git diff --check.
Not-tested: Full pytest suite unavailable because pytest is not installed locally; parser smoke blocked by missing sglang_router dependency; live SGLang teacher API test and example script not run locally because no teacher endpoint was provided.
@hhnqqq hhnqqq force-pushed the add-cross-vocab-opd branch from 1fb0c44 to 76fde4f Compare June 8, 2026 11:45
@hhnqqq

hhnqqq commented Jun 8, 2026

Copy link
Copy Markdown
Author
image

An example of distillating a Qwen3-30B-A3B-based model to Qwen3.5-35B-A3B.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant