perf(megatron-loss): scale logits per-chunk to avoid OOM by Yangruipis · Pull Request #2010 · THUDM/slime

Yangruipis · 2026-06-02T17:15:23Z

# ⚡ Performance

## Move rollout_temperature division into per-chunk yield in get_responses

- Remove full-tensor `logits.div(rollout_temperature)` that allocated a
  duplicate `[T, V]` fp32 buffer (~16 GiB on Qwen3 with long packed
  sequences), doubling loss-step peak memory and triggering OOM under
  allocator fragmentation
- Apply the scalar division to each `logits_chunk` right before yielding,
  so allocations are bounded by per-sample response size and happen
  incrementally instead of as a single giant contiguous block
- Numerically equivalent across all four chunking paths (cp_size==1 RL,
  SFT, allgather_cp, zigzag CP) since scalar division commutes with
  slicing and concatenation
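The equivalence claim in the last bullet can be checked with a small sketch. This is not the PR's code: it uses NumPy arrays in place of the Megatron logits tensor, and the function name `iter_response_chunks`, the `(start, end)` span format, and the toy sizes are all assumptions made for illustration.

```python
import numpy as np


def iter_response_chunks(logits, response_spans, rollout_temperature):
    """Yield one temperature-scaled chunk per response span.

    Hypothetical stand-in for the chunking loop in get_responses; the
    (start, end) span format is an assumption for this sketch.
    """
    for start, end in response_spans:
        logits_chunk = logits[start:end]  # basic slicing returns a view, no copy
        # The division allocates only an [end - start, V] buffer.
        yield logits_chunk / rollout_temperature


rng = np.random.default_rng(0)
T, V = 12, 5  # toy sizes; real packed sequences are far larger
logits = rng.standard_normal((T, V)).astype(np.float32)
spans = [(0, 4), (4, 9), (9, 12)]
temperature = 0.7

# Previous approach: scale the whole [T, V] array first, then slice.
full_scaled = logits / temperature
reference = [full_scaled[s:e] for s, e in spans]

# Per-chunk approach: slice first, scale each chunk as it is yielded.
chunks = list(iter_response_chunks(logits, spans, temperature))

# Elementwise division gives the same result in either order.
matches = all(np.array_equal(a, b) for a, b in zip(chunks, reference))

# Extra memory: one full-size copy versus the largest single chunk.
full_copy_bytes = full_scaled.nbytes
largest_chunk_bytes = max(c.nbytes for c in chunks)
```

The sketch collects every chunk into a list only so the results can be compared. A consumer that handles each chunk as it is yielded never holds more than one scaled chunk at a time, which is where the memory saving would come from.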

Yangruipis closed this Jun 2, 2026

fix: tests

ac785f2

Yangruipis reopened this Jun 2, 2026

Yangruipis wants to merge 2 commits into THUDM:main from redai-infra:fix/wuhuan/div_oom
