Use put_along_axis for Paddle routing metadata by SigureMo · Pull Request #16 · PFCCLab/DeepEP

SigureMo · 2026-05-19T06:46:49Z

Background

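Previously, indices_to_map() in the Paddle version of HybridEP built the dense routing map/probs with scatter_nd_add. This worked around a compatibility problem with torch.scatter under Paddle compat, but the implementation is more roundabout than the upstream scatter semantics, and it adds an extra index-expansion step and extra temporary tensors.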
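To make the "extra index expansion" concrete, here is a pure-Python sketch of a scatter_nd_add-style build. It assumes `topk_idx` is a `[num_tokens][topk]` list of expert ids and `topk_probs` has the same shape. The helper names `expand_indices` and `scatter_add_dense` are hypothetical, and the loops only mimic the semantics; the real code operates on Paddle GPU tensors.

```python
def expand_indices(topk_idx):
    """Turn [num_tokens][topk] expert ids into explicit (token, expert) pairs.

    An N-d scatter-add needs a full coordinate for every update, so this
    expansion (and the flattened update list below) are extra temporaries.
    """
    return [(t, e) for t, row in enumerate(topk_idx) for e in row]


def scatter_add_dense(topk_idx, topk_probs, num_experts):
    """Hypothetical scatter_nd_add-style build of a dense routing map and probs."""
    num_tokens = len(topk_idx)
    routing_map = [[0] * num_experts for _ in range(num_tokens)]
    probs = [[0.0] * num_experts for _ in range(num_tokens)]
    coords = expand_indices(topk_idx)                    # temporary index list
    flat_probs = [p for row in topk_probs for p in row]  # temporary update list
    for (t, e), p in zip(coords, flat_probs):
        routing_map[t][e] += 1  # accumulate into a zero-initialized buffer
        probs[t][e] += p
    return routing_map, probs


routing_map, probs = scatter_add_dense([[0, 2], [1, 3]], [[0.7, 0.3], [0.6, 0.4]], 4)
print(routing_map)  # [[1, 0, 1, 0], [0, 1, 0, 1]]
print(probs)        # [[0.7, 0.0, 0.3, 0.0], [0.0, 0.6, 0.0, 0.4]]
```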

Changes

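Replace the scatter_nd_add path in indices_to_map() with paddle.put_along_axis.
Reuse topk_idx.to(torch.int64) so the index is converted only once.
Keep the uint8 -> bool construction for the routing map: Paddle currently has no CUDA bool put_along_axis kernel, so we cannot switch back to the upstream scatter with dtype=torch.bool.
Consolidate tensor creation on device="cuda", which is closer to the upstream code.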
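A pure-Python sketch of the put_along_axis semantics the new path relies on: each row of `topk_idx` indexes directly into the same row of the dense buffer, so no (token, expert) coordinate list is built. The routing map is staged in a 0/1 integer buffer and then converted to bool, mirroring the uint8 -> bool workaround. Because the top-k expert ids of one token are distinct, assignment gives the same dense result as accumulation would. The names `put_along_axis_rows` and `indices_to_map_sketch` are hypothetical; in Paddle the corresponding API is `paddle.put_along_axis(arr, indices, values, axis)`.

```python
def put_along_axis_rows(arr, indices, values):
    """Analogue of put_along_axis(arr, indices, values, axis=1) with assignment.

    For every row i and slot k: arr[i][indices[i][k]] = values[i][k].
    """
    for row, idx_row, val_row in zip(arr, indices, values):
        for j, v in zip(idx_row, val_row):
            row[j] = v
    return arr


def indices_to_map_sketch(topk_idx, topk_probs, num_experts):
    """Hypothetical dense routing map/probs build using row-wise puts."""
    num_tokens = len(topk_idx)
    # Integer 0/1 staging buffer, standing in for the uint8 tensor used
    # because there is no CUDA bool put_along_axis kernel in Paddle.
    routing_u8 = [[0] * num_experts for _ in range(num_tokens)]
    ones = [[1] * len(row) for row in topk_idx]
    put_along_axis_rows(routing_u8, topk_idx, ones)
    routing_map = [[bool(x) for x in row] for row in routing_u8]  # uint8 -> bool
    probs = [[0.0] * num_experts for _ in range(num_tokens)]
    put_along_axis_rows(probs, topk_idx, topk_probs)
    return routing_map, probs


routing_map, probs = indices_to_map_sketch([[0, 2], [1, 3]], [[0.7, 0.3], [0.6, 0.4]], 4)
print(routing_map)  # [[True, False, True, False], [False, True, False, True]]
print(probs)        # [[0.7, 0.0, 0.3, 0.0], [0.0, 0.6, 0.0, 0.4]]
```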

Validation

Bitwise alignment:

Reran A1B with topk=2 on two machines (2x8); the results are bitwise aligned.
final_layernorm_output MD5: ordered_unique_neq = 0/100 on both rank 0 and rank 8.
tr_loss_before_reduce: paired_neq = 0/50 on both rank 0 and rank 8.
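The PR does not define `ordered_unique_neq` or `paired_neq`, so the following is only a hypothetical sketch of one way to run a bitwise comparison of per-step logs: hash the exact float32 bit patterns with MD5 and count step-aligned mismatches. The helper names and the sample loss values are illustrative, not the author's tooling or data.

```python
import hashlib
import struct


def md5_of_float32(values):
    """MD5 over the float32 bit patterns of a sequence (stand-in for hashing a tensor's bytes)."""
    return hashlib.md5(struct.pack(f"<{len(values)}f", *values)).hexdigest()


def paired_neq(baseline, candidate):
    """Count step-aligned entries that differ; 0 means the two runs match exactly."""
    if len(baseline) != len(candidate):
        raise ValueError("both runs must log the same number of entries")
    return sum(a != b for a, b in zip(baseline, candidate))


# Hypothetical per-step logs from the two implementations.
deepep_loss = [2.5, 2.25, 2.125]
hybridep_loss = [2.5, 2.25, 2.125]
print(f"paired_neq = {paired_neq(deepep_loss, hybridep_loss)}/{len(deepep_loss)}")
```

Note that `!=` on Python floats treats `0.0` and `-0.0` as equal and NaN as unequal to itself; comparing hashes of the raw bytes avoids both caveats, which is why a hash is the stricter bitwise check.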

Performance validation:

A/B comparison of put_along_axis against the old scatter_nd_add implementation under the same configuration, measured over steps 51-100.
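global_steps_per_second: 0.359827 vs 0.349351, about +3.0% over the old implementation (2.91% when the difference is taken relative to the new value).
tokens_per_sec_per_card: 5895.404 vs 5723.763, about +3.0% over the old implementation (2.91% when the difference is taken relative to the new value).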
dispatch/combine times are broadly similar, and end-to-end throughput improves slightly.
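The throughput gains can be rechecked from the reported numbers. This short sketch (the helper name `relative_gain` is hypothetical) computes the improvement both against the old baseline and as a fraction of the new value, which shows where the ~3.0% and 2.91% figures come from.

```python
def relative_gain(new, old):
    """Return (gain relative to the old baseline, gain relative to the new value), in percent."""
    diff = new - old
    return diff / old * 100.0, diff / new * 100.0


steps_vs_old, steps_vs_new = relative_gain(0.359827, 0.349351)
tokens_vs_old, tokens_vs_new = relative_gain(5895.404, 5723.763)
print(f"global_steps_per_second: +{steps_vs_old:.2f}% vs old ({steps_vs_new:.2f}% of new)")
# global_steps_per_second: +3.00% vs old (2.91% of new)
print(f"tokens_per_sec_per_card: +{tokens_vs_old:.2f}% vs old ({tokens_vs_new:.2f}% of new)")
# tokens_per_sec_per_card: +3.00% vs old (2.91% of new)
```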

This PR is authored by @codex (gpt-5.5 xhigh).

Replace the temporary scatter_nd_add construction in Paddle HybridEP indices_to_map with put_along_axis, matching the upstream scatter semantics while keeping uint8 routing storage because Paddle does not provide a CUDA bool put_along_axis kernel.

Validation:
- 2x8 A1B topk=2 DeepEP vs HybridEP 50-step bitwise check: final_layernorm_output MD5 matched 100/100 for ranks 0 and 8; tr_loss_before_reduce matched 50/50 for ranks 0 and 8.

Co-authored-by: Codex <noreply@openai.com>

Copilot

Pull request overview

This PR updates the Paddle HybridEP indices_to_map() helper to build dense routing metadata using paddle.put_along_axis instead of the previous scatter_nd_add-based construction, aiming to reduce index expansion and intermediate tensor overhead under Paddle compat.

Changes:

Replaced the scatter_nd_add-based dense routing map/prob construction with paddle.put_along_axis.
Reused a single topk_idx int64 conversion to avoid repeated casts.
Kept routing map materialization via uint8 -> bool to work around missing CUDA bool put_along_axis support.


Copilot AI review requested due to automatic review settings May 19, 2026 06:46

Copilot started reviewing on behalf of SigureMo May 19, 2026 06:46

SigureMo marked this pull request as draft May 19, 2026 06:47

Copilot AI reviewed May 19, 2026


SigureMo marked this pull request as ready for review May 19, 2026 07:12

SigureMo merged commit 834a754 into hybrid-ep-paddle May 19, 2026
1 check passed

SigureMo mentioned this pull request May 19, 2026

[OPs][HybridEP] Update HybridEP submodule PaddlePaddle/PaddleFleet#986

Merged


SigureMo merged 1 commit into hybrid-ep-paddle from sigure/hybridep-put-along-axis
