[paddle-adapt] comm/test_dcp_alltoall: 29 PASS with assert_close Paddle compat patch (#25) #26

Merged
BingooYang merged 1 commit into PFCCLab:0.6 from BingooYang:adapt/comm
May 19, 2026

@BingooYang

Description

Adapts the comm tests for Paddle compat mode. Only comm/test_dcp_alltoall (the single-GPU DCP LL128 FIFO All-to-All test) is adaptable. All multiprocessing/MPI/MNNVL/NVSHMEM tests are skipped as too complex to adapt.

Changes

  • : Add Paddle compat monkey-patches (para44-para48, para52)
  • : Add entry

Adaptation Points

para44/para45 - torch.testing.assert_close fails for bfloat16/float16 because the isclose kernel is not registered in Paddle compat. Fix: catch the error and fall back to a numpy-based allclose.

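para52 (NEW) - Paddle compat wraps ALL assert_close internal errors with "resulted in the unexpected exception above" (not just the bfloat16/float16 ones), so float32 comparisons are affected too. Fix: check this outer message first, before the dtype-specific conditions.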
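The ordering described in para44/para45/para52 can be sketched roughly as below. This is a minimal illustration, not the patch from this PR: `patch_assert_close`, `allclose_fallback`, and `KERNEL_MARKER` are hypothetical names, the exception type (`RuntimeError`) and the kernel message text are assumptions, and a pure-Python comparison stands in for the numpy-based allclose so the sketch runs without extra dependencies. Only the outer message string is taken from the PR.

```python
# Outer message quoted in this PR (para52); checked first.
OUTER_MARKER = "resulted in the unexpected exception above"
# Assumption: some substring identifying the missing isclose kernel (para44/45).
KERNEL_MARKER = "not registered"


def allclose_fallback(actual, expected, rtol=1.3e-6, atol=1e-5):
    """Stand-in for a numpy-based allclose: |a - e| <= atol + rtol * |e|."""
    a = [float(x) for x in actual]
    e = [float(x) for x in expected]
    if len(a) != len(e):
        raise AssertionError(f"length mismatch: {len(a)} vs {len(e)}")
    for i, (x, y) in enumerate(zip(a, e)):
        if not abs(x - y) <= atol + rtol * abs(y):
            raise AssertionError(f"mismatch at index {i}: {x} vs {y}")


def patch_assert_close(original):
    """Wrap an assert_close-like callable with the fallback logic."""

    def patched(actual, expected, rtol=None, atol=None, **kwargs):
        try:
            return original(actual, expected, rtol=rtol, atol=atol, **kwargs)
        except RuntimeError as exc:  # assumption: internal errors surface as RuntimeError
            msg = str(exc)
            # Check the outer wrapper message first, then the dtype-specific one.
            if OUTER_MARKER in msg or KERNEL_MARKER in msg:
                return allclose_fallback(
                    actual,
                    expected,
                    rtol=1.3e-6 if rtol is None else rtol,
                    atol=1e-5 if atol is None else atol,
                )
            raise  # unrelated error: do not mask it

    return patched
```

A genuine `AssertionError` from the original comparison is not caught here, so real mismatches still fail the test; only the internal-error path is redirected to the fallback.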

para46 - torch.equal returns a Tensor, not a bool.

para47 - tensor.multiply(scalar) does not accept a Python scalar.

para48 - tensor.clamp_min/clamp_max aliases are missing.
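The three smaller shims (para46 to para48) might look something like the following. This is a sketch under stated assumptions: `FakeTensor` and `raw_equal` are hypothetical stand-ins that imitate the compat-mode behavior described above, so the example runs without Paddle or PyTorch installed. The actual monkey-patches in this PR are not shown on this page and may differ.

```python
class FakeTensor:
    """Hypothetical stand-in for a compat-mode tensor (not a real Paddle class)."""

    def __init__(self, values):
        self.values = [float(v) for v in values]

    def multiply(self, other):
        # Mimics para47: only tensor operands are accepted.
        if not isinstance(other, FakeTensor):
            raise TypeError("multiply expects a tensor operand")
        return FakeTensor(a * b for a, b in zip(self.values, other.values))

    def clamp(self, min=None, max=None):
        out = self.values
        if min is not None:
            out = [v if v > min else float(min) for v in out]
        if max is not None:
            out = [v if v < max else float(max) for v in out]
        return FakeTensor(out)


def raw_equal(a, b):
    """Mimics para46: returns a one-element tensor instead of a bool."""
    return FakeTensor([1.0 if a.values == b.values else 0.0])


# para46: coerce the tensor result to a Python bool.
def equal_as_bool(a, b):
    return bool(raw_equal(a, b).values[0])


# para47: promote Python scalars to a tensor before multiplying.
_orig_multiply = FakeTensor.multiply


def multiply_with_scalar(self, other):
    if isinstance(other, (int, float)):
        other = FakeTensor([other] * len(self.values))
    return _orig_multiply(self, other)


FakeTensor.multiply = multiply_with_scalar

# para48: add the missing aliases only if they are absent.
if not hasattr(FakeTensor, "clamp_min"):
    FakeTensor.clamp_min = lambda self, min: self.clamp(min=min)
if not hasattr(FakeTensor, "clamp_max"):
    FakeTensor.clamp_max = lambda self, max: self.clamp(max=max)
```

Guarding the aliases with `hasattr` keeps the patch harmless if a later compat release adds them natively.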

Skipped Tests (too complex)

  • : missing (para23) + multiprocessing
  • , , : multiprocessing
  • : MPI-based
  • : MNNVL hardware. : NVSHMEM. : NCCL
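One possible way to express such a skip list is pytest's `collect_ignore_glob` variable in a `conftest.py`. The sketch below is an assumption about mechanism, not a description of how this PR excludes the files; the patterns are the ones listed in the commit message, and `is_skipped` is a hypothetical helper for checking a file name against them.

```python
# conftest.py (hypothetical placement next to the comm tests)
import fnmatch

# Patterns taken from the commit message of this PR.
collect_ignore_glob = [
    "test_all_gather_matmul.py",
    "test_allreduce*.py",
    "test_mixed_comm.py",
    "test_trtllm_allreduce*.py",
    "test_mnnvl_*.py",
    "test_nvshmem*.py",
    "test_vllm_custom_allreduce.py",
]


def is_skipped(filename: str) -> bool:
    """Return True if a test file name matches any ignore pattern."""
    return any(fnmatch.fnmatch(filename, pat) for pat in collect_ignore_glob)
```

Note that `collect_ignore_glob` excludes files from collection entirely, so they would not appear as "skipped" in the pytest summary.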

Test Results

  • : 29 passed, 0 failed
  • Regression: all previous PASS cases still pass

Checklist

  • pre-commit passed
  • Target case PASS
  • No new regression

Refs: MISMATCH_EXPERIMENT para52

…le compat patch

- §44/§45: torch.testing.assert_close bfloat16/float16 isclose kernel
  not registered in Paddle compat
- §52: Paddle compat wraps ALL assert_close internal errors with
  "resulted in the unexpected exception above" (not just bfloat16/float16);
  fix: check this outer message first before dtype-specific conditions
- §46: torch.equal returns Tensor not bool in Paddle compat
- §47: tensor.multiply(scalar) does not accept Python scalar
- §48: tensor.clamp_min/clamp_max aliases missing

Skipped tests (multiprocessing/MPI/MNNVL/NVSHMEM — too complex):
test_all_gather_matmul.py, test_allreduce*.py, test_mixed_comm.py,
test_trtllm_allreduce*.py, test_mnnvl_*.py, test_nvshmem*.py,
test_vllm_custom_allreduce.py

Regression: all previous PASS cases still pass
Refs: MISMATCH_EXPERIMENT §52
BingooYang merged commit e18bf66 into PFCCLab:0.6 on May 19, 2026
2 checks passed