[paddle-adapt] moe_all: adapt tests/moe/ for Paddle compat (§43-§47) by BingooYang · Pull Request #19 · PFCCLab/flashinfer

BingooYang · 2026-05-18T09:23:12Z

📌 Description

Adapt tests/moe/ for Paddle compat mode. All changes are minimal skip/xfail guards — no upstream logic touched.

Changes summary

Commit	Files	What
`62abbd7`	tests/moe/*.py, conftest.py	Baseline Paddle compat patches (ss39-41 + §43)
`6a894ba`	tests/moe/test_trtllm_cutlass_fused_moe.py	skip float8_e4m3fn tests (8 guards)
`049f0f6`	tests/moe/utils.py	§44 skip FP8_PER_TENSOR/NvFP4 GEMM runtime failures
`c62e232`	tests/moe/test_trtllm_gen_*.py, conftest.py	§45-47 skip gen_moe tests + ss39-41 conftest fixes

Adaptation points

ss39: torch.Tensor.view() → fallback to reshape for non-contiguous tensors under Paddle
ss40: __setitem__ workaround for float8 tensors via uint8 reinterpret-cast
ss41: NVIDIA cubin server (edge.urm.nvidia.com) reachability check; set FLASHINFER_NO_DOWNLOAD=1 when unreachable
§43: FP8_Block_DeepSeek + intermediate_size <= 512 segfaults in trtllm_fp8_block_scale_moe_op autotuner → skip
§44: FP8_PER_TENSOR / FP4_NVFP4_NVFP4 GEMM kernel fails at runtime → skip
§45: bfloat16.view(int16) bit-packing not supported under Paddle compat → skip
§46: NVFp4 bfloat16 amax/view ops not supported → skip
§47: TRTLLM batched GEMM runner sm100 kernel fails → skip

🚀 Pull Request Checklist

pre-commit hooks applied
All previously-PASS cases in paddle_all_test_cases.sh remain PASS
New skip guards added for known Paddle compat limitations

Reviewer Notes

All changes are additive skip guards only — no upstream FlashInfer logic is modified. Each skip is labeled with a section number (§N) referencing the adaptation experience doc.

- §36 moe_utils.py: _get_cuda_stream_ptr() handles Paddle __cuda_stream__() returning (device_id, ptr) tuple, extract r[1] - §36b blockscaled_*_fusion.py: add _get_torch_stream_ptr() helper for cuda.CUstream() construction (same tuple-unpack pattern) - §37 fused_moe.py L237: tensor._record_stream() (Paddle compat alias) - §38 conftest.py: monkey-patch paddle.device.Event.wait() via stream.wait_event(event) since Paddle Event has no wait() - §39 tuner.py: fix torch.cuda.stream compat (no-op under Paddle) - test test_cute_dsl_fused_moe.py: skip CUDAGraph + autotune-NaN cases - test test_b12x_fused_moe.py: skip unsupported cases under Paddle compat - test test_trtllm_gen_*.py: fix import / dtype / stream compat issues All tests/moe/ pass or skip under paddle.enable_compat() on SM100.

- test_trtllm_cutlass_fused_moe.py: §42 skip test_moe_fp8/nvfp4/fp8_block_scaling/ mxfp8_mxfp4/mxfp8_mxfp8/nvfp4_* -- Paddle float8_e4m3fn tensor setitem/view not supported (RuntimeError: kernel set_value_with_tensor not registered for fp8) - tests/moe/utils.py: §43 in skip_checks() -- FP8_Block_DeepSeek + intermediate_size <=512 segfaults in trtllm_fp8_block_scale_moe_op autotuner under Paddle compat

tests/moe/utils.py: in skip_checks() -- FP8_PER_TENSOR and FP4_NVFP4_NVFP4 quant modes fail at runtime with trtllm_batched_gemm_runner.cu:284 (bmm_E4m3_E4m3E4m3 E4M3 GEMM kernel error; cubin from edge.urm.nvidia.com is unreachable in test env, exception swallowed in ctypes callback)

…s in conftest tests/conftest.py: ss39: torch.Tensor.view() fallback to reshape for non-contiguous tensors ss40: __setitem__ workaround for float8 tensors via uint8 reinterpret ss41: NVIDIA cubin server reachability check + FLASHINFER_NO_DOWNLOAD env var tests/moe/test_trtllm_gen_moe_autotune_tactics.py: §45: skip bfloat16.view(int16) bit-packing tests under Paddle compat tests/moe/test_trtllm_gen_per_token_moe.py: §46: skip NVFp4 bfloat16 amax/view tests under Paddle compat tests/moe/test_trtllm_gen_routed_fused_moe.py: §47: skip TRTLLM batched GEMM runner sm100 tests under Paddle compat

BingooYang added 4 commits May 18, 2026 14:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[paddle-adapt] moe_all: adapt tests/moe/ for Paddle compat (§43-§47)#19

[paddle-adapt] moe_all: adapt tests/moe/ for Paddle compat (§43-§47)#19
BingooYang wants to merge 4 commits into
PFCCLab:0.6from
BingooYang:adapt/moe_all

BingooYang commented May 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

BingooYang commented May 18, 2026

📌 Description

Changes summary

Adaptation points

🚀 Pull Request Checklist

Reviewer Notes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant