fix deprecated paddle compat api usage by ShigureNyako · Pull Request #11 · PFCCLab/flashinfer

ShigureNyako · 2026-04-12T14:52:44Z

Summary

replace paddle.compat.enable_torch_proxy(scope={"flashinfer"}) in README.md with paddle.enable_compat(scope={"flashinfer"})
replace the remaining runtime calls in tests/conftest.py, tests/moe/test_trtllm_gen_fused_moe.py, and tests/attention/test_attention_sink_blackwell.py with paddle.enable_compat()
update the stale commented example in tests/comm/test_trtllm_allreduce_fusion_paddle.py
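The rename listed above is mechanical, so it can be sketched as a plain text rewrite. This is only an illustration: `migrate_compat_calls` and `_DEPRECATED_CALL` are hypothetical names that are not part of this PR, and the sketch assumes the deprecated call is always spelled `paddle.compat.enable_torch_proxy(`.

```python
import re

# Hypothetical helper (not part of this PR): matches the deprecated spelling
# up to and including the opening parenthesis, so arguments are left untouched.
_DEPRECATED_CALL = re.compile(r"\bpaddle\.compat\.enable_torch_proxy\(")


def migrate_compat_calls(source: str) -> str:
    """Rewrite deprecated enable_torch_proxy calls to paddle.enable_compat."""
    return _DEPRECATED_CALL.sub("paddle.enable_compat(", source)


before = 'paddle.compat.enable_torch_proxy(scope={"flashinfer"})'
print(migrate_compat_calls(before))
# prints: paddle.enable_compat(scope={"flashinfer"})
```

Because the pattern stops at the opening parenthesis, calls with and without a `scope` argument, as well as commented-out examples, are rewritten the same way.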

Verification

grep -R -nE "enable_torch_proxy|disable_torch_proxy" .
python3 -m py_compile tests/conftest.py tests/moe/test_trtllm_gen_fused_moe.py tests/attention/test_attention_sink_blackwell.py tests/comm/test_trtllm_allreduce_fusion_paddle.py
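For environments without `grep`, the two checks above can be approximated in Python. This is a rough sketch under stated assumptions: `find_deprecated` and `compiles` are hypothetical helper names, and unlike `grep -R` the scan skips `.git` directories and files that are not valid UTF-8.

```python
import py_compile
import re
from pathlib import Path

# Same pattern as the grep command above.
DEPRECATED = re.compile(r"enable_torch_proxy|disable_torch_proxy")


def find_deprecated(root: Path) -> list[tuple[Path, int, str]]:
    """Return (path, line number, line) for every hit under root."""
    hits = []
    for path in sorted(root.rglob("*")):
        # Assumption: skip VCS metadata, which grep -R would also scan.
        if not path.is_file() or ".git" in path.parts:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file
        for lineno, line in enumerate(text.splitlines(), start=1):
            if DEPRECATED.search(line):
                hits.append((path, lineno, line.strip()))
    return hits


def compiles(paths) -> bool:
    """Byte-compile each file; False on the first syntax error."""
    try:
        for path in paths:
            py_compile.compile(str(path), doraise=True)
    except py_compile.PyCompileError:
        return False
    return True
```

An empty list from `find_deprecated(Path("."))` corresponds to the grep command printing nothing, and `compiles([...])` returning `True` corresponds to `py_compile` exiting without error.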

ShigureNyako · 2026-04-12T14:53:03Z

I have self-reviewed this PR:

Only the matched uses of the old compat API (calls and comments) were replaced; no other changes were made.
grep -R -nE "enable_torch_proxy|disable_torch_proxy" . finds no remaining occurrences.
python3 -m py_compile tests/conftest.py tests/moe/test_trtllm_gen_fused_moe.py tests/attention/test_attention_sink_blackwell.py tests/comm/test_trtllm_allreduce_fusion_paddle.py passes.

@SigureMo Please review, thanks!

 - enable paddle torch proxy in conftest via paddle.enable_compat(scope={"flashinfer"})
 - in tests/attention/test_attention_sink_blackwell.py: prepend paddle.enable_compat(), replace torch.manual_seed with paddle.seed, replace torch.testing.assert_close with numpy.testing.assert_allclose, parametrize to a minimal shape for quick verification
 - flashinfer/utils.py: access TorchVersion via torch.torch_version proxy with fallback for paddle compat where paddle.torch_version is not exposed
 - flashinfer/cute_dsl/fp4_common.py: add "from __future__ import annotations" to defer evaluation of "int | torch.device | str | None" annotation which fails under paddle proxy (torch.device is a CallableProxyModule, not a type)

adapt prefill trtllm paged attention for paddle compat

 - flashinfer/prefill.py: convert workspace_size (tensor scalar from numel()*element_size()) to Python int via .item() before passing to the tvm_ffi C++ kernel, which expects int but receives ffi.Tensor under paddle (doc item PFCCLab#11)
 - tests/conftest.py: revert paddle.enable_compat() to global scope so that `import torch` at conftest module level (outside flashinfer scope) also resolves via the proxy

paddle compat: decode workspace_size .item(), moe fp8 index via int8 view, autotuner shape tuple, moe test support

allreduce fusion dist.group.WORLD compat

modify readme

modify format

fix env issue

fix some issue

paddle compat: fix dtype.itemsize + expand trtllm_allreduce_fusion test

 - flashinfer/comm/trtllm_ar.py: paddle.dtype has no `itemsize`; add _DTYPE_SIZE_MAP + _dtype_itemsize() fallback used in _should_use_oneshot (fixes AttributeError when use_oneshot=None triggers the heuristic).
 - tests/comm/test_trtllm_allreduce_fusion.py: restore full parametrize scope (patterns/layouts/pdls/oneshots/trigger/fp32_acc); drop leftover [DBG] prints; guard `if __name__ == "__main__"` block so mp-spawn children do not re-enter it under pytest (was double-initializing paddle TCPStore and SIGABRT in libuv).

Verified: pytest tests/comm/test_trtllm_allreduce_fusion.py::test_trtllm_allreduce_fusion[True-1024-dtype0-2] and [False-1024-dtype0-2] both pass on 2xGPU.

add adaptation paddle skill

paddle compat: revert over-adaptation in test_trtllm_gen_fused_moe

`torch.cuda.get_device_capability`, `tensor.device`, and `tensor.to(device)` are fully aligned under `paddle.enable_compat()`. Revert the earlier paddle-specific detours (`torch.device.cuda.get_device_capability`, `paddle.device(x.place)`, `paddle.get_device()`) back to plain torch APIs. Also record the finding in adaptation-paddle skill (§10, items 31-34) as a "do-not-over-adapt" reference for future MoE test reviews.

Verified: `pytest tests/moe/test_trtllm_gen_fused_moe.py -k test_moe_quantization_classes` passes (1 passed).

paddle compat: restore test_trtllm_gen_fused_moe to upstream + minimal patches

The previous adaptation commented out / trimmed ~1800 lines from upstream, making future rebases painful and dropping valid test coverage. Reset the file to exact upstream content (github.com/flashinfer-ai/flashinfer main) and keep only the minimum compat patches needed to run on paddle:

test file patches:
 - add `import paddle; paddle.enable_compat()` at top
 - `block.aminmax()` -> `block.float().aminmax()` (paddle missing bf16 kernel)
 - fp8 slice assign via `.view(torch.int8)` on both sides (paddle missing fp8 set_value kernel)
 - `expertLogits.cpu()` -> `.cpu().float()` (paddle missing cpu-bf16 topk)
 - `torch.random.manual_seed` -> `torch.manual_seed` (paddle.random lacks manual_seed)
 - `torch.device(device="cuda")` -> `torch.device("cuda")` (paddle Device rejects kwarg)

same `torch.device(...)` kwarg fix in tests/moe/utils.py.

library patch (flashinfer/autotuner.py):
 - `torch.cuda.OutOfMemoryError` missing under paddle. Use a sentinel placeholder class (NOT `RuntimeError` - that would silently swallow real kernel errors).

Verified: `pytest test_trtllm_gen_fused_moe.py::test_fp8_block_scale_routed_activation_type_relu2_smoke` passes. Larger parametrized cases still need library-side fixes (e.g. `core.py::_init_packed_topk_ids` bitwise_or dtype mismatch).

Docs (skills/adaptation-paddle): record new patches 31-36 and the "do-not-trim-upstream" lesson.

paddle compat: fix bitwise_or dtype mismatch in _init_packed_topk_ids

torch implicitly promotes int16->int32 in `(expert_ids << 16) | expert_weights`. Paddle's bitwise_or does not, so it raises

ValueError: The type of data we are trying to retrieve (int16) does not match the type of data (int32)

Explicitly .to(torch.int32) after .view(torch.int16). Works on both backends.

With this fix, routing-family tests (renormalize/sigmoid/deepseekv3/topk/llama4/dyn_block/tier_1024/deepseek_ngroup1/routing_dtype_flexibility) all progress past the dtype check. Remaining failures on this machine are infrastructure (cubin artifactory unreachable), not paddle-compat.

modify skill

fix some issues

paddle compat: test_fused_rmsnorm_silu zero-patch adaptation

tests/norm/test_fused_rmsnorm_silu.py runs under paddle.enable_compat() with no source changes (conftest.py already enables compat). Full run: 102 passed, 50 skipped (all skips due to torch.float4_e2m1fn_x2 missing from paddle torch-proxy, not a kernel adaptation issue).

 - adp_test.md: add row 18 recording PASS 102/152
 - adaptation_exp.md: add section XI (flashinfer-ai#37-39) documenting zero-patch result, rationale, reproduction command, and the methodology recommendation (bare-run first, consult adaptation table only on failure).

fix format


fix deprecated paddle compat api usage

3393f4c

SigureMo approved these changes Apr 12, 2026

SigureMo merged commit d2f29f7 into PFCCLab:paddle Apr 12, 2026
1 check passed

SigureMo mentioned this pull request Apr 12, 2026

replace deprecated paddle compat torch proxy alias #12

Closed

7 tasks

SigureMo merged 1 commit into PFCCLab:paddle from ShigureNyako:cleanup/replace-enable-torch-proxy-with-enable-compat
