fix deprecated paddle compat api usage #11

Merged: SigureMo merged 1 commit into PFCCLab:paddle from ShigureNyako:cleanup/replace-enable-torch-proxy-with-enable-compat on Apr 12, 2026

Conversation

@ShigureNyako

Summary

  • replace paddle.compat.enable_torch_proxy(scope={"flashinfer"}) in README.md with paddle.enable_compat(scope={"flashinfer"})
  • replace the remaining runtime calls in tests/conftest.py, tests/moe/test_trtllm_gen_fused_moe.py, and tests/attention/test_attention_sink_blackwell.py with paddle.enable_compat()
  • update the stale commented example in tests/comm/test_trtllm_allreduce_fusion_paddle.py

Verification

  • grep -R -nE "enable_torch_proxy|disable_torch_proxy" .
  • python3 -m py_compile tests/conftest.py tests/moe/test_trtllm_gen_fused_moe.py tests/attention/test_attention_sink_blackwell.py tests/comm/test_trtllm_allreduce_fusion_paddle.py
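
The grep check above can also be expressed as a small script, which may be handy where `grep` is unavailable. This is a minimal sketch and not part of the PR: the function name `find_deprecated_calls` is hypothetical, and unlike `grep -R` it silently skips files that are not valid UTF-8.

```python
import re
from pathlib import Path

# Same alternation as the grep pattern used in the verification step.
DEPRECATED = re.compile(r"enable_torch_proxy|disable_torch_proxy")


def find_deprecated_calls(root):
    """Return (path, line_number, stripped_line) for every match under root."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for number, line in enumerate(text.splitlines(), start=1):
            if DEPRECATED.search(line):
                hits.append((str(path), number, line.strip()))
    return hits
```

An empty result corresponds to grep printing nothing, which is the state this PR aims for.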

@ShigureNyako
Author

I have self-reviewed this PR:

  • only the matched usages and comments of the old compat API were replaced; no other changes were made
  • grep -R -nE "enable_torch_proxy|disable_torch_proxy" . finds no remaining occurrences
  • python3 -m py_compile tests/conftest.py tests/moe/test_trtllm_gen_fused_moe.py tests/attention/test_attention_sink_blackwell.py tests/comm/test_trtllm_allreduce_fusion_paddle.py passes

@SigureMo, please review. Thanks!

@SigureMo SigureMo merged commit d2f29f7 into PFCCLab:paddle Apr 12, 2026
1 check passed
BingooYang added a commit to BingooYang/flashinfer that referenced this pull request May 13, 2026
- flashinfer/prefill.py: convert workspace_size (tensor scalar from numel()*element_size())
  to Python int via .item() before passing to the tvm_ffi C++ kernel, which expects int
  but receives ffi.Tensor under paddle (doc item PFCCLab#11)
- tests/conftest.py: revert paddle.enable_compat() to global scope so that `import torch`
  at conftest module level (outside flashinfer scope) also resolves via the proxy
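
The `.item()` conversion described in the commit above can be sketched in isolation. This is an illustrative sketch only: the helper `as_python_int` and the stand-in class `FakeScalar` are hypothetical and do not come from flashinfer/prefill.py. The only assumption is that the scalar object exposes `.item()`, as the commit message states.

```python
import operator


def as_python_int(value):
    """Return a plain Python int from an int or a scalar-tensor-like object."""
    # Scalar tensors expose .item(); plain Python ints do not.
    item = getattr(value, "item", None)
    if callable(item):
        value = item()
    # operator.index rejects floats and other non-integer values.
    return operator.index(value)


class FakeScalar:
    """Stand-in for a 0-d tensor, used only for this illustration."""

    def __init__(self, value):
        self._value = value

    def item(self):
        return self._value


workspace_size = as_python_int(FakeScalar(4096))
```

Normalizing at the call site keeps the C++ boundary strict about receiving an int while leaving the rest of the Python code unchanged.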
BingooYang added a commit to BingooYang/flashinfer that referenced this pull request May 13, 2026
- enable paddle torch proxy in conftest via paddle.enable_compat(scope={"flashinfer"})
- in tests/attention/test_attention_sink_blackwell.py: prepend paddle.enable_compat(),
  replace torch.manual_seed with paddle.seed, replace torch.testing.assert_close with
  numpy.testing.assert_allclose, parametrize to a minimal shape for quick verification
- flashinfer/utils.py: access TorchVersion via torch.torch_version proxy with fallback
  for paddle compat where paddle.torch_version is not exposed
- flashinfer/cute_dsl/fp4_common.py: add "from __future__ import annotations" to
  defer evaluation of "int | torch.device | str | None" annotation which fails under
  paddle proxy (torch.device is a CallableProxyModule, not a type)
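
The `from __future__ import annotations` fix for fp4_common.py can be reproduced without paddle or torch. The sketch below assumes Python 3.10 or newer; `CallableProxy` is a hypothetical stand-in for an object that is callable but is not a type, which is how the commit describes `torch.device` under the proxy. The deferred variant is compiled from a source string only so that the future import sits at the top of its own module.

```python
class CallableProxy:
    """Stand-in for an object that is callable but is not a type."""

    def __call__(self, *args, **kwargs):
        return ("device", args, kwargs)


device = CallableProxy()


def union_evaluates_eagerly():
    """Return True if `int | device | str | None` can be built right now."""
    try:
        int | device | str | None
    except TypeError:
        return False
    return True


# With the future import, the annotation is stored as a string and is
# never evaluated when the function is defined.
DEFERRED_SOURCE = (
    "from __future__ import annotations\n"
    "def f(d: int | device | str | None = None):\n"
    "    return d\n"
)


def define_with_deferred_annotations():
    namespace = {"device": device}
    exec(DEFERRED_SOURCE, namespace)
    return namespace["f"]
```

Building the union directly raises `TypeError`, while the deferred definition succeeds because the annotation is kept as the unevaluated string `"int | device | str | None"`.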

adapt prefill trtllm paged attention for paddle compat

- flashinfer/prefill.py: convert workspace_size (tensor scalar from numel()*element_size())
  to Python int via .item() before passing to the tvm_ffi C++ kernel, which expects int
  but receives ffi.Tensor under paddle (doc item PFCCLab#11)
- tests/conftest.py: revert paddle.enable_compat() to global scope so that `import torch`
  at conftest module level (outside flashinfer scope) also resolves via the proxy

paddle compat: decode workspace_size .item(), moe fp8 index via int8 view, autotuner shape tuple, moe test

support allreduce fusion

dist.group.WORLD compat

modify readme

modify format

fix env issue

fix some issue

paddle compat: fix dtype.itemsize + expand trtllm_allreduce_fusion test

- flashinfer/comm/trtllm_ar.py: paddle.dtype has no `itemsize`; add
  _DTYPE_SIZE_MAP + _dtype_itemsize() fallback used in _should_use_oneshot
  (fixes AttributeError when use_oneshot=None triggers the heuristic).
- tests/comm/test_trtllm_allreduce_fusion.py: restore full parametrize
  scope (patterns/layouts/pdls/oneshots/trigger/fp32_acc); drop leftover
  [DBG] prints; guard `if __name__ == "__main__"` block so mp-spawn
  children do not re-enter it under pytest (was double-initializing
  paddle TCPStore and SIGABRT in libuv).

Verified: pytest tests/comm/test_trtllm_allreduce_fusion.py::test_trtllm_allreduce_fusion[True-1024-dtype0-2] and [False-1024-dtype0-2] both pass on 2xGPU.

add adaptation paddle skill

paddle compat: revert over-adaptation in test_trtllm_gen_fused_moe

`torch.cuda.get_device_capability`, `tensor.device`, and `tensor.to(device)`
are fully aligned under `paddle.enable_compat()`. Revert the earlier
paddle-specific detours (`torch.device.cuda.get_device_capability`,
`paddle.device(x.place)`, `paddle.get_device()`) back to plain torch APIs.

Also record the finding in adaptation-paddle skill (§10, items 31-34) as a
"do-not-over-adapt" reference for future MoE test reviews.

Verified: `pytest tests/moe/test_trtllm_gen_fused_moe.py -k test_moe_quantization_classes`
passes (1 passed).

paddle compat: restore test_trtllm_gen_fused_moe to upstream + minimal patches

The previous adaptation commented out / trimmed ~1800 lines from upstream,
making future rebases painful and dropping valid test coverage. Reset the
file to exact upstream content (github.com/flashinfer-ai/flashinfer main)
and keep only the minimum compat patches needed to run on paddle:

test file patches:
- add `import paddle; paddle.enable_compat()` at top
- `block.aminmax()` -> `block.float().aminmax()`       (paddle missing bf16 kernel)
- fp8 slice assign via `.view(torch.int8)` on both sides (paddle missing fp8 set_value kernel)
- `expertLogits.cpu()` -> `.cpu().float()`             (paddle missing cpu-bf16 topk)
- `torch.random.manual_seed` -> `torch.manual_seed`     (paddle.random lacks manual_seed)
- `torch.device(device="cuda")` -> `torch.device("cuda")` (paddle Device rejects kwarg)

same `torch.device(...)` kwarg fix in tests/moe/utils.py.

library patch (flashinfer/autotuner.py):
- `torch.cuda.OutOfMemoryError` missing under paddle. Use a sentinel placeholder
  class (NOT `RuntimeError` - that would silently swallow real kernel errors).

Verified: `pytest test_trtllm_gen_fused_moe.py::test_fp8_block_scale_routed_activation_type_relu2_smoke`
passes. Larger parametrized cases still need library-side fixes (e.g.
`core.py::_init_packed_topk_ids` bitwise_or dtype mismatch).

Docs (skills/adaptation-paddle): record new patches 31-36 and the
"do-not-trim-upstream" lesson.

paddle compat: fix bitwise_or dtype mismatch in _init_packed_topk_ids

torch implicitly promotes int16->int32 in `(expert_ids << 16) | expert_weights`.
Paddle's bitwise_or does not, so it raises

  ValueError: The type of data we are trying to retrieve (int16) does not
  match the type of data (int32)

Explicitly .to(torch.int32) after .view(torch.int16). Works on both backends.

With this fix, routing-family tests (renormalize/sigmoid/deepseekv3/topk/
llama4/dyn_block/tier_1024/deepseek_ngroup1/routing_dtype_flexibility) all
progress past the dtype check. Remaining failures on this machine are
infrastructure (cubin artifactory unreachable), not paddle-compat.

modify skill

fix some issues

paddle compat: test_fused_rmsnorm_silu zero-patch adaptation

tests/norm/test_fused_rmsnorm_silu.py runs under paddle.enable_compat()
with no source changes (conftest.py already enables compat). Full run:
102 passed, 50 skipped (all skips due to torch.float4_e2m1fn_x2 missing
from paddle torch-proxy, not a kernel adaptation issue).

- adp_test.md: add row 18 recording PASS 102/152
- adaptation_exp.md: add section XI (flashinfer-ai#37-39) documenting zero-patch
  result, rationale, reproduction command, and the methodology
  recommendation (bare-run first, consult adaptation table only on
  failure).

fix format

fix some issue