Skip to content

[paddle-adapt] gemm: adapt tests/gemm/test_group_gemm.py, test_mm_bf16.py, test_bmm_bf16.py#17

Open
BingooYang wants to merge 3 commits into
PFCCLab:0.6from
BingooYang:adapt/gemm_all
Open

[paddle-adapt] gemm: adapt tests/gemm/test_group_gemm.py, test_mm_bf16.py, test_bmm_bf16.py#17
BingooYang wants to merge 3 commits into
PFCCLab:0.6from
BingooYang:adapt/gemm_all

Conversation

@BingooYang
Copy link
Copy Markdown

📌 Description

Adapt three GEMM tests to run under paddle.enable_compat() mode.

Changes

  • Cherry-picked base adaptations from adapt/gemm_bf16 (§35 torch.device keyword arg fix, conftest.py patches)
  • tests/gemm/test_mm_bf16.py: torch.device(device="cuda") → torch.device("cuda") (§35)
  • tests/gemm/test_bmm_bf16.py: same fix (§35)
  • tests/gemm/test_group_gemm.py: no code changes needed - passes with sm80 backend as-is
  • Added 3 representative CI cases to scripts/paddle_all_test_cases.sh

Test Results

File PASS SKIP FAIL Notes
test_group_gemm.py (sm80) 288 36 0 sm90 SKIP: SM100 device does not support sm90 GEMM
test_mm_bf16.py (non-cudnn) 1081 3870 450 FAIL: §47 env issue (Multiple libcudart.so.12+.so.13 in cudnn/auto backend)
test_bmm_bf16.py (cutlass) 32 0 112 FAIL: same §47 env issue in auto+float32 cases

Known Non-Paddle Issues

  • §47: RuntimeError: Multiple libcudart libraries found — environment-level CUDA version conflict in cudnn and auto backends. Not fixable in Python.

🔍 Related Issues

Part of the Paddle compatibility adaptation series.

🚀 Pull Request Checklist

✅ Pre-commit Checks

  • pre-commit run --all-files: all checks passed

🧪 Tests

  • Tests added to scripts/paddle_all_test_cases.sh
  • Regression: norm PASS (102+35), comm PASS

Reviewer Notes

test_group_gemm.py required zero code changes — sm80 backend passes with base conftest.py patches alone. cudnn/auto failures are §47 env issue, unrelated to Paddle adaptation.

Your Name added 3 commits May 15, 2026 10:51
- Add paddle.enable_compat() and monkey-patches to tests/conftest.py:
  - Stream.cuda_stream property (paddle uses __cuda_stream__() returning tuple)
  - torch.cuda.current_blas_handle (paddle.cuda lacks this API)
- Fix torch.device(device=...) -> torch.device(...) across test files
- Add __is_paddle_compatible_library__ = True to flashinfer/__init__.py
- Add use_paddle_compatible_api() helper to flashinfer/utils.py
- Make flashinfer/triton imports optional (triton may not be available)
- Add _CudaOutOfMemoryError sentinel in flashinfer/autotuner.py
- Fix _get_cuda_stream() in cutlass/torch.py for paddle compat
- Rename package to flashinfer-python-paddle in pyproject.toml

Test results:
- test_group_gemm.py: 288 passed, 360 skipped
- test_mm_bf16.py: 1081 passed (cudnn/auto failures due to libcudart env conflict)
- test_bmm_bf16.py: 32 passed (cudnn/auto failures due to libcudart env conflict)

Known limitations (not adaptation issues):
- cudnn/auto backend: libcudart.so.12 vs .13 conflict (environment issue)
- res_dtype != bfloat16: paddle tensor copy between different dtypes not supported
…m_bf16 under paddle compat

- test_group_gemm.py: sm80 backend 288 PASS, 36 SKIP (batch_size*rows>8192); sm90 SKIP (SM100 device, no sm90 GEMM support); zero code changes needed
- test_mm_bf16.py: adapted via §35 fix (torch.device kwarg -> positional); cutlass/tgv/cublaslt/tinygemm backends pass; cudnn/auto-float32 FAIL due to §47 env issue (Multiple libcudart.so.12 vs .so.13)
- test_bmm_bf16.py: adapted via §35; cutlass backend pass; auto+float32 FAIL due to §47
- Regression: norm PASS (102+35 cases), comm PASS, cherry-picked base fixes from c11b6f55

Refs: adaptation-paddle/adaptation_exp.md §35 §47
- Replace try/import paddle with importlib.util.find_spec() in utils.py
- Apply ruff-format to 5 modified files
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant