
adapt(norm): adapt tests/norm/ for Paddle compat #16

Merged
BingooYang merged 1 commit into PFCCLab:0.6 from BingooYang:adapt/norm_all on May 14, 2026

@BingooYang

📌 Description

Adapts tests/norm/ for Paddle compatibility. All four test files are handled:

| Test file | Result | Change |
| --- | --- | --- |
| test_fused_dit_layernorm.py | ✅ 35 passed | Fix strided chunk + as_strided byte offset |
| test_fused_rmsnorm_silu.py | ✅ 102 passed, 50 skipped | No change needed |
| test_rmsnorm_fp4_quant_cute_dsl.py | ⏭ module-level skip | NVFP4 requires PyTorch 2.6+ |
| test_add_rmsnorm_fp4_quant_cute_dsl.py | ⏭ module-level skip | Same as above |

3 key fixes:

  1. Paddle chunk() returns contiguous copies (loses strides).
    PyTorch chunk(6, dim=2) returns strided views (row stride=6×H); Paddle returns contiguous copies (row stride=H).
    Fix: _chunk_strided() helper using torch.as_strided to reconstruct the correct stride.

  2. Paddle as_strided storage_offset is in BYTES (not elements) — P0 silent data corruption.
    Fix: storage_offset = chunk_idx * hidden_dim * temb.element_size()

  3. pytest.skip(allow_module_level=True) required for module-level skip of NVFP4 tests.
    pytestmark = pytest.mark.skip(...) does NOT prevent collection (2195 tests collected vs 0 with fix).
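
Fixes 1 and 2 can be illustrated with a minimal sketch, written against plain PyTorch semantics. The function name `chunk_strided_sketch`, the `offset_in_bytes` flag, and the `[B, 1, 6, H]` layout are assumptions made for illustration; the actual `_chunk_strided()` helper in the PR may look different. The sketch also assumes `temb` is contiguous and starts at storage offset 0.

```python
import torch


def chunk_strided_sketch(temb, chunk_idx, num_chunks=6, offset_in_bytes=False):
    """Return one chunk of a contiguous [B, 1, num_chunks, H] tensor as a strided view.

    Hypothetical stand-in for the PR's _chunk_strided(); not the actual helper.
    """
    batch, _, _, hidden_dim = temb.shape
    row_stride = num_chunks * hidden_dim  # elements between consecutive batch rows
    offset = chunk_idx * hidden_dim       # start of the chunk, in elements
    if offset_in_bytes:
        # Per the PR, Paddle's as_strided expects storage_offset in bytes.
        # Leave this False when running on real PyTorch.
        offset *= temb.element_size()
    return torch.as_strided(
        temb,
        (batch, 1, 1, hidden_dim),
        (row_stride, row_stride, hidden_dim, 1),
        offset,
    )


B, H = 2, 8
temb = torch.arange(B * 6 * H, dtype=torch.float32).reshape(B, 1, 6, H)

gate = chunk_strided_sketch(temb, chunk_idx=2)
reference = temb.chunk(6, dim=2)[2]  # PyTorch: a strided view, not a copy
```

Under PyTorch, `gate` and `reference` share the same values and the same row stride of `6 * H`, which is the property the kernel checks. On the Paddle compat layer, the same call would need `offset_in_bytes=True` according to the description above.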

🔍 Related Issues

N/A

🚀 Pull Request Checklist

  • pre-commit checks pass
  • All adapted tests pass or are intentionally skipped with clear reason
  • scripts/paddle_all_test_cases.sh updated
  • adaptation_exp.md updated (§40–44, Section 十二)

🧪 Tests

tests/norm/test_fused_dit_layernorm.py           35 passed
tests/norm/test_fused_rmsnorm_silu.py            102 passed, 50 skipped
tests/norm/test_rmsnorm_fp4_quant_cute_dsl.py    SKIPPED (module-level)
tests/norm/test_add_rmsnorm_fp4_quant_cute_dsl.py SKIPPED (module-level)

Reviewer Notes

  • _chunk_strided in test_fused_dit_layernorm.py is the key fix; the byte-offset behaviour of Paddle as_strided differs from PyTorch and causes silent data corruption if not accounted for.
  • FP4 tests are skipped because torch.float4_e2m1fn_x2 (NVFP4 packed dtype) is only available in PyTorch 2.6+; the current Paddle compat environment ships an earlier version.
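
Why a unit mismatch in the offset is silent can be shown with a small standard-library model. This is only an illustration of the arithmetic, not Paddle's implementation: `read_chunk` is a hypothetical reader that takes a byte offset, and the sizes are made up.

```python
import struct
from array import array

HIDDEN_DIM = 4   # illustrative size, not taken from the PR
NUM_CHUNKS = 6

# One row of temb: chunk k holds the value k repeated HIDDEN_DIM times.
row = array("f", [float(k) for k in range(NUM_CHUNKS) for _ in range(HIDDEN_DIM)])
buf = row.tobytes()
ITEMSIZE = row.itemsize  # bytes per float32 element


def read_chunk(byte_offset):
    """Read HIDDEN_DIM float32 values starting at the given byte offset."""
    return list(struct.unpack_from(f"{HIDDEN_DIM}f", buf, byte_offset))


chunk_idx = 2
element_offset = chunk_idx * HIDDEN_DIM      # right unit for an element-based API
byte_offset = element_offset * ITEMSIZE      # right unit for a byte-based API

correct = read_chunk(byte_offset)            # values of chunk 2
# Passing the element count to a byte-based API still reads valid memory,
# but from the wrong place: it straddles chunks 0 and 1.
wrong = read_chunk(element_offset)
```

Both reads succeed without any error, which is why the wrong unit shows up only as incorrect numerical results.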

- test_fused_dit_layernorm.py: add _chunk_strided() helper using
  torch.as_strided to reconstruct correct stride from 4D temb tensor.
  Paddle chunk() returns contiguous copies (losing strides); kernel
  requires gate.stride(1)==6*hidden_dim. Offset uses byte units
  (Paddle as_strided storage_offset is in bytes, PyTorch in elements).
  Fix _make_strided_gate to use _chunk_strided instead of chunk().

- test_rmsnorm_fp4_quant_cute_dsl.py,
  test_add_rmsnorm_fp4_quant_cute_dsl.py: add module-level skip guard
  for torch.float4_e2m1fn_x2 (NVFP4 packed dtype, PyTorch 2.6+,
  not proxied in Paddle compat). Use pytest.skip(allow_module_level=True).

- scripts/paddle_all_test_cases.sh: add test_fused_dit_layernorm.py;
  add comments for fp4 tests (skipped, unavailable dtype).

Results:
  test_fused_rmsnorm_silu.py:     102 passed, 50 skipped
  test_fused_dit_layernorm.py:    35 passed
  fp4 tests:                       2 skipped (dtype unavailable)
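
The module-level skip guard mentioned in the commit message could look roughly like the sketch below. The helper name `skip_module_unless_has` is hypothetical, and a stand-in namespace replaces `torch` so the sketch runs without it; in a real test file the guard would be called with the imported `torch` module at import time, before any code that touches the dtype.

```python
import types

import pytest


def skip_module_unless_has(module, attr, reason):
    """Skip the whole test module when `module` lacks `attr`.

    Hypothetical helper; intended to be called at the top of a test module.
    """
    if not hasattr(module, attr):
        # Raises immediately, so the rest of the module is never executed.
        pytest.skip(reason, allow_module_level=True)


# Stand-in for `import torch` in an environment without the NVFP4 dtype.
fake_torch = types.SimpleNamespace(float32="float32")

try:
    skip_module_unless_has(fake_torch, "float4_e2m1fn_x2", "NVFP4 dtype unavailable")
    skipped = False
except pytest.skip.Exception:
    # In a real test module this exception is left uncaught so pytest sees it.
    skipped = True
```

Because `pytest.skip(..., allow_module_level=True)` raises during import, parametrization and other module-level code that references the missing dtype never runs. With `pytestmark = pytest.mark.skip(...)` the module body still executes in full and the tests are collected before being marked as skipped.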
@BingooYang BingooYang merged commit b6fae6e into PFCCLab:0.6 May 14, 2026
1 of 2 checks passed