[Feature] Enable nd tiling workaround for dynamic shapes #35

Open
themistbeforedawn wants to merge 1 commit into SandAI-org:main from themistbeforedawn:feat/enable-nd-tiling-workaround-for-dynamic-shapes
@themistbeforedawn

Collaborator

🗂️ PR Category

  • ✨ New Feature
  • 🚀 Optimization (performance, memory, etc.)
  • 💥 Breaking Change
  • 🐛 Bug Fix
  • 🛠️ Development / Refactoring
  • 📚 Documentation
  • 🧹 Chore (Dependencies, CI/CD, Configuration, etc.)
  • 🧪 Testing

📝 Description

Problem

On PyTorch < 2.11.0, Inductor's coalesce tiling analysis bails out on symbolic
numels under dynamic shapes, degrading transpose/permute/channels-last pointwise
kernels to untiled Grid1D — a big hit on memory-bound workloads like VAE decode.

Fix

Auto-apply a Triton ND-tiling workaround for dynamic-shape compilation:
prefer_nd_tiling=True, max_tiles=3, tile_reductions=True.
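
As a rough illustration of the three settings above, the sketch below builds an options dictionary that could be passed to `torch.compile(..., options=...)`. It assumes the settings correspond to Inductor's `triton.prefer_nd_tiling`, `triton.max_tiles`, and `triton.tile_reductions` config entries. The helper name `with_nd_tiling` and the rule that user-supplied options take precedence are hypothetical and not taken from this PR.

```python
# Assumed mapping of the PR's settings onto Inductor's `triton.*` config keys.
ND_TILING_OPTIONS = {
    "triton.prefer_nd_tiling": True,
    "triton.max_tiles": 3,
    "triton.tile_reductions": True,
}


def with_nd_tiling(user_options=None):
    """Return a new options dict with the workaround applied.

    Hypothetical helper: values the caller sets explicitly override the
    workaround defaults (an assumption, not confirmed by the PR).
    """
    merged = dict(ND_TILING_OPTIONS)   # copy so the defaults are never mutated
    merged.update(user_options or {})  # explicit user values win
    return merged


# Possible usage (requires PyTorch; shown as a comment only):
#   compiled = torch.compile(model, dynamic=True, options=with_nd_tiling())
```

Returning a fresh dictionary on each call keeps the workaround defaults isolated from per-call overrides.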

Performance (WAN 2.2 VAE decode, 540p, dynamic H/W)

| OFF | ON | Speedup |
| --- | --- | --- |
| 770 ms | 530 ms | ~1.45x |

Gating

  • Dynamic shapes only — static shapes already tile well; forcing it on is a net loss.
  • Version gated — auto-disabled on PyTorch >= 2.11.0 (fixed upstream).
  • Override: compile_config.enable_dynamic_nd_tiling (settable via the MAGI_COMPILE_ENABLE_DYNAMIC_ND_TILING env var) → auto.
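
The gating rules above can be sketched as a small decision function. This is a minimal sketch, not the PR's implementation: the names `parse_version` and `should_enable_nd_tiling` are hypothetical, `override=None` stands in for the "auto" setting, and treating an explicit override as winning regardless of shape mode or version is an assumption.

```python
import re

FIXED_IN = (2, 11, 0)  # per the PR description: fixed upstream in PyTorch >= 2.11.0


def parse_version(version):
    """Extract (major, minor, patch) from strings such as '2.10.0+cu128'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def should_enable_nd_tiling(torch_version, is_dynamic, override=None):
    """Decide whether to apply the ND-tiling workaround.

    override: True or False forces the decision; None means "auto".
    """
    if override is not None:
        return override                # explicit setting takes precedence (assumed)
    if not is_dynamic:
        return False                   # static shapes already tile well
    return parse_version(torch_version) < FIXED_IN  # only on affected versions
```

Note that this sketch ignores pre-release suffixes, so a string like `2.11.0.dev...` is treated as already fixed; a real implementation might prefer a full version parser.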

Tests

  • Logic (28 cases): dynamic/version detection, explicit-config override precedence, env-var-drives-config, config injection.
  • Perf: weight-free VAE-decode-like workload (3D conv + upsample), dynamic H/W; ~1.35x ON vs OFF on H100 (asserts >= 1.20x on calibrated GPUs).
