[ExecuTorch][WebGPU] SDPA: skip QK contraction for fully-masked causal tiles by pytorchbot · Pull Request #20509 · pytorch/executorch

pytorchbot · 2026-06-25T06:43:41Z

This PR was created by the merge bot to help merge the original PR into the main branch.
ghstack PR number: #20492 by @JulianCloudNTH
^ Please use this as the source of truth for the PR details, comments, and reviews
ghstack PR base: https://github.com/pytorch/executorch/tree/gh/JulianCloudNTH/62/base
ghstack PR head: https://github.com/pytorch/executorch/tree/gh/JulianCloudNTH/62/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/gh/JulianCloudNTH/54/orig
Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/JulianCloudNTH/62/orig

@diff-train-skip-merge

Pull Request resolved: #20492

**Skip the QK contraction for fully-masked causal tiles**: at S=128 prefill, ~48% of the (query, key) tiles lie entirely above the diagonal and contribute nothing. This change elides their dot products (prefill-only; bit-identical output).

**Problem**: For causal prefill, about half of the (query S-tile, key context-tile) pairs lie entirely above the diagonal, yet the kernel still computes their full `d4` dot product before masking the result to `NEG_INF`.

**Solution**: Skip the contraction for fully-masked tiles; the existing per-element mask still writes the sentinel.
- **Before**: every `(s0, c0)` tile runs the full `d4` dot-product loop, then `store_qk` masks above-diagonal elements to `NEG_INF`.
- **After**: a fully-masked tile (`c0 > s0 + TM-1 + input_pos`) breaks out of the `d4` loop immediately (`acc` stays 0); `store_qk` then masks every element to `NEG_INF` exactly as before.

**Implementation**:
- Add `skip_tile = c0 > s0 + (TM - 1) + params.input_pos`, folded into the `d4` loop's break condition.
- Leave the store loop unchanged: it runs unconditionally, so no scratch entry is left stale.
- Mirrors Vulkan `sdpa_compute_attn_weights_tiled.glsl` (`tile_in_mask_region`).

**Constraints**:
- No KV-cache, host, dispatch, or uniform changes (all tiles still launch; the skip is in-shader).
- Prefill-only: decode (`S=1`) never triggers it (`c0 <= input_pos < input_pos + TM - 1`).
- `NEG_INF` stays the WGSL-safe `-1.0e30` (WGSL forbids a literal `-inf`); this does not copy Vulkan's `-1.0/0.0`.

Co-authored with Claude Code.

ghstack-source-id: 396792509
@exported-using-ghexport

Differential Revision: [D109517773](https://our.internmc.facebook.com/intern/diff/D109517773/)
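The skip predicate and its "bit-identical" claim can be checked with a small host-side Python model. This is a sketch under stated assumptions, not the shader: it works on plain scalar lists instead of the `d4` vec4 loop, assumes square 4×4 tiles (the description names `TM` for the query tile height but not the key tile width, so `tn` is an assumption), and the helpers `qk_tile`, `skipped_fraction`, and `all_tiles` are hypothetical, not ExecuTorch APIs. It checks three claims from the description: roughly 48% of tiles are skipped at S=128, decode (`S=1`) never skips, and masked output is identical with and without the skip.

```python
import random

NEG_INF = -1.0e30  # WGSL-safe sentinel the PR keeps (WGSL has no literal -inf)


def skip_tile(s0: int, c0: int, tm: int, input_pos: int) -> bool:
    """True when the (s0, c0) tile lies entirely above the causal diagonal.

    Query row s sits at absolute position s + input_pos and may attend keys
    c <= s + input_pos. The tile's largest row is s0 + tm - 1 and its smallest
    key is c0, so if even that pair is masked, every pair in the tile is.
    """
    return c0 > s0 + (tm - 1) + input_pos


def qk_tile(q, k, s0, c0, tm, tn, input_pos, use_skip):
    """Hypothetical host model of one tm x tn tile of masked QK scores."""
    skip = use_skip and skip_tile(s0, c0, tm, input_pos)
    tile = []
    for i in range(tm):
        row = []
        for j in range(tn):
            s, c = s0 + i, c0 + j
            acc = 0.0
            if not skip:  # the shader instead breaks out of its d4 loop, leaving acc = 0
                acc = sum(a * b for a, b in zip(q[s], k[c]))
            # store_qk: the per-element causal mask always runs, so no entry goes stale
            row.append(NEG_INF if c > s + input_pos else acc)
        tile.append(row)
    return tile


def skipped_fraction(seq_len, tm, tn, input_pos=0):
    """Hypothetical helper: fraction of (query tile, key tile) pairs skip_tile elides."""
    ctx = input_pos + seq_len  # keys visible after this prefill chunk
    pairs = [(s0, c0) for s0 in range(0, seq_len, tm) for c0 in range(0, ctx, tn)]
    return sum(skip_tile(s0, c0, tm, input_pos) for s0, c0 in pairs) / len(pairs)


def all_tiles(q, k, tm, tn, input_pos, use_skip):
    """Hypothetical helper: every tile; assumes len(q) % tm == 0 and len(k) % tn == 0."""
    return [
        qk_tile(q, k, s0, c0, tm, tn, input_pos, use_skip)
        for s0 in range(0, len(q), tm)
        for c0 in range(0, len(k), tn)
    ]


if __name__ == "__main__":
    print(skipped_fraction(128, tm=4, tn=4))  # 0.484375
    print(skipped_fraction(1, tm=4, tn=4, input_pos=37))  # 0.0 (decode never skips)

    rng = random.Random(0)
    seq_len, head_dim, input_pos = 8, 16, 4
    q = [[rng.uniform(-1, 1) for _ in range(head_dim)] for _ in range(seq_len)]
    k = [[rng.uniform(-1, 1) for _ in range(head_dim)] for _ in range(input_pos + seq_len)]
    print(all_tiles(q, k, 4, 4, input_pos, True) == all_tiles(q, k, 4, 4, input_pos, False))  # True
```

With 4×4 tiles the S=128 count is 496 skipped pairs out of 1024 (31/64 ≈ 48.4%), in line with the ~48% quoted above; other tile shapes give somewhat different fractions (8×8 tiles give 120/256 ≈ 46.9%). The equality check holds because every element of a skipped tile is masked to `NEG_INF` regardless of `acc`.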

pytorch-bot · 2026-06-25T06:43:45Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20509

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 2 Active SEVs

There are 2 currently active SEVs. If your PR is affected, please view them below:

❌ 1 New Failure, 2 Unrelated Failures

As of commit a850fa0 with merge base e03f777:

NEW FAILURE - The following job has failed:

pull / unittest-buck / linux / linux-job (gh)
RuntimeError: Command docker exec -t 0fb3ec48cf3fe1400e4c6b0031a6feeb50d637859b4df242695eff9f32ae4687 /exec failed with exit code 3

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

pull / test-llama-runner-qnn-linux (fp32, qnn_16a16w, qnn) / linux-job (gh) (trunk failure)
pull / unittest-buck / macos / macos-job (gh) (trunk failure)
RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 3

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorchbot temporarily deployed to `cadence` on June 25, 2026 06:43 with GitHub Actions (inactive).

meta-cla bot added the `CLA Signed` label on Jun 25, 2026.

pytorchbot wants to merge 1 commit into `gh/JulianCloudNTH/54/orig` from `gh/JulianCloudNTH/62/orig`.