Fix sliding-window chunked prefill in the gemma4-31B runner by digantdesai · Pull Request #20346 · pytorch/executorch

digantdesai · 2026-06-17T19:02:39Z

The runner chunked prefill at get_max_prefill_chunk = 2*sliding_window (2048).
A chunk larger than the window overflows the 2*window ring KV cache across
chunk boundaries: after writing a 2048-token chunk the ring holds only the most
recent 2048 positions, so the first ~(chunk - window) queries of every chunk
after the first lose the tail of the previous chunk that is still inside their
1024 window. Those sliding-layer queries then attend over a truncated window,
which propagates into their hidden states and the global KV those positions
write, changing the output. The global flat-cache layers are unaffected.
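
The overflow can be reproduced with a toy model. The sketch below is not the runner's code: it assumes the ring stores position p in slot p % (2*window) and that a whole chunk is written to the cache before its queries attend. The function name `truncated_queries` and the small sizes are hypothetical, chosen only to illustrate the argument above.

```python
def truncated_queries(num_tokens, window, chunk):
    """Return the query positions whose sliding window is incomplete.

    Toy model (assumption): a ring KV cache of size 2*window where position
    p lives in slot p % ring_size, and each prefill chunk is written in full
    before any query in that chunk attends.
    """
    ring_size = 2 * window
    ring = [None] * ring_size  # ring[slot] = position currently stored there
    truncated = []
    for start in range(0, num_tokens, chunk):
        end = min(start + chunk, num_tokens)
        # Write the whole chunk; older positions in the same slots are lost.
        for pos in range(start, end):
            ring[pos % ring_size] = pos
        # Each query needs keys for positions [q - window + 1, q].
        for q in range(start, end):
            lo = max(0, q - window + 1)
            if any(ring[k % ring_size] != k for k in range(lo, q + 1)):
                truncated.append(q)
    return truncated


# window = 4, ring = 8: a chunk of 2*window truncates the first window - 1
# queries of every chunk after the first; a chunk equal to the window does not.
print(truncated_queries(24, window=4, chunk=8))  # [8, 9, 10, 16, 17, 18]
print(truncated_queries(24, window=4, chunk=4))  # []
```

In this model the first chunk is unaffected because no earlier positions exist, which matches the description: only chunks after the first lose the tail of the previous chunk.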

Cap the prefill chunk at the sliding window: get_sliding_window from metadata
(now exported), else max_prefill/2 since the export sets max_prefill =
2*sliding_window. Decode is unaffected. Adds --max_prefill_chunk to override
the chunk size for testing.
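
A minimal sketch of the selection logic described above, written in Python for illustration only. The actual runner code is not shown here; the function name `pick_prefill_chunk`, the dict-based metadata lookup, and the clamping of the override to max_prefill are assumptions.

```python
def pick_prefill_chunk(metadata, max_prefill, override=None):
    """Choose a prefill chunk size no larger than the sliding window.

    Hypothetical helper: `metadata` stands in for the exported model
    metadata, and `override` for the --max_prefill_chunk flag.
    """
    if override is not None:
        # Assumption: the override is still bounded by the exported maximum.
        return max(1, min(override, max_prefill))
    window = metadata.get("get_sliding_window")
    if window is None:
        # Fallback from the description: the export sets
        # max_prefill = 2*sliding_window, so the window is max_prefill / 2.
        window = max_prefill // 2
    return max(1, min(window, max_prefill))


print(pick_prefill_chunk({"get_sliding_window": 1024}, 2048))  # 1024
print(pick_prefill_chunk({}, 2048))                            # 1024
print(pick_prefill_chunk({}, 2048, override=2048))             # 2048
```

Passing an override of 2048 restores the old chunk size, which is one way to compare outputs before and after the fix.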

Authored with assistance from Claude Code.

[ghstack-poisoned]

digantdesai · 2026-06-17T19:02:40Z

Stack from ghstack (oldest at bottom):

pytorch-bot · 2026-06-17T19:02:43Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20346

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 2 Active SEVs

There are 2 currently active SEVs. If your PR is affected, please view them below:

❌ 1 New Failure, 1 Unrelated Failure, 2 Unclassified Failures

As of commit 6415edb with merge base da9158b:

NEW FAILURE - The following job has failed:

pull / unittest-editable / windows / windows-job (gh)
examples/models/test/test_export.py::ExportTest::test_efficient_sam_export_to_executorch

UNCLASSIFIED FAILURES - DrCI could not classify the following jobs because the workflow did not run on the merge base. The failures may be pre-existing on trunk or introduced by this PR:

Build Aarch64 Linux Wheels / pytorch/executorch / build-wheel-py3_10-cpu-aarch64 (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
/__w/executorch/executorch/pytorch/executorch/backends/apple/coreml/runtime/inmemoryfs/inmemory_filesystem.cpp:722:48: error: ‘inmemoryfs::InMemoryFileSystem::InMemoryNode::Kind’ has not been declared
Build Aarch64 Linux Wheels / pytorch/executorch / upload / upload-wheel-py3_10-cpu-aarch64 (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
Unable to download artifact(s): Artifact not found for name: pytorch_executorch__3.10_cpu_aarch64

BROKEN TRUNK - The following job failed but was also failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

pull / test-llama-runner-qnn-linux (fp32, qnn_16a16w, qnn) / linux-job (gh) (trunk failure)

This comment was automatically generated by Dr. CI and updates every 15 minutes.

[ghstack-poisoned]

Update

5970c44

[ghstack-poisoned]

meta-cla Bot added the CLA Signed label Jun 17, 2026

digantdesai marked this pull request as ready for review June 23, 2026 19:23

Update

6415edb

[ghstack-poisoned]

digantdesai had a problem deploying to cadence June 25, 2026 05:11 — with GitHub Actions Error

digantdesai temporarily deployed to cadence June 25, 2026 05:12 — with GitHub Actions Inactive

digantdesai wants to merge 2 commits into gh/digantdesai/64/head from gh/digantdesai/65/head
