Add a16w8 reduce_sum FVP coverage for Ethos-U85 by Ninja91 · Pull Request #19319 · pytorch/executorch

Ninja91 · 2026-05-06T01:21:53Z

Summary:
Adds an a16w8 (int16 IO + int8 weights) sweep for aten.sum.dim_IntList reducing the last dim with keepdim=True. The new tests test_sum_dim_intlist_a16w8_{u55,u85}_INT run on the standard Corstone-300 / Corstone-320 FVP harness. The U85 case surfaces a known numerics issue in the Vela regor lowering at int16 IO precision (silent zero output), tracked upstream at https://gitlab.arm.com/artificial-intelligence/ethos-u/ethos-u-vela/-/issues/23. The Ethos-U55 path uses a different accumulator and is correct on the same OFM rescale.
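For readers unfamiliar with the operation under test, here is a minimal plain-Python sketch of a last-dim sum with keepdim=True over symmetrically quantized int16 values. It is illustrative only and is not ExecuTorch or Vela code: the helper names, the input values, and the scale of 1/1024 are all assumptions chosen so the arithmetic is exact.

```python
# Illustrative model only: not the Arm backend or Vela implementation.
INT16_MAX = 32767


def quantize_symmetric_int16(values, scale):
    """Map floats to int16 with zero-point 0 (symmetric quantization)."""
    return [max(-INT16_MAX, min(INT16_MAX, round(v / scale))) for v in values]


def sum_last_dim_keepdim(rows):
    """Reduce each row to a one-element list, i.e. shape (N, D) -> (N, 1)."""
    return [[sum(row)] for row in rows]


x = [[0.5, -0.25, 1.0], [0.125, 0.125, 0.25]]
in_scale = 1.0 / 1024  # hypothetical scale; a power of two keeps values exact
q_rows = [quantize_symmetric_int16(row, in_scale) for row in x]

# Accumulate in a wide integer (Python ints are unbounded), then dequantize.
# A real backend would rescale the accumulator to an int16 output instead.
q_sum = sum_last_dim_keepdim(q_rows)
dequantized = [[acc * in_scale for acc in row] for row in q_sum]
reference = sum_last_dim_keepdim(x)
```

With these inputs the dequantized result matches the float reference exactly; a silent-zero output like the one described for the U85 path would instead produce zeros in every row.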

This diff is additive only: the Sum / SumDefault test classes and existing test functions are not modified, except for skips= annotations on the four pre-existing dim_None parametrize ids that are not bundled-program-serializable and surface only because this diff is the first to register ops/test_sum.py in the buck test target list.

Test design:

- Standard pipeline.run() with the same a16w8 kwargs other arm a16w8 tests use (e.g. test_native_layer_norm_16a8w_u85_INT in test_layer_norm.py): a16w8_quantization=True, symmetric_io_quantization=True, qtol=128, epsilon=2**-16.
- Numerical comparison is the standard atol/rtol check from pipeline.run(), with no SQNR helpers.
- The U85 cases are wrapped with xfails=a16w8_sum_u85_xfails, strict=False. strict=False keeps the test target green both on stock Vela 5.0 (cases XFAIL) and once the upstream Vela fix is in tree (cases XPASS allowed).
- XfailIfNoCorstone320 is intentionally omitted on the new a16w8 U85 test: stacking it with the per-id xfails= argument makes the per-id marks not fire (verified empirically in this buck test target). A code comment in the file documents this constraint.
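To make the atol/rtol point concrete, here is a small sketch of why an all-zero output cannot pass such a check. It assumes the comparison has the usual `|actual - expected| <= atol + rtol * |expected|` form used by `torch.allclose`; the helper name, the sample values, and the tolerances are hypothetical and are not taken from pipeline.run().

```python
def within_tolerance(actual, expected, atol, rtol):
    """Elementwise |a - e| <= atol + rtol * |e|, the torch.allclose-style rule."""
    return all(
        abs(a - e) <= atol + rtol * abs(e) for a, e in zip(actual, expected)
    )


reference = [1.25, 0.5, -2.0]          # hypothetical float reference sums
close_output = [1.2501, 0.4999, -2.0002]  # small quantization-like error
zero_output = [0.0, 0.0, 0.0]          # the silent-zero failure mode

close_ok = within_tolerance(close_output, reference, atol=1e-3, rtol=1e-3)
zero_ok = within_tolerance(zero_output, reference, atol=1e-3, rtol=1e-3)
```

Under these assumed tolerances the slightly perturbed output passes and the zero output fails, which is the behaviour the xfail marks are meant to capture on the U85 cases.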

Differential Revision: D103667823

pytorch-bot · 2026-05-06T01:21:58Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19319

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 89 Pending

As of commit 876f542 with merge base 1debeb6:

NEW FAILURE - The following job has failed:

pull / test-llama-runner-linux-android / linux-job (gh)
Process completed with exit code 253.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-codesync · 2026-05-06T01:22:09Z

@Ninja91 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D103667823.

github-actions · 2026-05-06T01:23:12Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Copilot

Pull request overview

This PR adds Arm backend test coverage for the a16w8 (int16 activations / IO quantization) path of aten.sum.dim_IntList (reducing the last dim with keepdim=True) on Corstone FVPs, with the intent of surfacing a known Ethos-U85 ReduceSum int16 numerics issue (silent-zero output) while keeping the overall test target green via non-strict XFAILs.

Changes:

Enables ops/test_sum.py in the Arm Bazel test target list.
Adds new SumLastDim-based a16w8 ReduceSum tests for Ethos-U55 and Ethos-U85, including per-case XFAILs for the known U85 issue.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File	Description
backends/arm/test/targets.bzl	Adds `ops/test_sum.py` to the default Arm test file list so it runs in the Bazel test suite.
backends/arm/test/ops/test_sum.py	Introduces new a16w8 ReduceSum last-dim tests for U55/U85 and marks U85 cases as non-strict XFAIL to capture the known Vela issue.


Summary: Adds an a16w8 (int16 IO + int8 weights) sweep for `aten.sum.dim_IntList` reducing the last dim with `keepdim=True`. The new tests `test_sum_dim_intlist_a16w8_{u55,u85}_INT` run on the standard Corstone-300 / Corstone-320 FVP harness and surface a numerics issue in the Ethos-U85 `ReduceSum` lowering at int16 IO precision (silent zero output). The Ethos-U55 path uses a different accumulator and is correct on the same OFM rescale.

## Context

Part of a stack that documents and fixes a numerics bug in the Vela 5.0 Ethos-U85 backend (`regor`). Plan + cross-references:

- **Plan:** {D103649006} ([Markup](https://internalfb.com/intern/markup/D103649006))
- **Step 1a (this diff):** ReduceSum-only a16w8 coverage in `test_sum.py` (LAND)
- **Step 1b-softmax:** {D103734699} -- `test_softmax.py` a16w8 MHA softmax sweep (LAND)
- **Step 1b-ops:** {D103760103} -- `test_softmax_ops.py` op-isolation harness (DNL)
- **Step 2a:** {D103760153} -- `regor` patch in third-party Vela 5.0 fork (LAND)
- **Step 2b:** {D103760514} -- DNL companion that drops `xfails=` from `test_sum.py` (lands in OSS only after upstream Vela syncs the fix)

## Test design

Tests use the standard `pipeline.run()` with the same a16w8 kwargs other arm a16w8 tests use (e.g. `test_native_layer_norm_16a8w_u85_INT` in `test_layer_norm.py`):

```
a16w8_quantization=True, symmetric_io_quantization=True, qtol=128, epsilon=2**-16
```

Numerical comparison is the standard `atol`/`rtol`-only check from `pipeline.run()` -- no SQNR helpers -- to stay consistent with the rest of `arm/test/ops/`.

The U85 cases are wrapped with `xfails=a16w8_sum_u85_xfails, strict=False`. `strict=False` keeps the test target green both on stock Vela 5.0 (cases XFAIL) *and* after Step 2a lands the Vela patch (cases XPASS, allowed under non-strict). Step 2b separately drops the `xfails=` argument once the upstream Vela fix syncs down.

The new U85 a16w8 test deliberately omits `common.XfailIfNoCorstone320` (which is present on the U55 sibling). Stacking that decorator with the per-id `xfails=` argument makes the per-id marks not fire (verified empirically) so the bug-firing cases would hard-fail instead of XFAIL. CI always has Corstone-320 installed; if it ever isn't, the test fails loudly with `FileNotFoundError`, which is the right signal for a missing-FVP misconfiguration. A code comment in the file documents this constraint.

## Scope note

This diff only **adds** new tests for the a16w8 path. It does not modify any existing tests in `test_sum.py` -- the pre-existing `Sum.test_parameters` (including the `dim_None` cases) is left as-is. Pre-existing `dim_None` test failures on `test_sum_u{55,85}_INT_1_0` are out of scope and unrelated to this diff.

Differential Revision: D103667823


Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

+@common.parametrize("test_data", a16w8_sum_test_parameters)
+@common.XfailIfNoCorstone320
+@pytest.mark.xfail(
+    reason="Ethos-U85 int16 ReduceSum returns zero (vela#23)", strict=False
+)
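
As a reading aid for the `strict=False` argument in the snippet above, the following is a small plain-Python model of how pytest reports a test carrying an xfail mark. It is a sketch of the documented pytest semantics only: `xfail_outcome` is a hypothetical helper, not a pytest API, and it ignores options such as `raises=` and `run=`.

```python
def xfail_outcome(test_passed, strict):
    """Return (report_label, counts_as_failure) for a test marked xfail."""
    if not test_passed:
        # The expected failure happened: reported as XFAIL, suite stays green.
        return ("XFAIL", False)
    if strict:
        # An unexpected pass under strict=True is reported as a failure.
        return ("FAILED", True)
    # An unexpected pass under strict=False is reported as XPASS and tolerated.
    return ("XPASS", False)


# Stock Vela (bug present) versus fixed Vela (bug gone), both non-strict.
on_buggy_vela = xfail_outcome(test_passed=False, strict=False)
on_fixed_vela = xfail_outcome(test_passed=True, strict=False)
strict_on_fixed_vela = xfail_outcome(test_passed=True, strict=True)
```

Neither non-strict case counts as a failure, which is why the description says the target stays green before and after the Vela fix; a strict mark would turn red once the fix lands.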


Copilot AI review requested due to automatic review settings May 6, 2026 01:21

Ninja91 requested a review from digantdesai as a code owner May 6, 2026 01:21

meta-cla Bot added the CLA Signed label May 6, 2026

github-actions Bot added the ciflow/trunk and module: arm labels and removed the CLA Signed label May 6, 2026

meta-codesync Bot added the fb-exported and meta-exported labels May 6, 2026

Ninja91 added the partner: arm label May 6, 2026

Copilot started reviewing on behalf of Ninja91 May 6, 2026 01:22

Copilot AI reviewed May 6, 2026

Comment thread backends/arm/test/ops/test_sum.py Outdated

Comment thread backends/arm/test/ops/test_sum.py Outdated

Comment thread backends/arm/test/ops/test_sum.py

Comment thread backends/arm/test/targets.bzl

Ninja91 requested a review from 3l1 May 6, 2026 01:35

meta-cla Bot added the CLA Signed label May 6, 2026

meta-codesync Bot changed the title ~~Add a16w8 reduce_sum FVP coverage for Ethos-U85~~ Add a16w8 reduce_sum FVP coverage for Ethos-U85 (#19319) May 6, 2026

Ninja91 force-pushed the export-D103667823 branch from b4603d2 to 20105f6 Compare May 6, 2026 04:07

meta-codesync Bot changed the title ~~Add a16w8 reduce_sum FVP coverage for Ethos-U85 (#19319)~~ Add a16w8 reduce_sum FVP coverage for Ethos-U85 May 6, 2026

Copilot AI review requested due to automatic review settings May 6, 2026 06:05

Ninja91 force-pushed the export-D103667823 branch from 20105f6 to 876f542 Compare May 6, 2026 06:05

Copilot started reviewing on behalf of Ninja91 May 6, 2026 06:05

Copilot AI reviewed May 6, 2026

Comment thread backends/arm/test/ops/test_sum.py

Comment on lines +285 to +289

@common.parametrize("test_data", a16w8_sum_test_parameters)
@common.XfailIfNoCorstone320
@pytest.mark.xfail(
    reason="Ethos-U85 int16 ReduceSum returns zero (vela#23)", strict=False
)

Ninja91 wants to merge 1 commit into pytorch:main from Ninja91:export-D103667823
