Add a16w8 reduce_sum FVP coverage for Ethos-U85 #19319

Open
Ninja91 wants to merge 1 commit into pytorch:main from Ninja91:export-D103667823

Conversation

@Ninja91 Ninja91 commented May 6, 2026

Summary:
Adds an a16w8 (int16 IO + int8 weights) sweep for aten.sum.dim_IntList reducing the last dim with keepdim=True. The new tests test_sum_dim_intlist_a16w8_{u55,u85}_INT run on the standard Corstone-300 / Corstone-320 FVP harness. The U85 case surfaces a known numerics issue in the Vela regor lowering at int16 IO precision (silent zero output), tracked upstream at https://gitlab.arm.com/artificial-intelligence/ethos-u/ethos-u-vela/-/issues/23. The Ethos-U55 path uses a different accumulator and is correct on the same OFM rescale.
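
As a rough illustration of what the swept operation computes, here is a plain-Python sketch of a last-dim sum with keepdim=True under symmetric int16 IO quantization. The helper names, input values, and value ranges are assumptions made for this example; they are not taken from test_sum.py, ExecuTorch, or Vela.

```python
# Sketch only: a plain-Python model of an int16-IO sum over the last dim.
# quantize_int16, dequantize and sum_last_dim_keepdim are hypothetical helpers.

INT16_MAX = 32767


def quantize_int16(values, scale):
    """Symmetric quantization: zero point is 0, values clamped to int16."""
    return [max(-INT16_MAX, min(INT16_MAX, round(v / scale))) for v in values]


def dequantize(q_values, scale):
    return [q * scale for q in q_values]


def sum_last_dim_keepdim(rows):
    """Same result shape as x.sum(dim=-1, keepdim=True) for a 2-D nested list."""
    return [[sum(row)] for row in rows]


x = [[0.5, -0.25, 1.0], [2.0, 0.125, -1.5]]
in_scale = 2.0 / INT16_MAX   # input range assumed to be [-2, 2]
out_scale = 6.0 / INT16_MAX  # output range assumed to be [-6, 6]

q_x = [quantize_int16(row, in_scale) for row in x]
# Accumulate in wide integers, then rescale the accumulator to the output scale.
q_acc = sum_last_dim_keepdim(q_x)
q_out = [quantize_int16([acc[0] * in_scale], out_scale) for acc in q_acc]

result = [dequantize(q, out_scale) for q in q_out]
reference = sum_last_dim_keepdim(x)
```

With a correct lowering, `result` stays close to `reference`; the reported U85 issue would instead correspond to an all-zero `result`.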

This diff is additive only: the Sum / SumDefault test classes and existing test functions are not modified, except for skips= annotations on the four pre-existing dim_None parametrize ids that are not bundled-program-serializable and surface only because this diff is the first to register ops/test_sum.py in the buck test target list.

Test design:

  • Standard pipeline.run() with the same a16w8 kwargs other arm a16w8 tests use (e.g. test_native_layer_norm_16a8w_u85_INT in test_layer_norm.py): a16w8_quantization=True, symmetric_io_quantization=True, qtol=128, epsilon=2**-16.
  • Numerical comparison is the standard atol/rtol check from pipeline.run() — no SQNR helpers.
  • The U85 cases are wrapped with xfails=a16w8_sum_u85_xfails, strict=False. strict=False keeps the test target green both on stock Vela 5.0 (cases XFAIL) and once the upstream Vela fix is in tree (cases XPASS allowed).
  • XfailIfNoCorstone320 is intentionally omitted on the new a16w8 U85 test: stacking it with the per-id xfails= argument prevents the per-id marks from firing (verified empirically in this buck test target). A code comment in the file documents this constraint.
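
The strict=False reasoning above can be sketched as a small model of how pytest reports a case carrying an xfail mark. `report_outcome` is a hypothetical helper written for this illustration; it is not part of pytest or of the Arm test harness.

```python
# Sketch of pytest's xfail reporting rules for a single parametrized case.
# report_outcome is a hypothetical helper, not a pytest API.

def report_outcome(test_passed, has_xfail_mark, strict):
    """Return (label, keeps_target_green) for one case."""
    if not has_xfail_mark:
        return ("PASSED", True) if test_passed else ("FAILED", False)
    if not test_passed:
        return ("XFAIL", True)    # expected failure: does not break the run
    if strict:
        return ("FAILED", False)  # strict xfail: an unexpected pass is an error
    return ("XPASS", True)        # non-strict: an unexpected pass is tolerated


# Stock Vela 5.0: the marked U85 case fails and is reported as XFAIL.
stock = report_outcome(test_passed=False, has_xfail_mark=True, strict=False)
# After the Vela fix: the same case passes and is reported as XPASS.
fixed = report_outcome(test_passed=True, has_xfail_mark=True, strict=False)
```

Both outcomes keep the target green, which is why the non-strict mark can stay in place until the xfails= argument is dropped.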

Differential Revision: D103667823

Copilot AI review requested due to automatic review settings May 6, 2026 01:21
@Ninja91 Ninja91 requested a review from digantdesai as a code owner May 6, 2026 01:21
pytorch-bot Bot commented May 6, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19319

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 89 Pending

As of commit 876f542 with merge base 1debeb6:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed label May 6, 2026
@github-actions github-actions Bot added the ciflow/trunk and module: arm labels and removed the CLA Signed label May 6, 2026
meta-codesync Bot commented May 6, 2026

@Ninja91 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D103667823.

@Ninja91 Ninja91 added the partner: arm label May 6, 2026
github-actions Bot commented May 6, 2026

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Copilot AI left a comment

Pull request overview

This PR adds Arm backend test coverage for the a16w8 (int16 activations / IO quantization) path of aten.sum.dim_IntList (reducing the last dim with keepdim=True) on Corstone FVPs, with the intent of surfacing a known Ethos-U85 ReduceSum int16 numerics issue (silent-zero output) while keeping the overall test target green via non-strict XFAILs.

Changes:

  • Enables ops/test_sum.py in the Arm Bazel test target list.
  • Adds new SumLastDim-based a16w8 ReduceSum tests for Ethos-U55 and Ethos-U85, including per-case XFAILs for the known U85 issue.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| backends/arm/test/targets.bzl | Adds ops/test_sum.py to the default Arm test file list so it runs in the Bazel test suite. |
| backends/arm/test/ops/test_sum.py | Introduces new a16w8 ReduceSum last-dim tests for U55/U85 and marks U85 cases as non-strict XFAIL to capture the known Vela issue. |

Comment thread backends/arm/test/ops/test_sum.py Outdated
Comment thread backends/arm/test/ops/test_sum.py Outdated
Comment thread backends/arm/test/ops/test_sum.py
Comment thread backends/arm/test/targets.bzl
@Ninja91 Ninja91 requested a review from 3l1 May 6, 2026 01:35
@meta-cla meta-cla Bot added the CLA Signed label May 6, 2026
@meta-codesync meta-codesync Bot changed the title from "Add a16w8 reduce_sum FVP coverage for Ethos-U85" to "Add a16w8 reduce_sum FVP coverage for Ethos-U85 (#19319)" May 6, 2026
Ninja91 added a commit to Ninja91/executorch that referenced this pull request May 6, 2026
Summary:

Adds an a16w8 (int16 IO + int8 weights) sweep for `aten.sum.dim_IntList` reducing the last dim with `keepdim=True`. The new tests `test_sum_dim_intlist_a16w8_{u55,u85}_INT` run on the standard Corstone-300 / Corstone-320 FVP harness and surface a numerics issue in the Ethos-U85 `ReduceSum` lowering at int16 IO precision (silent zero output). The Ethos-U55 path uses a different accumulator and is correct on the same OFM rescale.

## Context

Part of a stack that documents and fixes a numerics bug in the Vela 5.0 Ethos-U85 backend (`regor`). Plan + cross-references:

- **Plan:** {D103649006} ([Markup](https://internalfb.com/intern/markup/D103649006))
- **Step 1a (this diff):** ReduceSum-only a16w8 coverage in `test_sum.py` (LAND)
- **Step 1b-softmax:** {D103734699} -- `test_softmax.py` a16w8 MHA softmax sweep (LAND)
- **Step 1b-ops:** {D103760103} -- `test_softmax_ops.py` op-isolation harness (DNL)
- **Step 2a:** {D103760153} -- `regor` patch in third-party Vela 5.0 fork (LAND)
- **Step 2b:** {D103760514} -- DNL companion that drops `xfails=` from `test_sum.py` (lands in OSS only after upstream Vela syncs the fix)

## Test design

Tests use the standard `pipeline.run()` with the same a16w8 kwargs other arm a16w8 tests use (e.g. `test_native_layer_norm_16a8w_u85_INT` in `test_layer_norm.py`):

```
a16w8_quantization=True, symmetric_io_quantization=True, qtol=128, epsilon=2**-16
```

Numerical comparison is the standard `atol`/`rtol`-only check from `pipeline.run()` -- no SQNR helpers -- to stay consistent with the rest of `arm/test/ops/`.
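
For illustration, an `atol`/`rtol` comparison of this kind can be sketched in plain Python. This assumes the `torch.allclose`-style formula `|actual - expected| <= atol + rtol * |expected|`; the helper name, the tolerance values, and the sample numbers are hypothetical and are not the ones `pipeline.run()` uses.

```python
# Sketch only: within_tolerance is a hypothetical helper, and the tolerances
# below are illustrative rather than the harness defaults.

def within_tolerance(actual, expected, atol, rtol):
    """Elementwise |actual - expected| <= atol + rtol * |expected|."""
    return all(
        abs(a - e) <= atol + rtol * abs(e) for a, e in zip(actual, expected)
    )


expected = [1.25, 0.625]
close_output = [1.2501, 0.62496]  # small quantization error
zero_output = [0.0, 0.0]          # models a silent-zero output

ok = within_tolerance(close_output, expected, atol=1e-3, rtol=1e-3)
bad = within_tolerance(zero_output, expected, atol=1e-3, rtol=1e-3)
```

Under these assumptions a small quantization error passes while an all-zero output is rejected, so a plain tolerance check is enough to catch a silent-zero result without SQNR helpers.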

The U85 cases are wrapped with `xfails=a16w8_sum_u85_xfails, strict=False`. `strict=False` keeps the test target green both on stock Vela 5.0 (cases XFAIL) *and* after Step 2a lands the Vela patch (cases XPASS, allowed under non-strict). Step 2b separately drops the `xfails=` argument once the upstream Vela fix syncs down.

The new U85 a16w8 test deliberately omits `common.XfailIfNoCorstone320` (which is present on the U55 sibling). Stacking that decorator with the per-id `xfails=` argument makes the per-id marks not fire (verified empirically) so the bug-firing cases would hard-fail instead of XFAIL. CI always has Corstone-320 installed; if it ever isn't, the test fails loudly with `FileNotFoundError`, which is the right signal for a missing-FVP misconfiguration. A code comment in the file documents this constraint.

## Scope note

This diff only **adds** new tests for the a16w8 path. It does not modify any existing tests in `test_sum.py` -- the pre-existing `Sum.test_parameters` (including the `dim_None` cases) is left as-is. Pre-existing `dim_None` test failures on `test_sum_u{55,85}_INT_1_0` are out of scope and unrelated to this diff.

Differential Revision: D103667823
@Ninja91 Ninja91 force-pushed the export-D103667823 branch from b4603d2 to 20105f6 Compare May 6, 2026 04:07
@meta-codesync meta-codesync Bot changed the title from "Add a16w8 reduce_sum FVP coverage for Ethos-U85 (#19319)" to "Add a16w8 reduce_sum FVP coverage for Ethos-U85" May 6, 2026
Copilot AI review requested due to automatic review settings May 6, 2026 06:05
@Ninja91 Ninja91 force-pushed the export-D103667823 branch from 20105f6 to 876f542 Compare May 6, 2026 06:05
Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comment on lines +285 to +289
@common.parametrize("test_data", a16w8_sum_test_parameters)
@common.XfailIfNoCorstone320
@pytest.mark.xfail(
    reason="Ethos-U85 int16 ReduceSum returns zero (vela#23)", strict=False
)

Labels

ciflow/trunk, CLA Signed, fb-exported, meta-exported, module: arm, partner: arm

Projects

None yet

2 participants