Add a16w8 reduce_sum FVP coverage for Ethos-U85 #19319
Ninja91 wants to merge 1 commit into pytorch:main
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19319
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure, 89 Pending
As of commit 876f542 with merge base 1debeb6:
NEW FAILURE - The following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@Ninja91 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D103667823.
This PR needs a
Pull request overview
This PR adds Arm backend test coverage for the a16w8 (int16 activations / IO quantization) path of aten.sum.dim_IntList (reducing the last dim with keepdim=True) on Corstone FVPs, with the intent of surfacing a known Ethos-U85 ReduceSum int16 numerics issue (silent-zero output) while keeping the overall test target green via non-strict XFAILs.
Changes:
- Enables `ops/test_sum.py` in the Arm Bazel test target list.
- Adds new `SumLastDim`-based a16w8 ReduceSum tests for Ethos-U55 and Ethos-U85, including per-case XFAILs for the known U85 issue.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| backends/arm/test/targets.bzl | Adds ops/test_sum.py to the default Arm test file list so it runs in the Bazel test suite. |
| backends/arm/test/ops/test_sum.py | Introduces new a16w8 ReduceSum last-dim tests for U55/U85 and marks U85 cases as non-strict XFAIL to capture the known Vela issue. |
Summary:
Adds an a16w8 (int16 IO + int8 weights) sweep for `aten.sum.dim_IntList` reducing the last dim with `keepdim=True`. The new tests `test_sum_dim_intlist_a16w8_{u55,u85}_INT` run on the standard Corstone-300 / Corstone-320 FVP harness and surface a numerics issue in the Ethos-U85 `ReduceSum` lowering at int16 IO precision (silent zero output). The Ethos-U55 path uses a different accumulator and is correct on the same OFM rescale.
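To make the operation under test concrete, here is a minimal pure-Python sketch of a last-dim sum with `keepdim=True` over symmetrically quantized int16 values. It is an illustration only: the helper names (`symmetric_scale`, `quantize`, `reduce_sum_last_dim`, `dequantize`) are hypothetical, the input values are made up, and nothing here reflects how the Arm backend, Vela, or `regor` actually lowers `ReduceSum`.

```python
INT16_MAX = 32767  # symmetric int16 range is [-32767, 32767], zero-point 0


def symmetric_scale(rows, qmax=INT16_MAX):
    """Per-tensor scale for symmetric quantization (hypothetical helper)."""
    peak = max(abs(v) for row in rows for v in row)
    return peak / qmax if peak > 0 else 1.0


def quantize(rows, scale, qmax=INT16_MAX):
    """Map floats to clamped integer codes in the symmetric int16 range."""
    return [[max(-qmax, min(qmax, round(v / scale))) for v in row] for row in rows]


def reduce_sum_last_dim(q_rows):
    """Sum along the last dim and keep it as a size-1 dim (keepdim=True).

    Python ints do not overflow, so they stand in for a wide accumulator.
    The raw sum can exceed the int16 range, which is why real hardware
    needs an output rescale before writing int16 results.
    """
    return [[sum(row)] for row in q_rows]


def dequantize(q_rows, scale):
    """Convert integer codes back to floats using the same scale."""
    return [[q * scale for q in row] for row in q_rows]


x = [[0.5, -0.25, 1.0], [0.125, 0.125, 0.25]]  # made-up input, shape (2, 3)
scale = symmetric_scale(x)
result = dequantize(reduce_sum_last_dim(quantize(x, scale)), scale)
reference = [[sum(row)] for row in x]  # float reference, shape (2, 1)

for got, want in zip(result, reference):
    print(f"quantized sum {got[0]:.6f} vs float reference {want[0]:.6f}")
```

A correct int16 path should track the float reference closely; the silent-zero bug described above would instead produce zeros for every output element.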
## Context
Part of a stack that documents and fixes a numerics bug in the Vela 5.0 Ethos-U85 backend (`regor`). Plan + cross-references:
- **Plan:** {D103649006} ([Markup](https://internalfb.com/intern/markup/D103649006))
- **Step 1a (this diff):** ReduceSum-only a16w8 coverage in `test_sum.py` (LAND)
- **Step 1b-softmax:** {D103734699} -- `test_softmax.py` a16w8 MHA softmax sweep (LAND)
- **Step 1b-ops:** {D103760103} -- `test_softmax_ops.py` op-isolation harness (DNL)
- **Step 2a:** {D103760153} -- `regor` patch in third-party Vela 5.0 fork (LAND)
- **Step 2b:** {D103760514} -- DNL companion that drops `xfails=` from `test_sum.py` (lands in OSS only after upstream Vela syncs the fix)
## Test design
Tests use the standard `pipeline.run()` with the same a16w8 kwargs that other Arm a16w8 tests use (e.g. `test_native_layer_norm_16a8w_u85_INT` in `test_layer_norm.py`):
```
a16w8_quantization=True, symmetric_io_quantization=True, qtol=128, epsilon=2**-16
```
Numerical comparison is the standard `atol`/`rtol`-only check from `pipeline.run()` -- no SQNR helpers -- to stay consistent with the rest of `arm/test/ops/`.
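As a rough illustration of why an `atol`/`rtol` check is enough to catch a silent-zero output, the sketch below assumes the comparison follows the `torch.allclose` convention, `|actual - expected| <= atol + rtol * |expected|`. The function name `within_tolerance`, the tolerance values, and the sample data are all hypothetical and are not taken from `pipeline.run()`.

```python
def within_tolerance(actual, expected, atol, rtol):
    """Elementwise check: |a - e| <= atol + rtol * |e| for every pair."""
    return all(abs(a - e) <= atol + rtol * abs(e) for a, e in zip(actual, expected))


expected = [1.25, 0.5, -0.75]          # made-up reference sums
close_output = [1.2501, 0.4999, -0.7502]  # small quantization-style error
silent_zero = [0.0, 0.0, 0.0]          # what the described bug would return

ok_close = within_tolerance(close_output, expected, atol=1e-3, rtol=1e-3)
ok_zero = within_tolerance(silent_zero, expected, atol=1e-3, rtol=1e-3)
print(ok_close, ok_zero)
```

An all-zero output only passes such a check when the reference itself is near zero, so test inputs with non-trivial sums are what make the failure visible.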
The U85 cases are wrapped with `xfails=a16w8_sum_u85_xfails, strict=False`. `strict=False` keeps the test target green both on stock Vela 5.0 (cases XFAIL) *and* after Step 2a lands the Vela patch (cases XPASS, allowed under non-strict). Step 2b separately drops the `xfails=` argument once the upstream Vela fix syncs down.
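The effect of `strict=False` can be summarized with a small model of how pytest reports a test carrying an xfail mark. This is a sketch of pytest's documented xfail semantics, not code from the Arm test harness; `xfail_outcome` and `target_is_green` are hypothetical names.

```python
def xfail_outcome(test_passed: bool, strict: bool) -> str:
    """Reported outcome for a test that carries an xfail mark."""
    if not test_passed:
        return "XFAIL"  # expected failure, does not fail the run
    # An unexpected pass only fails the run when the mark is strict.
    return "FAILED" if strict else "XPASS"


def target_is_green(outcome: str) -> bool:
    """XFAIL and XPASS both leave the overall test target passing."""
    return outcome in {"XFAIL", "XPASS"}


# Stock Vela 5.0: the bug fires, so the test body fails.
before_fix = xfail_outcome(test_passed=False, strict=False)
# After the Vela patch: the test body passes while the mark is still present.
after_fix = xfail_outcome(test_passed=True, strict=False)
# The same situation under strict=True would turn the target red.
after_fix_strict = xfail_outcome(test_passed=True, strict=True)

print(before_fix, after_fix, after_fix_strict)
```

With a strict mark, landing the Vela fix would break the target until the `xfails=` argument was removed in the same change, which is the coupling that non-strict marking avoids.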
The new U85 a16w8 test deliberately omits `common.XfailIfNoCorstone320` (which is present on the U55 sibling). Stacking that decorator with the per-id `xfails=` argument prevents the per-id marks from firing (verified empirically), so the bug-firing cases would hard-fail instead of XFAIL. CI always has Corstone-320 installed; if it is ever missing, the test fails loudly with `FileNotFoundError`, which is the right signal for a missing-FVP misconfiguration. A code comment in the file documents this constraint.
## Scope note
This diff only **adds** new tests for the a16w8 path. It does not modify any existing tests in `test_sum.py` -- the pre-existing `Sum.test_parameters` (including the `dim_None` cases) is left as-is. Pre-existing `dim_None` test failures on `test_sum_u{55,85}_INT_1_0` are out of scope and unrelated to this diff.
Differential Revision: D103667823
Summary:
Adds an a16w8 (int16 IO + int8 weights) sweep for `aten.sum.dim_IntList` reducing the last dim with `keepdim=True`. The new tests `test_sum_dim_intlist_a16w8_{u55,u85}_INT` run on the standard Corstone-300 / Corstone-320 FVP harness. The U85 case surfaces a known numerics issue in the Vela `regor` lowering at int16 IO precision (silent zero output), tracked upstream at https://gitlab.arm.com/artificial-intelligence/ethos-u/ethos-u-vela/-/issues/23. The Ethos-U55 path uses a different accumulator and is correct on the same OFM rescale.
This diff is **additive only**: the `Sum` / `SumDefault` test classes and existing test functions are not modified, except for `skips=` annotations on the four pre-existing `dim_None` parametrize ids that are not bundled-program-serializable and surface only because this diff is the first to register `ops/test_sum.py` in the buck test target list.
Test design:
- Standard `pipeline.run()` with the same a16w8 kwargs other arm a16w8 tests use (e.g. `test_native_layer_norm_16a8w_u85_INT` in `test_layer_norm.py`): `a16w8_quantization=True, symmetric_io_quantization=True, qtol=128, epsilon=2**-16`.
- Numerical comparison is the standard `atol`/`rtol` check from `pipeline.run()` — no SQNR helpers.
- The U85 cases are wrapped with `xfails=a16w8_sum_u85_xfails, strict=False`. `strict=False` keeps the test target green both on stock Vela 5.0 (cases XFAIL) and once the upstream Vela fix is in tree (cases XPASS allowed).
- `XfailIfNoCorstone320` is intentionally omitted on the new a16w8 U85 test — stacking it with the per-id `xfails=` argument makes the per-id marks not fire (verified empirically in this buck test target). A code comment in the file documents this constraint.
Differential Revision: D103667823
@common.parametrize("test_data", a16w8_sum_test_parameters)
@common.XfailIfNoCorstone320
@pytest.mark.xfail(
    reason="Ethos-U85 int16 ReduceSum returns zero (vela#23)", strict=False
)