Refactor operator validation APIs and replace CMSIS-NN calls with reference implementations by Ninja91 · Pull Request #18001 · pytorch/executorch

Ninja91 · 2026-03-09T00:17:50Z

Summary:
Clean up Cortex-M operator validation APIs and replace several CMSIS-NN function calls with portable reference implementations to reduce external dependencies.

Context

The previous commit removed extern "C" wrapping around CMSIS-NN headers. This follow-up refactors the validation APIs and replaces CMSIS-NN elementwise ops (arm_maximum_s8, arm_minimum_s8, arm_pad_s8, arm_transpose_s8, arm_transpose_conv_wrapper_s8) with equivalent C++ reference loops.

Changes:

Validation API cleanup in cortex_m_ops_common.h:
- Removed unused zero_point params from validate_single_quant_params and validate_quantization_params
- Removed unused require_channels_last param from validate_cmsis_nn_tensor_requirements
- Replaced C-style arrays with std::array in is_channels_last_tensor and resize_to_broadcast_target_size
- Added braces to bare if-statement
Op file validate call updates: Updated all callers of the simplified validation APIs
CMSIS-NN → reference implementations: Replaced arm_maximum_s8, arm_minimum_s8, arm_pad_s8, arm_transpose_s8, and arm_transpose_conv_wrapper_s8 with reference C++ loops
Dead code removal: Removed unused kChannelsLastDimOrder in op_quantized_conv2d.cpp

Differential Revision: D95741568

pytorch-bot · 2026-03-09T00:17:53Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18001

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 4 New Failures

As of commit 46fbb3a with merge base c385ed0 ():

NEW FAILURES - The following jobs have failed:

Lint / lintrunner (gh)
>>> Lint for backends/cortex_m/ops/op_transpose.cpp:
pull / test-samsung-models-linux / linux-job (gh)
RuntimeError: Command docker exec -t 66ec94da4bc897584af73c723ac5a7f304976d64615910b4d475b47a85eaeb2f /exec failed with exit code 1
pull / test-samsung-quantmodels-linux / linux-job (gh)
RuntimeError: Command docker exec -t d5eba30d44c40c69f958fb0f06585b43f3a7334cfe0c4f04ea42ba220bcd2413 /exec failed with exit code 1
pull / unittest-editable / macos / macos-job (gh)
RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-codesync · 2026-03-09T00:18:14Z

@Ninja91 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D95741568.

github-actions · 2026-03-09T00:18:51Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Copilot

Pull request overview

This PR refactors Cortex-M operator validation helpers and replaces several CMSIS-NN kernel calls with portable reference implementations, aiming to simplify APIs and reduce reliance on CMSIS-NN for select elementwise / layout ops.

Changes:

Simplifies validation helper APIs in cortex_m_ops_common.h and updates call sites accordingly.
Replaces CMSIS-NN implementations for transpose/pad/min/max and transpose-conv wrapper usage with C++ reference loops.
Removes now-unused CMSIS include blocks and dead code related to channels-last ordering.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
backends/cortex_m/ops/op_transpose.cpp	Replaces `arm_transpose_s8` with a reference transpose loop and removes CMSIS includes.
backends/cortex_m/ops/op_softmax.cpp	Updates quant param validation call after API refactor; removes CMSIS include block.
backends/cortex_m/ops/op_quantized_transpose_conv2d.cpp	Replaces CMSIS transpose-conv wrapper with a reference implementation and removes scratch-buffer usage.
backends/cortex_m/ops/op_quantized_mul.cpp	Removes CMSIS include block and updates validation call signature.
backends/cortex_m/ops/op_quantized_max_pool2d.cpp	Removes CMSIS include block.
backends/cortex_m/ops/op_quantized_linear.cpp	Removes CMSIS include block.
backends/cortex_m/ops/op_quantized_depthwise_conv2d.cpp	Removes CMSIS include block.
backends/cortex_m/ops/op_quantized_conv2d.cpp	Removes CMSIS include block and deletes unused channels-last dim order constant.
backends/cortex_m/ops/op_quantized_avg_pool2d.cpp	Removes CMSIS include block.
backends/cortex_m/ops/op_quantized_add.cpp	Removes CMSIS include block and updates validation call signature.
backends/cortex_m/ops/op_pad.cpp	Replaces `arm_pad_s8` with a reference padding loop and removes CMSIS include block.
backends/cortex_m/ops/op_minimum.cpp	Replaces `arm_minimum_s8` with a reference min loop and updates validation call.
backends/cortex_m/ops/op_maximum.cpp	Replaces `arm_maximum_s8` with a reference max loop and updates validation call.
backends/cortex_m/ops/cortex_m_ops_common.h	Refactors validation APIs, updates formatting/printing, and replaces some C arrays with `std::array`.
backends/cortex_m/ops/cmsis_scratch_buffer_context.h	Removes `extern \"C\"` wrapper around CMSIS include.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-09T00:24:39Z

backends/cortex_m/ops/op_transpose.cpp

  std::array<uint32_t, kMaxSupportedDims> perm_buffer{0, 1, 2, 3};
  for (size_t i = 0; i < rank; ++i) {
    perm_buffer[i] = static_cast<uint32_t>(perm[i]);
  }


perm is only checked for length; there is no validation that it is a true permutation (all values within [0, rank) and unique). Since perm_buffer is later used as an array index, an invalid perm can cause out-of-bounds access or incorrect results; add explicit permutation validation and fail the context on invalid input.

Copilot · 2026-03-09T00:24:39Z

backends/cortex_m/ops/op_transpose.cpp

+  for (int32_t i0 = 0; i0 < output_dims_arr[0]; ++i0) {
+    for (int32_t i1 = 0; i1 < output_dims_arr[1]; ++i1) {
+      for (int32_t i2 = 0; i2 < output_dims_arr[2]; ++i2) {
+        for (int32_t i3 = 0; i3 < output_dims_arr[3]; ++i3) {
+          const std::array<int32_t, kMaxSupportedDims> out_idx{i0, i1, i2, i3};


The reference loop assumes out has the correct permuted shape; if out sizes don't match the expected permutation of input sizes, the computed in_idx/in_offset can exceed input bounds and read out-of-range. Add explicit checks that out.dim()==rank and out.size(i)==input.size(perm[i]) before entering these loops.

Copilot · 2026-03-09T00:24:40Z

backends/cortex_m/ops/cortex_m_ops_common.h

 inline void validate_quantization_params(
-    const int64_t zero_point1,
    const int64_t multiplier1,
    const int64_t shift1,
-    const int64_t zero_point2,
    const int64_t multiplier2,
    const int64_t shift2,
-    const int64_t output_zero_point,
    const int64_t output_multiplier,
-    const int64_t output_shift,
-    Tensor& output) {
-  validate_single_quant_params(
-      zero_point1, multiplier1, shift1, "Single quant Input1");
+    const int64_t output_shift) {


validate_quantization_params no longer validates zero points even though the preceding doc comment still states it does. Either update the doc comment accordingly, or re-add int8 range checks for the input/output zero points so callers still get the documented validation.

Copilot · 2026-03-09T00:24:40Z

backends/cortex_m/ops/op_quantized_transpose_conv2d.cpp

+                const int64_t w_idx =
+                    ((static_cast<int64_t>(oc) * kernel_height + kh) *
+                         kernel_width +
+                     kw) *
+                        input_channels +


w_idx uses input_channels as the innermost stride when indexing the weight tensor. This is only correct if weight.size(3) (kernel_input_channels) equals input.size(1); otherwise the weight indexing is wrong and can go out-of-bounds. Add an explicit kernel_input_channels==input_channels check and use kernel_input_channels for the weight stride.

…erence implementations (pytorch#18001) Summary: Clean up Cortex-M operator validation APIs and replace several CMSIS-NN function calls with portable reference implementations to reduce external dependencies. ## Context The previous commit removed `extern "C"` wrapping around CMSIS-NN headers. This follow-up refactors the validation APIs and replaces CMSIS-NN elementwise ops (`arm_maximum_s8`, `arm_minimum_s8`, `arm_pad_s8`, `arm_transpose_s8`, `arm_transpose_conv_wrapper_s8`) with equivalent C++ reference loops. Changes: 1. **Validation API cleanup** in `cortex_m_ops_common.h`: - Removed unused `zero_point` params from `validate_single_quant_params` and `validate_quantization_params` - Removed unused `require_channels_last` param from `validate_cmsis_nn_tensor_requirements` - Replaced C-style arrays with `std::array` in `is_channels_last_tensor` and `resize_to_broadcast_target_size` - Added braces to bare if-statement 2. **Op file validate call updates**: Updated all callers of the simplified validation APIs 3. **CMSIS-NN → reference implementations**: Replaced `arm_maximum_s8`, `arm_minimum_s8`, `arm_pad_s8`, `arm_transpose_s8`, and `arm_transpose_conv_wrapper_s8` with reference C++ loops 4. **Dead code removal**: Removed unused `kChannelsLastDimOrder` in `op_quantized_conv2d.cpp` Differential Revision: D95741568

… builds (pytorch#18000) Summary: Remove redundant `extern "C"` blocks wrapping CMSIS-NN header includes and fix `-Wformat` errors in format specifiers. These changes are required for the CC pipeline's FVP benchmark runner to compile on ARM embedded targets (Cortex-M55/M85 with MVE). ## Context The `extern "C"` wrapping around CMSIS-NN headers causes ARM embedded builds targeting MVE-capable processors to fail. CMSIS-NN's `arm_nn_math_types.h` temporarily closes its inner `extern "C"` before including `arm_mve.h`, but the outer `extern "C"` from the op files remains active, forcing `arm_mve.h` into C linkage where C++ function overloading is illegal. Additionally, format specifiers (`%hhd` for `ScalarType`, `%d`/`%ld` for `int64_t`) cause `-Wformat` errors treated as build failures on ARM toolchains. Changes: 1. Removed `extern "C"` wrapping from all 13 op .cpp files and `cmsis_scratch_buffer_context.h` 2. Consolidated CMSIS-NN includes in `cortex_m_ops_common.h` — added `#include "arm_nnfunctions.h"` (without `extern "C"`) so op files get it transitively 3. Added `#include <cinttypes>` for `PRIi64` macro 4. Fixed `%hhd` → `%d` with `static_cast<int>` for `ScalarType` values 5. Fixed `%d`/`%ld` → `PRIi64` for `int64_t` values Differential Revision: D95739935

…erence implementations (pytorch#18001) Summary: Clean up Cortex-M operator validation APIs and replace several CMSIS-NN function calls with portable reference implementations to reduce external dependencies. ## Context The previous commit removed `extern "C"` wrapping around CMSIS-NN headers. This follow-up refactors the validation APIs and replaces CMSIS-NN elementwise ops (`arm_maximum_s8`, `arm_minimum_s8`, `arm_pad_s8`, `arm_transpose_s8`, `arm_transpose_conv_wrapper_s8`) with equivalent C++ reference loops. Changes: 1. **Validation API cleanup** in `cortex_m_ops_common.h`: - Removed unused `zero_point` params from `validate_single_quant_params` and `validate_quantization_params` - Removed unused `require_channels_last` param from `validate_cmsis_nn_tensor_requirements` - Replaced C-style arrays with `std::array` in `is_channels_last_tensor` and `resize_to_broadcast_target_size` - Added braces to bare if-statement 2. **Op file validate call updates**: Updated all callers of the simplified validation APIs 3. **CMSIS-NN → reference implementations**: Replaced `arm_maximum_s8`, `arm_minimum_s8`, `arm_pad_s8`, `arm_transpose_s8`, and `arm_transpose_conv_wrapper_s8` with reference C++ loops 4. **Dead code removal**: Removed unused `kChannelsLastDimOrder` in `op_quantized_conv2d.cpp` Differential Revision: D95741568

Copilot AI review requested due to automatic review settings March 9, 2026 00:17

Ninja91 requested a review from rascani as a code owner March 9, 2026 00:17

meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 9, 2026

meta-codesync bot added fb-exported meta-exported labels Mar 9, 2026

Copilot started reviewing on behalf of Ninja91 March 9, 2026 00:18 View session

Copilot AI reviewed Mar 9, 2026

View reviewed changes

Ninja91 force-pushed the export-D95741568 branch from adf8a7c to 6c80264 Compare March 9, 2026 19:37

Ninja91 added 2 commits March 9, 2026 12:57

Ninja91 force-pushed the export-D95741568 branch from 6c80264 to 46fbb3a Compare March 9, 2026 19:57

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor operator validation APIs and replace CMSIS-NN calls with reference implementations#18001

Refactor operator validation APIs and replace CMSIS-NN calls with reference implementations#18001
Ninja91 wants to merge 2 commits intopytorch:mainfrom
Ninja91:export-D95741568

Ninja91 commented Mar 9, 2026

Uh oh!

pytorch-bot bot commented Mar 9, 2026 •

edited

Loading

Uh oh!

meta-codesync bot commented Mar 9, 2026

Uh oh!

github-actions bot commented Mar 9, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Mar 9, 2026

Uh oh!

Copilot AI Mar 9, 2026

Uh oh!

Copilot AI Mar 9, 2026

Uh oh!

Copilot AI Mar 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Ninja91 commented Mar 9, 2026

Context

Uh oh!

pytorch-bot bot commented Mar 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18001

❌ 4 New Failures

Uh oh!

meta-codesync bot commented Mar 9, 2026

Uh oh!

github-actions bot commented Mar 9, 2026

This PR needs a release notes: label

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

pytorch-bot bot commented Mar 9, 2026 •

edited

Loading

This PR needs a `release notes:` label