Skip to content

Refactor operator validation APIs and replace CMSIS-NN calls with reference implementations#18001

Open
Ninja91 wants to merge 2 commits intopytorch:mainfrom
Ninja91:export-D95741568
Open

Refactor operator validation APIs and replace CMSIS-NN calls with reference implementations#18001
Ninja91 wants to merge 2 commits intopytorch:mainfrom
Ninja91:export-D95741568

Conversation

@Ninja91
Copy link
Contributor

@Ninja91 Ninja91 commented Mar 9, 2026

Summary:
Clean up Cortex-M operator validation APIs and replace several CMSIS-NN function calls with portable reference implementations to reduce external dependencies.

Context

The previous commit removed extern "C" wrapping around CMSIS-NN headers. This follow-up refactors the validation APIs and replaces CMSIS-NN elementwise ops (arm_maximum_s8, arm_minimum_s8, arm_pad_s8, arm_transpose_s8, arm_transpose_conv_wrapper_s8) with equivalent C++ reference loops.

Changes:

  1. Validation API cleanup in cortex_m_ops_common.h:

    • Removed unused zero_point params from validate_single_quant_params and validate_quantization_params
    • Removed unused require_channels_last param from validate_cmsis_nn_tensor_requirements
    • Replaced C-style arrays with std::array in is_channels_last_tensor and resize_to_broadcast_target_size
    • Added braces to bare if-statement
  2. Op file validate call updates: Updated all callers of the simplified validation APIs

  3. CMSIS-NN → reference implementations: Replaced arm_maximum_s8, arm_minimum_s8, arm_pad_s8, arm_transpose_s8, and arm_transpose_conv_wrapper_s8 with reference C++ loops

  4. Dead code removal: Removed unused kChannelsLastDimOrder in op_quantized_conv2d.cpp

Differential Revision: D95741568

Copilot AI review requested due to automatic review settings March 9, 2026 00:17
@Ninja91 Ninja91 requested a review from rascani as a code owner March 9, 2026 00:17
@pytorch-bot
Copy link

pytorch-bot bot commented Mar 9, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18001

Note: Links to docs will display an error until the docs builds have been completed.

❌ 4 New Failures

As of commit 46fbb3a with merge base c385ed0 (image):

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 9, 2026
@meta-codesync
Copy link
Contributor

meta-codesync bot commented Mar 9, 2026

@Ninja91 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D95741568.

@github-actions
Copy link

github-actions bot commented Mar 9, 2026

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors Cortex-M operator validation helpers and replaces several CMSIS-NN kernel calls with portable reference implementations, aiming to simplify APIs and reduce reliance on CMSIS-NN for select elementwise / layout ops.

Changes:

  • Simplifies validation helper APIs in cortex_m_ops_common.h and updates call sites accordingly.
  • Replaces CMSIS-NN implementations for transpose/pad/min/max and transpose-conv wrapper usage with C++ reference loops.
  • Removes now-unused CMSIS include blocks and dead code related to channels-last ordering.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
backends/cortex_m/ops/op_transpose.cpp Replaces arm_transpose_s8 with a reference transpose loop and removes CMSIS includes.
backends/cortex_m/ops/op_softmax.cpp Updates quant param validation call after API refactor; removes CMSIS include block.
backends/cortex_m/ops/op_quantized_transpose_conv2d.cpp Replaces CMSIS transpose-conv wrapper with a reference implementation and removes scratch-buffer usage.
backends/cortex_m/ops/op_quantized_mul.cpp Removes CMSIS include block and updates validation call signature.
backends/cortex_m/ops/op_quantized_max_pool2d.cpp Removes CMSIS include block.
backends/cortex_m/ops/op_quantized_linear.cpp Removes CMSIS include block.
backends/cortex_m/ops/op_quantized_depthwise_conv2d.cpp Removes CMSIS include block.
backends/cortex_m/ops/op_quantized_conv2d.cpp Removes CMSIS include block and deletes unused channels-last dim order constant.
backends/cortex_m/ops/op_quantized_avg_pool2d.cpp Removes CMSIS include block.
backends/cortex_m/ops/op_quantized_add.cpp Removes CMSIS include block and updates validation call signature.
backends/cortex_m/ops/op_pad.cpp Replaces arm_pad_s8 with a reference padding loop and removes CMSIS include block.
backends/cortex_m/ops/op_minimum.cpp Replaces arm_minimum_s8 with a reference min loop and updates validation call.
backends/cortex_m/ops/op_maximum.cpp Replaces arm_maximum_s8 with a reference max loop and updates validation call.
backends/cortex_m/ops/cortex_m_ops_common.h Refactors validation APIs, updates formatting/printing, and replaces some C arrays with std::array.
backends/cortex_m/ops/cmsis_scratch_buffer_context.h Removes extern \"C\" wrapper around CMSIS include.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 90 to 93
std::array<uint32_t, kMaxSupportedDims> perm_buffer{0, 1, 2, 3};
for (size_t i = 0; i < rank; ++i) {
perm_buffer[i] = static_cast<uint32_t>(perm[i]);
}
Copy link

Copilot AI Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perm is only checked for length; there is no validation that it is a true permutation (all values within [0, rank) and unique). Since perm_buffer is later used as an array index, an invalid perm can cause out-of-bounds access or incorrect results; add explicit permutation validation and fail the context on invalid input.

Copilot uses AI. Check for mistakes.
Comment on lines +98 to +102
for (int32_t i0 = 0; i0 < output_dims_arr[0]; ++i0) {
for (int32_t i1 = 0; i1 < output_dims_arr[1]; ++i1) {
for (int32_t i2 = 0; i2 < output_dims_arr[2]; ++i2) {
for (int32_t i3 = 0; i3 < output_dims_arr[3]; ++i3) {
const std::array<int32_t, kMaxSupportedDims> out_idx{i0, i1, i2, i3};
Copy link

Copilot AI Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reference loop assumes out has the correct permuted shape; if out sizes don't match the expected permutation of input sizes, the computed in_idx/in_offset can exceed input bounds and read out-of-range. Add explicit checks that out.dim()==rank and out.size(i)==input.size(perm[i]) before entering these loops.

Copilot uses AI. Check for mistakes.
Comment on lines 104 to +110
inline void validate_quantization_params(
const int64_t zero_point1,
const int64_t multiplier1,
const int64_t shift1,
const int64_t zero_point2,
const int64_t multiplier2,
const int64_t shift2,
const int64_t output_zero_point,
const int64_t output_multiplier,
const int64_t output_shift,
Tensor& output) {
validate_single_quant_params(
zero_point1, multiplier1, shift1, "Single quant Input1");
const int64_t output_shift) {
Copy link

Copilot AI Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

validate_quantization_params no longer validates zero points even though the preceding doc comment still states it does. Either update the doc comment accordingly, or re-add int8 range checks for the input/output zero points so callers still get the documented validation.

Copilot uses AI. Check for mistakes.
Comment on lines +196 to +200
const int64_t w_idx =
((static_cast<int64_t>(oc) * kernel_height + kh) *
kernel_width +
kw) *
input_channels +
Copy link

Copilot AI Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

w_idx uses input_channels as the innermost stride when indexing the weight tensor. This is only correct if weight.size(3) (kernel_input_channels) equals input.size(1); otherwise the weight indexing is wrong and can go out-of-bounds. Add an explicit kernel_input_channels==input_channels check and use kernel_input_channels for the weight stride.

Copilot uses AI. Check for mistakes.
Ninja91 added a commit to Ninja91/executorch that referenced this pull request Mar 9, 2026
…erence implementations (pytorch#18001)

Summary:

Clean up Cortex-M operator validation APIs and replace several CMSIS-NN function calls with portable reference implementations to reduce external dependencies.

## Context
The previous commit removed `extern "C"` wrapping around CMSIS-NN headers. This follow-up refactors the validation APIs and replaces CMSIS-NN elementwise ops (`arm_maximum_s8`, `arm_minimum_s8`, `arm_pad_s8`, `arm_transpose_s8`, `arm_transpose_conv_wrapper_s8`) with equivalent C++ reference loops.

Changes:
1. **Validation API cleanup** in `cortex_m_ops_common.h`:
   - Removed unused `zero_point` params from `validate_single_quant_params` and `validate_quantization_params`
   - Removed unused `require_channels_last` param from `validate_cmsis_nn_tensor_requirements`
   - Replaced C-style arrays with `std::array` in `is_channels_last_tensor` and `resize_to_broadcast_target_size`
   - Added braces to bare if-statement

2. **Op file validate call updates**: Updated all callers of the simplified validation APIs

3. **CMSIS-NN → reference implementations**: Replaced `arm_maximum_s8`, `arm_minimum_s8`, `arm_pad_s8`, `arm_transpose_s8`, and `arm_transpose_conv_wrapper_s8` with reference C++ loops

4. **Dead code removal**: Removed unused `kChannelsLastDimOrder` in `op_quantized_conv2d.cpp`

Differential Revision: D95741568
@Ninja91 Ninja91 force-pushed the export-D95741568 branch from adf8a7c to 6c80264 Compare March 9, 2026 19:37
Ninja91 added 2 commits March 9, 2026 12:57
… builds (pytorch#18000)

Summary:

Remove redundant `extern "C"` blocks wrapping CMSIS-NN header includes and fix `-Wformat` errors in format specifiers. These changes are required for the CC pipeline's FVP benchmark runner to compile on ARM embedded targets (Cortex-M55/M85 with MVE).

## Context
The `extern "C"` wrapping around CMSIS-NN headers causes ARM embedded builds targeting MVE-capable processors to fail. CMSIS-NN's `arm_nn_math_types.h` temporarily closes its inner `extern "C"` before including `arm_mve.h`, but the outer `extern "C"` from the op files remains active, forcing `arm_mve.h` into C linkage where C++ function overloading is illegal.

Additionally, format specifiers (`%hhd` for `ScalarType`, `%d`/`%ld` for `int64_t`) cause `-Wformat` errors treated as build failures on ARM toolchains.

Changes:
1. Removed `extern "C"` wrapping from all 13 op .cpp files and `cmsis_scratch_buffer_context.h`
2. Consolidated CMSIS-NN includes in `cortex_m_ops_common.h` — added `#include "arm_nnfunctions.h"` (without `extern "C"`) so op files get it transitively
3. Added `#include <cinttypes>` for `PRIi64` macro
4. Fixed `%hhd` → `%d` with `static_cast<int>` for `ScalarType` values
5. Fixed `%d`/`%ld` → `PRIi64` for `int64_t` values

Differential Revision: D95739935
…erence implementations (pytorch#18001)

Summary:

Clean up Cortex-M operator validation APIs and replace several CMSIS-NN function calls with portable reference implementations to reduce external dependencies.

## Context
The previous commit removed `extern "C"` wrapping around CMSIS-NN headers. This follow-up refactors the validation APIs and replaces CMSIS-NN elementwise ops (`arm_maximum_s8`, `arm_minimum_s8`, `arm_pad_s8`, `arm_transpose_s8`, `arm_transpose_conv_wrapper_s8`) with equivalent C++ reference loops.

Changes:
1. **Validation API cleanup** in `cortex_m_ops_common.h`:
   - Removed unused `zero_point` params from `validate_single_quant_params` and `validate_quantization_params`
   - Removed unused `require_channels_last` param from `validate_cmsis_nn_tensor_requirements`
   - Replaced C-style arrays with `std::array` in `is_channels_last_tensor` and `resize_to_broadcast_target_size`
   - Added braces to bare if-statement

2. **Op file validate call updates**: Updated all callers of the simplified validation APIs

3. **CMSIS-NN → reference implementations**: Replaced `arm_maximum_s8`, `arm_minimum_s8`, `arm_pad_s8`, `arm_transpose_s8`, and `arm_transpose_conv_wrapper_s8` with reference C++ loops

4. **Dead code removal**: Removed unused `kChannelsLastDimOrder` in `op_quantized_conv2d.cpp`

Differential Revision: D95741568
@Ninja91 Ninja91 force-pushed the export-D95741568 branch from 6c80264 to 46fbb3a Compare March 9, 2026 19:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported meta-exported

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants