Refactor operator validation APIs and replace CMSIS-NN calls with reference implementations#18001
Refactor operator validation APIs and replace CMSIS-NN calls with reference implementations#18001Ninja91 wants to merge 2 commits intopytorch:mainfrom
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18001
Note: Links to docs will display an error until the docs builds have been completed. ❌ 4 New FailuresAs of commit 46fbb3a with merge base c385ed0 ( NEW FAILURES - The following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
This PR needs a
|
There was a problem hiding this comment.
Pull request overview
This PR refactors Cortex-M operator validation helpers and replaces several CMSIS-NN kernel calls with portable reference implementations, aiming to simplify APIs and reduce reliance on CMSIS-NN for select elementwise / layout ops.
Changes:
- Simplifies validation helper APIs in
cortex_m_ops_common.hand updates call sites accordingly. - Replaces CMSIS-NN implementations for transpose/pad/min/max and transpose-conv wrapper usage with C++ reference loops.
- Removes now-unused CMSIS include blocks and dead code related to channels-last ordering.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| backends/cortex_m/ops/op_transpose.cpp | Replaces arm_transpose_s8 with a reference transpose loop and removes CMSIS includes. |
| backends/cortex_m/ops/op_softmax.cpp | Updates quant param validation call after API refactor; removes CMSIS include block. |
| backends/cortex_m/ops/op_quantized_transpose_conv2d.cpp | Replaces CMSIS transpose-conv wrapper with a reference implementation and removes scratch-buffer usage. |
| backends/cortex_m/ops/op_quantized_mul.cpp | Removes CMSIS include block and updates validation call signature. |
| backends/cortex_m/ops/op_quantized_max_pool2d.cpp | Removes CMSIS include block. |
| backends/cortex_m/ops/op_quantized_linear.cpp | Removes CMSIS include block. |
| backends/cortex_m/ops/op_quantized_depthwise_conv2d.cpp | Removes CMSIS include block. |
| backends/cortex_m/ops/op_quantized_conv2d.cpp | Removes CMSIS include block and deletes unused channels-last dim order constant. |
| backends/cortex_m/ops/op_quantized_avg_pool2d.cpp | Removes CMSIS include block. |
| backends/cortex_m/ops/op_quantized_add.cpp | Removes CMSIS include block and updates validation call signature. |
| backends/cortex_m/ops/op_pad.cpp | Replaces arm_pad_s8 with a reference padding loop and removes CMSIS include block. |
| backends/cortex_m/ops/op_minimum.cpp | Replaces arm_minimum_s8 with a reference min loop and updates validation call. |
| backends/cortex_m/ops/op_maximum.cpp | Replaces arm_maximum_s8 with a reference max loop and updates validation call. |
| backends/cortex_m/ops/cortex_m_ops_common.h | Refactors validation APIs, updates formatting/printing, and replaces some C arrays with std::array. |
| backends/cortex_m/ops/cmsis_scratch_buffer_context.h | Removes extern \"C\" wrapper around CMSIS include. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| std::array<uint32_t, kMaxSupportedDims> perm_buffer{0, 1, 2, 3}; | ||
| for (size_t i = 0; i < rank; ++i) { | ||
| perm_buffer[i] = static_cast<uint32_t>(perm[i]); | ||
| } |
There was a problem hiding this comment.
perm is only checked for length; there is no validation that it is a true permutation (all values within [0, rank) and unique). Since perm_buffer is later used as an array index, an invalid perm can cause out-of-bounds access or incorrect results; add explicit permutation validation and fail the context on invalid input.
| for (int32_t i0 = 0; i0 < output_dims_arr[0]; ++i0) { | ||
| for (int32_t i1 = 0; i1 < output_dims_arr[1]; ++i1) { | ||
| for (int32_t i2 = 0; i2 < output_dims_arr[2]; ++i2) { | ||
| for (int32_t i3 = 0; i3 < output_dims_arr[3]; ++i3) { | ||
| const std::array<int32_t, kMaxSupportedDims> out_idx{i0, i1, i2, i3}; |
There was a problem hiding this comment.
The reference loop assumes out has the correct permuted shape; if out sizes don't match the expected permutation of input sizes, the computed in_idx/in_offset can exceed input bounds and read out-of-range. Add explicit checks that out.dim()==rank and out.size(i)==input.size(perm[i]) before entering these loops.
| inline void validate_quantization_params( | ||
| const int64_t zero_point1, | ||
| const int64_t multiplier1, | ||
| const int64_t shift1, | ||
| const int64_t zero_point2, | ||
| const int64_t multiplier2, | ||
| const int64_t shift2, | ||
| const int64_t output_zero_point, | ||
| const int64_t output_multiplier, | ||
| const int64_t output_shift, | ||
| Tensor& output) { | ||
| validate_single_quant_params( | ||
| zero_point1, multiplier1, shift1, "Single quant Input1"); | ||
| const int64_t output_shift) { |
There was a problem hiding this comment.
validate_quantization_params no longer validates zero points even though the preceding doc comment still states it does. Either update the doc comment accordingly, or re-add int8 range checks for the input/output zero points so callers still get the documented validation.
| const int64_t w_idx = | ||
| ((static_cast<int64_t>(oc) * kernel_height + kh) * | ||
| kernel_width + | ||
| kw) * | ||
| input_channels + |
There was a problem hiding this comment.
w_idx uses input_channels as the innermost stride when indexing the weight tensor. This is only correct if weight.size(3) (kernel_input_channels) equals input.size(1); otherwise the weight indexing is wrong and can go out-of-bounds. Add an explicit kernel_input_channels==input_channels check and use kernel_input_channels for the weight stride.
…erence implementations (pytorch#18001) Summary: Clean up Cortex-M operator validation APIs and replace several CMSIS-NN function calls with portable reference implementations to reduce external dependencies. ## Context The previous commit removed `extern "C"` wrapping around CMSIS-NN headers. This follow-up refactors the validation APIs and replaces CMSIS-NN elementwise ops (`arm_maximum_s8`, `arm_minimum_s8`, `arm_pad_s8`, `arm_transpose_s8`, `arm_transpose_conv_wrapper_s8`) with equivalent C++ reference loops. Changes: 1. **Validation API cleanup** in `cortex_m_ops_common.h`: - Removed unused `zero_point` params from `validate_single_quant_params` and `validate_quantization_params` - Removed unused `require_channels_last` param from `validate_cmsis_nn_tensor_requirements` - Replaced C-style arrays with `std::array` in `is_channels_last_tensor` and `resize_to_broadcast_target_size` - Added braces to bare if-statement 2. **Op file validate call updates**: Updated all callers of the simplified validation APIs 3. **CMSIS-NN → reference implementations**: Replaced `arm_maximum_s8`, `arm_minimum_s8`, `arm_pad_s8`, `arm_transpose_s8`, and `arm_transpose_conv_wrapper_s8` with reference C++ loops 4. **Dead code removal**: Removed unused `kChannelsLastDimOrder` in `op_quantized_conv2d.cpp` Differential Revision: D95741568
adf8a7c to
6c80264
Compare
… builds (pytorch#18000) Summary: Remove redundant `extern "C"` blocks wrapping CMSIS-NN header includes and fix `-Wformat` errors in format specifiers. These changes are required for the CC pipeline's FVP benchmark runner to compile on ARM embedded targets (Cortex-M55/M85 with MVE). ## Context The `extern "C"` wrapping around CMSIS-NN headers causes ARM embedded builds targeting MVE-capable processors to fail. CMSIS-NN's `arm_nn_math_types.h` temporarily closes its inner `extern "C"` before including `arm_mve.h`, but the outer `extern "C"` from the op files remains active, forcing `arm_mve.h` into C linkage where C++ function overloading is illegal. Additionally, format specifiers (`%hhd` for `ScalarType`, `%d`/`%ld` for `int64_t`) cause `-Wformat` errors treated as build failures on ARM toolchains. Changes: 1. Removed `extern "C"` wrapping from all 13 op .cpp files and `cmsis_scratch_buffer_context.h` 2. Consolidated CMSIS-NN includes in `cortex_m_ops_common.h` — added `#include "arm_nnfunctions.h"` (without `extern "C"`) so op files get it transitively 3. Added `#include <cinttypes>` for `PRIi64` macro 4. Fixed `%hhd` → `%d` with `static_cast<int>` for `ScalarType` values 5. Fixed `%d`/`%ld` → `PRIi64` for `int64_t` values Differential Revision: D95739935
…erence implementations (pytorch#18001) Summary: Clean up Cortex-M operator validation APIs and replace several CMSIS-NN function calls with portable reference implementations to reduce external dependencies. ## Context The previous commit removed `extern "C"` wrapping around CMSIS-NN headers. This follow-up refactors the validation APIs and replaces CMSIS-NN elementwise ops (`arm_maximum_s8`, `arm_minimum_s8`, `arm_pad_s8`, `arm_transpose_s8`, `arm_transpose_conv_wrapper_s8`) with equivalent C++ reference loops. Changes: 1. **Validation API cleanup** in `cortex_m_ops_common.h`: - Removed unused `zero_point` params from `validate_single_quant_params` and `validate_quantization_params` - Removed unused `require_channels_last` param from `validate_cmsis_nn_tensor_requirements` - Replaced C-style arrays with `std::array` in `is_channels_last_tensor` and `resize_to_broadcast_target_size` - Added braces to bare if-statement 2. **Op file validate call updates**: Updated all callers of the simplified validation APIs 3. **CMSIS-NN → reference implementations**: Replaced `arm_maximum_s8`, `arm_minimum_s8`, `arm_pad_s8`, `arm_transpose_s8`, and `arm_transpose_conv_wrapper_s8` with reference C++ loops 4. **Dead code removal**: Removed unused `kChannelsLastDimOrder` in `op_quantized_conv2d.cpp` Differential Revision: D95741568
6c80264 to
46fbb3a
Compare
Summary:
Clean up Cortex-M operator validation APIs and replace several CMSIS-NN function calls with portable reference implementations to reduce external dependencies.
Context
The previous commit removed
extern "C"wrapping around CMSIS-NN headers. This follow-up refactors the validation APIs and replaces CMSIS-NN elementwise ops (arm_maximum_s8,arm_minimum_s8,arm_pad_s8,arm_transpose_s8,arm_transpose_conv_wrapper_s8) with equivalent C++ reference loops.Changes:
Validation API cleanup in
cortex_m_ops_common.h:zero_pointparams fromvalidate_single_quant_paramsandvalidate_quantization_paramsrequire_channels_lastparam fromvalidate_cmsis_nn_tensor_requirementsstd::arrayinis_channels_last_tensorandresize_to_broadcast_target_sizeOp file validate call updates: Updated all callers of the simplified validation APIs
CMSIS-NN → reference implementations: Replaced
arm_maximum_s8,arm_minimum_s8,arm_pad_s8,arm_transpose_s8, andarm_transpose_conv_wrapper_s8with reference C++ loopsDead code removal: Removed unused
kChannelsLastDimOrderinop_quantized_conv2d.cppDifferential Revision: D95741568