🐛 Describe the bug
A quantised Conv2d with padding='same' fails at runtime on XNNPACK.
A workaround is to compute the padding manually for a known input size (e.g. padding=(8,8)).
Removing the ReLU also makes the sample code run. Tested on executorch 1.0.0 and 1.3.1.
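For anyone trying the manual-padding workaround, here is a minimal sketch of how the explicit amounts could be derived. It assumes PyTorch's rule for padding='same' at stride 1 (total padding of dilation * (kernel_size - 1), with the extra element placed after the data when the total is odd). The helper names `same_padding` and `conv_output_size` are hypothetical and not part of any library, and I have not verified that padding this way avoids the XNNPACK failure.

```python
def same_padding(kernel_size: int, dilation: int = 1) -> tuple[int, int]:
    """Return (before, after) padding that keeps the size unchanged at stride 1."""
    total = dilation * (kernel_size - 1)
    before = total // 2
    return before, total - before


def conv_output_size(size: int, kernel_size: int, before: int, after: int,
                     dilation: int = 1) -> int:
    """Output length of a stride-1 convolution along one dimension."""
    return size + before + after - dilation * (kernel_size - 1)


# Kernel (32, 1) on a (1, 1, 256, 1) input, as in the sample code.
top, bottom = same_padding(32)
left, right = same_padding(1)
print(top, bottom, left, right)
print(conv_output_size(256, 32, top, bottom))

# Hypothetical use (an assumption, untested against XNNPACK): apply the padding
# explicitly, e.g. torch.nn.ZeroPad2d((left, right, top, bottom)), and follow it
# with a Conv2d that uses padding=0.
```

Note that an even kernel size such as 32 gives an odd total, so the split is asymmetric and cannot be expressed with a single symmetric `padding=` value on Conv2d.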
import torch
from torch.export import export
class ExecutorchTest(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Sequential(
            torch.nn.Conv2d(1, 16, kernel_size=(32, 1), padding='same'),
            torch.nn.ReLU(),
        )

    def forward(self, input):
        out = torch.reshape(input, (1, 1, 256, 1))
        out = self.conv(out)
        return out
# From: <https://docs.pytorch.org/executorch/stable/tutorial-xnnpack-delegate-lowering.html>
model = torch.export.export(ExecutorchTest().eval(), (torch.ones(1, 256, dtype=torch.float),), strict=True).module()
sample_inputs = (torch.randn(1, 256), )
from torchao.quantization.pt2e.quantize_pt2e import convert_pt2e, prepare_pt2e
from executorch.backends.xnnpack.quantizer.xnnpack_quantizer import (
    get_symmetric_quantization_config,
    XNNPACKQuantizer,
)
def quantize(model, example_inputs):
    """This is the official recommended flow for quantization in pytorch 2.0 export"""
    print(f"Original model: {model}")
    quantizer = XNNPACKQuantizer()
    # if we set is_per_channel to True, we also need to add out_variant of quantize_per_channel/dequantize_per_channel
    operator_config = get_symmetric_quantization_config(is_per_channel=False)
    quantizer.set_global(operator_config)
    m = prepare_pt2e(model, quantizer)
    # calibration
    m(*example_inputs)
    m = convert_pt2e(m)
    print(f"Quantized model: {m}")
    # make sure we can export to flat buffer
    return m
qmodel = quantize(model, sample_inputs)
from executorch.exir import EdgeCompileConfig, to_edge_transform_and_lower
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
edge = to_edge_transform_and_lower(
    export(qmodel, sample_inputs),
    compile_config=EdgeCompileConfig(_check_ir_validity=False),
    partitioner=[XnnpackPartitioner()]
)
exec_prog = edge.to_executorch()
from executorch.extension.pybindings.portable_lib import _load_for_executorch_from_buffer
et_model = _load_for_executorch_from_buffer(exec_prog.buffer)
et_model(sample_inputs) # <-- fails
Log from debug build (git ref a11d555):
Error in XNNPACK: unsupported operator clamp for datatypes FP32 -> QINT8 (init_op, /executorch/backends/xnnpack/third-party/XNNPACK/src/operators/unary-elementwise-nc.c:300)
Error in XNNPACK: failed to create node 0 (create_runtime_impl, /executorch/backends/xnnpack/third-party/XNNPACK/src/runtime.c:643)
[XNNCompiler.cpp:1922] XNN Runtime creation failed with code: xnn_status_unsupported_parameter
[XNNPACKBackend.cpp:121] XNNCompiler::compileModel failed: 0x1
[method.cpp:114] Init failed for backend XnnpackBackend: 0x1
Traceback (most recent call last):
  File "/executorch/example.py", line 55, in <module>
    et_model(sample_inputs) # <-- fails
    ^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Failed to execute method forward, error: 0x1
Versions
Python version: 3.12.11 (main, Jun 3 2025, 15:41:47) [Clang 17.0.0 (clang-1700.0.13.3)] (64-bit runtime)
Python platform: macOS-26.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Apple M4 Max
Versions of relevant libraries:
[pip3] executorch==1.0.0
[pip3] numpy==2.3.4
[pip3] torch==2.9.0
[pip3] torchao==0.14.1
[conda] Could not collect
Also tested with the latest release executorch==1.3.1 and torch==2.12.1