Conv2d with padding 'same' fails (when quantised) at runtime on XNNPACK

### 🐛 Describe the bug

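A quantised `Conv2d` with `padding='same'` followed by a `ReLU` fails at runtime on the XNNPACK backend. The failure happens when the lowered program is loaded and executed (see the log below).

Either of the following changes makes the sample code run:
- manually computing the padding for a known input size (e.g. `padding=(8,8)`) instead of using `'same'`
- removing the `ReLU`

Tested on executorch 1.0.0 and 1.3.1.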

```python
import torch
from torch.export import export

class ExecutorchTest(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Sequential(
            torch.nn.Conv2d(1, 16, kernel_size=(32, 1), padding='same'),
            torch.nn.ReLU(),
        )
    def forward(self, input):
        out = torch.reshape(input, (1,1,256,1))
        out = self.conv(out)
        return out

# From: <https://docs.pytorch.org/executorch/stable/tutorial-xnnpack-delegate-lowering.html>
model = torch.export.export(ExecutorchTest().eval(), (torch.ones(1, 256, dtype=torch.float),), strict=True).module()
sample_inputs = (torch.randn(1, 256), )

from torchao.quantization.pt2e.quantize_pt2e import convert_pt2e, prepare_pt2e
from executorch.backends.xnnpack.quantizer.xnnpack_quantizer import (
    get_symmetric_quantization_config,
    XNNPACKQuantizer,
)

def quantize(model, example_inputs):
    """This is the official recommended flow for quantization in pytorch 2.0 export"""
    print(f"Original model: {model}")
    quantizer = XNNPACKQuantizer()
    # if we set is_per_channel to True, we also need to add out_variant of quantize_per_channel/dequantize_per_channel
    operator_config = get_symmetric_quantization_config(is_per_channel=False)
    quantizer.set_global(operator_config)
    m = prepare_pt2e(model, quantizer)
    # calibration
    m(*example_inputs)
    m = convert_pt2e(m)
    print(f"Quantized model: {m}")
    return m

qmodel = quantize(model, sample_inputs)

from executorch.exir import EdgeCompileConfig, to_edge_transform_and_lower
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
edge = to_edge_transform_and_lower(
    export(qmodel, sample_inputs),
    compile_config=EdgeCompileConfig(_check_ir_validity=False),
    partitioner=[XnnpackPartitioner()]
)

exec_prog = edge.to_executorch()

from executorch.extension.pybindings.portable_lib import _load_for_executorch_from_buffer
et_model = _load_for_executorch_from_buffer(exec_prog.buffer)
et_model(sample_inputs)  # <-- fails
```

Log from debug build (git ref a11d555):
```
Error in XNNPACK: unsupported operator clamp for datatypes FP32 -> QINT8 (init_op, /executorch/backends/xnnpack/third-party/XNNPACK/src/operators/unary-elementwise-nc.c:300)
Error in XNNPACK: failed to create node 0 (create_runtime_impl, /executorch/backends/xnnpack/third-party/XNNPACK/src/runtime.c:643)
[XNNCompiler.cpp:1922] XNN Runtime creation failed with code: xnn_status_unsupported_parameter
[XNNPACKBackend.cpp:121] XNNCompiler::compileModel failed: 0x1
[method.cpp:114] Init failed for backend XnnpackBackend: 0x1
Traceback (most recent call last):
  File "/executorch/example.py", line 55, in <module>
    et_model(sample_inputs)  # <-- fails
    ^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Failed to execute method forward, error: 0x1
```
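
For reference, a minimal sketch of how the manual-padding workaround could be computed for an arbitrary kernel size. The helper names `same_padding` and `to_pad_arg` are hypothetical and not part of any library. It assumes stride 1 and follows PyTorch's convention of putting the extra element of padding on the trailing side. For an even kernel such as 32 the split is uneven (15 before, 16 after), so a symmetric `padding=` tuple cannot reproduce the `'same'` output shape exactly; an explicit pad layer in front of an unpadded conv would be needed. Whether that variant avoids the XNNPACK error has not been verified here.

```python
def same_padding(kernel_size, dilation=(1, 1)):
    """Per-dimension (before, after) padding for a stride-1 'same' convolution.

    Hypothetical helper: total padding is dilation * (kernel - 1), with the
    odd element (if any) placed after the data.
    """
    pads = []
    for k, d in zip(kernel_size, dilation):
        total = d * (k - 1)
        before = total // 2
        pads.append((before, total - before))
    return pads


def to_pad_arg(pads):
    """Flatten to the (last-dim-first) order used by torch.nn.functional.pad."""
    flat = []
    for before, after in reversed(pads):
        flat.extend((before, after))
    return tuple(flat)


pads = same_padding((32, 1))   # kernel_size from the repro above
pad_arg = to_pad_arg(pads)     # (left, right, top, bottom)

# Sanity check: the padded height of 256 convolved with a 32-tap kernel
# should give back 256 rows.
out_height = 256 + sum(pads[0]) - 32 + 1

# Assumed usage (untested on XNNPACK):
#   torch.nn.Sequential(
#       torch.nn.ZeroPad2d(pad_arg),
#       torch.nn.Conv2d(1, 16, kernel_size=(32, 1), padding=0),
#       torch.nn.ReLU(),
#   )
```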

### Versions

```
Python version: 3.12.11 (main, Jun  3 2025, 15:41:47) [Clang 17.0.0 (clang-1700.0.13.3)] (64-bit runtime)
Python platform: macOS-26.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M4 Max

Versions of relevant libraries:
[pip3] executorch==1.0.0
[pip3] numpy==2.3.4
[pip3] torch==2.9.0
[pip3] torchao==0.14.1
[conda] Could not collect
```
Also tested with the latest release, `executorch==1.3.1` and `torch==2.12.1`.
