🐛 Describe the bug
When lowering this MWE for ARM Ethos-U55:
from pathlib import Path

import torch
import torch.nn as nn


class NoState1DequantPerChannelMWE(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv0 = nn.Conv2d(2, 2, kernel_size=(1, 1), bias=False)

    def forward(self, x):
        # ReLU6 expressed as hardtanh, then concatenated with the input.
        y = torch.ops.aten.hardtanh.default(self.conv0(x), 0.0, 6.0)
        y = torch.cat([x, y], dim=3)
        return y


x_cf = torch.zeros(1, 2, 1, 1)
no_state1_mwe = torch.export.export(
    NoState1DequantPerChannelMWE().eval(),
    (x_cf,),
    strict=True,
)
no_state1_mwe_path = Path("mwe.pt2")
torch.export.save(no_state1_mwe, str(no_state1_mwe_path))
with this lowering code:
from pathlib import Path

import torch
from executorch.backends.arm.ethosu import EthosUCompileSpec, EthosUPartitioner
from executorch.backends.arm.quantizer import (
    EthosUQuantizer,
    get_symmetric_quantization_config,
)
from executorch.backends.cortex_m.passes.quantized_op_fusion_pass import (
    QuantizedOpFusionPass,
)
from executorch.backends.cortex_m.passes.replace_quant_nodes_pass import (
    ReplaceQuantNodesPass,
)
from executorch.exir import (
    EdgeCompileConfig,
    ExecutorchBackendConfig,
    to_edge_transform_and_lower,
)
from executorch.exir.passes.memory_planning_pass import MemoryPlanningPass
from executorch.extension.export_util.utils import save_pte_program
from torch.export import export
from torchao.quantization.pt2e.quantize_pt2e import convert_pt2e, prepare_pt2e

COMPILE_SPEC = EthosUCompileSpec(
    "ethos-u55-128",
    config_ini=str(Path.cwd() / "my_vela.ini"),
)
MWE_INPUT = torch.randn(1, 2, 1, 1)


def export_pt2_to_pte(
    pt2_path: str | Path,
) -> dict[str, object]:
    pt2_path = Path(pt2_path)
    output_path = pt2_path.with_suffix(".helper.pte")
    exported_program = torch.export.load(str(pt2_path))
    module = exported_program.module(check_guards=False)
    quantizer = EthosUQuantizer(COMPILE_SPEC)
    quantizer.set_global(get_symmetric_quantization_config())
    prepared_model = prepare_pt2e(module, quantizer)
    prepared_model(MWE_INPUT)
    quantized_model = convert_pt2e(prepared_model)
    quantized_exported_program = export(
        quantized_model,
        (MWE_INPUT,),
        strict=True,
    )
    edge_program_manager = to_edge_transform_and_lower(
        quantized_exported_program,
        partitioner=[EthosUPartitioner(COMPILE_SPEC)],
        compile_config=EdgeCompileConfig(_check_ir_validity=False),
    )
    edge_program_manager = edge_program_manager.transform(
        [ReplaceQuantNodesPass(), QuantizedOpFusionPass()]
    )
    executorch_program = edge_program_manager.to_executorch(
        config=ExecutorchBackendConfig(
            memory_planning_pass=MemoryPlanningPass(alloc_graph_input=False),
            extract_delegate_segments=False,
        )
    )
    save_pte_program(executorch_program, str(output_path))
the output contains a convolution operator and an activation that are neither quantized nor lowered to the U55. The lowering emits a warning that these nodes could not be lowered to the U55 because "One or more inputs were not quantized".
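To make the symptom easier to confirm, here is a minimal sketch that counts delegate calls versus operators left outside the delegate. `summarize_lowering` is a hypothetical helper, not part of ExecuTorch. It assumes it is fed the nodes of the lowered graph (for example `edge_program_manager.exported_program().graph_module.graph.nodes`) and that delegated subgraphs show up as `call_function` nodes whose target name contains `executorch_call_delegate`.

```python
from collections import Counter


def summarize_lowering(nodes):
    """Count delegate calls and tally the remaining call_function targets.

    Assumption: each node exposes `.op` and `.target` like a torch.fx node,
    and delegate calls have "executorch_call_delegate" in str(target).
    """
    delegated = 0
    remaining = Counter()
    for node in nodes:
        if node.op != "call_function":
            continue  # skip placeholders, outputs, get_attr nodes
        name = str(node.target)
        if "executorch_call_delegate" in name:
            delegated += 1
        elif "getitem" in name:
            continue  # tuple unpacking of delegate outputs, not a real op
        else:
            remaining[name] += 1
    return delegated, remaining
```

If the graph were fully lowered, `remaining` would be expected to hold only quantize/dequantize-style boundary ops; a convolution or hardtanh entry there would match the behavior described above.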
Note: To reproduce this, set Sram_write_latency=16 in vela.ini. The default value of 32 does not seem to reproduce the issue; I haven't tested other values.
For some reason, this issue seems to disappear when using the default vela.ini and reappears when passing my_vela.ini, which is simply a copy of vela.ini in the local directory. I don't fully understand the mechanism that would cause this. If it doesn't reproduce, I would appreciate any help in narrowing down what causes this behavior.
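One way to narrow this down might be to confirm that the two ini files really are identical key by key. The sketch below uses only Python's standard `configparser`; `parse_ini_text` and `diff_ini` are hypothetical helper names, and the file paths in the usage comment are assumptions about where the two files live.

```python
import configparser


def parse_ini_text(text):
    """Parse INI text into {section: {key: value}}, preserving key case."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # the default would lowercase Sram_write_latency
    parser.read_string(text)
    return {section: dict(parser.items(section)) for section in parser.sections()}


def diff_ini(left, right):
    """Return sorted (section, key, left_value, right_value) tuples that differ."""
    diffs = []
    for section in sorted(set(left) | set(right)):
        left_keys = left.get(section, {})
        right_keys = right.get(section, {})
        for key in sorted(set(left_keys) | set(right_keys)):
            left_value = left_keys.get(key)
            right_value = right_keys.get(key)
            if left_value != right_value:
                diffs.append((section, key, left_value, right_value))
    return diffs


# Usage (paths are assumptions):
#   from pathlib import Path
#   default = parse_ini_text(Path("vela.ini").read_text())
#   local = parse_ini_text(Path("my_vela.ini").read_text())
#   for row in diff_ini(default, local):
#       print(row)
```

An empty diff would suggest the difference comes from how the file is located or selected rather than from its contents, though this sketch cannot show which.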
Expected Outcome
Fully quantized graph
Actual Outcome
Conv + ReLU6 (hardtanh) are FP32 ops
Versions
Collecting environment information...
PyTorch version: 2.12.0+cpu
ExecuTorch version: 1.3.1+cpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: Ubuntu 24.04.4 LTS (x86_64)
GCC version: (Ubuntu 13.3.0-6ubuntu2~24.04.1) 13.3.0
Clang version: 18.1.8 (++20240731025043+3b5b5c1ec4a3-1~exp1~20240731145144.92)
CMake version: version 4.1.2
Libc version: glibc-2.39
Python version: 3.10.15 (main, Oct 16 2024, 04:37:23) [Clang 18.1.8 ] (64-bit runtime)
Python platform: Linux-6.18.33.1-microsoft-standard-WSL2-x86_64-with-glibc2.39
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Caching allocator config: N/A
cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell @rascani