ARM Ethos-U: Operators not properly quantized when cat is present #20486

Description

@etrommer

🐛 Describe the bug

When lowering this MWE for ARM Ethos-U55:

from pathlib import Path

import torch
import torch.nn as nn


class NoState1DequantPerChannelMWE(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv0 = nn.Conv2d(2, 2, kernel_size=(1, 1), bias=False)

    def forward(self, x):
        y = torch.ops.aten.hardtanh.default(self.conv0(x), 0.0, 6.0)
        y = torch.cat([x, y], dim=3)
        return y


x_cf = torch.zeros(1, 2, 1, 1)

no_state1_mwe = torch.export.export(
    NoState1DequantPerChannelMWE().eval(),
    (x_cf,),
    strict=True,
)
no_state1_mwe_path = Path("mwe.pt2")
torch.export.save(no_state1_mwe, str(no_state1_mwe_path))

with this lowering code:

from pathlib import Path

import torch

from executorch.backends.arm.ethosu import EthosUCompileSpec, EthosUPartitioner
from executorch.backends.arm.quantizer import (
    EthosUQuantizer,
    get_symmetric_quantization_config,
)
from executorch.backends.cortex_m.passes.quantized_op_fusion_pass import (
    QuantizedOpFusionPass,
)
from executorch.backends.cortex_m.passes.replace_quant_nodes_pass import (
    ReplaceQuantNodesPass,
)
from executorch.exir import (
    EdgeCompileConfig,
    ExecutorchBackendConfig,
    to_edge_transform_and_lower,
)
from executorch.exir.passes.memory_planning_pass import MemoryPlanningPass
from executorch.extension.export_util.utils import save_pte_program
from torch.export import export
from torchao.quantization.pt2e.quantize_pt2e import convert_pt2e, prepare_pt2e

COMPILE_SPEC = EthosUCompileSpec(
    "ethos-u55-128",
    config_ini=str(Path.cwd() / "my_vela.ini"),
)

MWE_INPUT = torch.randn(1, 2, 1, 1)


def export_pt2_to_pte(
    pt2_path: str | Path,
) -> None:
    pt2_path = Path(pt2_path)
    output_path = pt2_path.with_suffix(".helper.pte")

    exported_program = torch.export.load(str(pt2_path))
    module = exported_program.module(check_guards=False)

    quantizer = EthosUQuantizer(COMPILE_SPEC)
    quantizer.set_global(get_symmetric_quantization_config())
    prepared_model = prepare_pt2e(module, quantizer)
    prepared_model(MWE_INPUT)
    quantized_model = convert_pt2e(prepared_model)

    quantized_exported_program = export(
        quantized_model,
        (MWE_INPUT,),
        strict=True,
    )
    edge_program_manager = to_edge_transform_and_lower(
        quantized_exported_program,
        partitioner=[EthosUPartitioner(COMPILE_SPEC)],
        compile_config=EdgeCompileConfig(_check_ir_validity=False),
    )
    edge_program_manager = edge_program_manager.transform(
        [ReplaceQuantNodesPass(), QuantizedOpFusionPass()]
    )

    executorch_program = edge_program_manager.to_executorch(
        config=ExecutorchBackendConfig(
            memory_planning_pass=MemoryPlanningPass(alloc_graph_input=False),
            extract_delegate_segments=False,
        )
    )

    save_pte_program(executorch_program, str(output_path))


if __name__ == "__main__":
    # Lower the MWE saved by the first script.
    export_pt2_to_pte("mwe.pt2")

the output contains a convolution operator and an activation that are neither quantized nor lowered to the U55. The lowering emits a warning that these nodes could not be lowered to the U55 because "One or more inputs were not quantized".
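
To narrow down which nodes are affected, a rough sketch like the following could list the compute nodes that have at least one input not produced by a dequantize op. It is a name-based heuristic, not part of ExecuTorch: the helper names `is_qdq` and `find_unquantized_nodes` are hypothetical, and it assumes only that nodes expose `op`, `target`, `name` and `all_input_nodes` the way `torch.fx.Node` does, and that quantize/dequantize targets contain `quantize_per_` in their string form. Ops quantized as a fused pattern may be flagged even though they are handled correctly, so treat the output as a hint only.

```python
def is_qdq(node):
    """Guess (by name) whether a node is a quantize or dequantize op."""
    # "dequantize_per_..." also contains "quantize_per_", so this covers both.
    return node.op == "call_function" and "quantize_per_" in str(node.target)


def find_unquantized_nodes(nodes):
    """Return names of compute nodes with an input that is not a dequantize op."""
    flagged = []
    for node in nodes:
        # Skip placeholders, outputs, get_attr nodes and the Q/DQ ops themselves.
        if node.op != "call_function" or is_qdq(node):
            continue
        for inp in node.all_input_nodes:
            if "dequantize_per_" not in str(inp.target):
                flagged.append(node.name)
                break
    return flagged
```

Assuming the graph produced by `convert_pt2e` is a regular `torch.fx` graph, this could be called as `find_unquantized_nodes(quantized_model.graph.nodes)` right after the conversion step in the lowering code above.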

Note: To reproduce this, set Sram_write_latency=16 in vela.ini. The default value of 32 does not seem to trigger the issue; I haven't tested other values.
For some reason, the issue seems to disappear when using the default vela.ini and reappears when passing my_vela.ini, which is simply a copy of vela.ini placed in the local directory. I don't fully understand the mechanism behind this. If it doesn't reproduce, I would appreciate any help in narrowing down the cause.
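
One way to rule out an accidental difference between the two configuration files is to compare them key by key. The following is a minimal sketch using only the standard library; the function names `diff_ini_text` and `diff_ini_files` are hypothetical, and it assumes the files parse as ordinary INI with `configparser`.

```python
import configparser
from pathlib import Path


def _parse_ini(text):
    """Parse INI text into {section: {key: value}} without altering key case."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep keys such as "Sram_write_latency" as written
    parser.read_string(text)
    return {name: dict(parser.items(name)) for name in parser.sections()}


def diff_ini_text(text_a, text_b):
    """Return (section, key, value_a, value_b) for every entry that differs."""
    a, b = _parse_ini(text_a), _parse_ini(text_b)
    differences = []
    for section in sorted(set(a) | set(b)):
        keys_a, keys_b = a.get(section, {}), b.get(section, {})
        for key in sorted(set(keys_a) | set(keys_b)):
            # A key missing on one side is reported with None for that side.
            value_a, value_b = keys_a.get(key), keys_b.get(key)
            if value_a != value_b:
                differences.append((section, key, value_a, value_b))
    return differences


def diff_ini_files(path_a, path_b):
    """Same comparison, reading both files from disk."""
    return diff_ini_text(Path(path_a).read_text(), Path(path_b).read_text())
```

Calling `diff_ini_files` on the default vela.ini and on my_vela.ini (the location of the default file depends on the installation) should list Sram_write_latency if that is the only change. An empty result would suggest the difference comes from how the file is located or selected rather than from its contents.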

Expected Outcome

Fully quantized graph

Actual Outcome

Conv + ReLU6 (hardtanh) are FP32 ops

Image

Versions

Collecting environment information...
PyTorch version: 2.12.0+cpu
ExecuTorch version: 1.3.1+cpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: Ubuntu 24.04.4 LTS (x86_64)
GCC version: (Ubuntu 13.3.0-6ubuntu2~24.04.1) 13.3.0
Clang version: 18.1.8 (++20240731025043+3b5b5c1ec4a3-1~exp1~20240731145144.92)
CMake version: version 4.1.2
Libc version: glibc-2.39

Python version: 3.10.15 (main, Oct 16 2024, 04:37:23) [Clang 18.1.8 ] (64-bit runtime)
Python platform: Linux-6.18.33.1-microsoft-standard-WSL2-x86_64-with-glibc2.39
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Caching allocator config: N/A

cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell @rascani

Metadata

Labels

module: arm (Issues related to arm backend)

Type

No fields configured for Bug.

Projects

Status
To triage
