Lowering a custom TorchAO QAT model to XNNPACK backend #20447

Lorenzo-Mazza · 2026-06-23T12:13:14Z


Hi,

I trained a custom PyTorch model using the most recent TorchAO eager QAT workflow (cf. qat_workflow):

So far I have: i) prepared the model with the Int8DynamicActivationIntxWeightConfig (with int4 weights), ii) trained it with a standard PyTorch training loop, and iii) saved the final checkpoint as a .ckpt file, before the QAT convert step.
I now want to deploy this model on an ARM device through ExecuTorch with the XNNPACK backend.
My understanding is that the next logical step should be:

quantize_(model, QATConfig(base_config, step="convert"))

After that, I would like to export/lower the converted model to a .pte file for XNNPACK. This is the step I cannot find a clear example for.

Here they describe pretty much what I have done: they reach the point where they have a trained model and then run the torchao "convert" step. What I am missing is how to go from there to a model lowered to the XNNPACK backend.

Here it says that "XNNPACK backend also supports quantizing models with the torchao quantize_ API", so what I have in mind seems achievable. However, there is no concrete example of the actual lowering, and the link in that file is stale and leads to a 404 page. The missing link likely refers to this page, but this torchao tutorial only contains a partial example in which the conversion is done through off-the-shelf scripts for a specific standard model (Phi-4-mini), namely:

# Convert checkpoint format for ExecuTorch
python -m executorch.examples.models.phi_4_mini.convert_weights pytorch_model.bin pytorch_model_converted.bin

# Export to PTE format with torchao optimizations preserved
PARAMS="executorch/examples/models/phi_4_mini/config.json"

python -m executorch.examples.models.llama.export_llama \
    --model "phi_4_mini" \
    --checkpoint "pytorch_model_converted.bin" \
    --params "$PARAMS" \
    -kv \
    --use_sdpa_with_kv_cache \
    -X \
    --metadata '{"get_bos_id":199999, "get_eos_ids":[200020,199999]}' \
    --max_seq_length 128 \
    --max_context_length 128 \
    --output_name="phi4-mini-8da4w.pte"

This is not directly applicable to my custom model.
So my questions are:

Is a custom non-LLM model trained with TorchAO eager QAT and Int8DynamicActivationIntxWeightConfig currently supported by ExecuTorch/XNNPACK?
After running the TorchAO QAT convert step, what is the intended export/lowering API for a custom model?
Is there a minimal reference example of the full pipeline: custom torch.nn.Module → TorchAO QAT prepare → train → TorchAO QAT convert → ExecuTorch/XNNPACK lowering → .pte?

Thanks,

Lorenzo Mazza
