Lowering a custom TorchAO QAT model to XNNPACK backend #20447
Lorenzo-Mazza asked this question in Q&A
Hi,
I trained a custom PyTorch model using the most recent TorchAO eager QAT workflow (cf. qat_workflow).
So far I have: i) prepared the model with the Int8DynamicActivationIntxWeight config (with int4 weights), ii) trained it with a standard PyTorch training loop, and iii) saved the final checkpoint as a .ckpt file, before the QAT convert step.
I now want to deploy this model on an ARM device through ExecuTorch with the XNNPACK backend.
My understanding is that the next logical step should be to run the torchao QAT "convert" step on the trained model.
After that, I would like to export/lower the converted model to a .pte file for XNNPACK. This is the step I cannot find a clear example for.
Here they describe pretty much what I have done: they reach the point where they have a trained model and then run the "convert" step from torchao. What I am missing is how to go from there to a model lowered to the XNNPACK backend.
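For concreteness, the flow I have in mind looks roughly like the sketch below. It is only my guess at the pipeline, not a confirmed recipe: the function name, `pte_path`, and `group_size=32` are placeholders of mine, and I am assuming that the model has been prepared again with the same config and that the .ckpt weights have been loaded into it before this is called. Whether XNNPACK actually delegates the resulting quantized ops is exactly what I am unsure about.

```python
def convert_and_lower_to_xnnpack(prepared_model, example_inputs,
                                 pte_path="model.pte", group_size=32):
    """Sketch: QAT convert -> torch.export -> XNNPACK lowering -> .pte file.

    prepared_model: model after the QAT "prepare" step, trained weights loaded.
    example_inputs: tuple of example tensors used to trace the model.
    group_size: assumed value; it must match the one used at prepare time.
    """
    # Imports are local so the sketch can be read/imported without the
    # libraries installed; calling it requires torch, torchao and executorch.
    import torch
    from torchao.quantization import (
        Int8DynamicActivationIntxWeightConfig,
        quantize_,
    )
    from torchao.quantization.granularity import PerGroup
    from torchao.quantization.qat import QATConfig
    from executorch.backends.xnnpack.partition.xnnpack_partitioner import (
        XnnpackPartitioner,
    )
    from executorch.exir import to_edge_transform_and_lower

    # Same base config as in the prepare step (int8 dynamic activations,
    # int4 grouped weights); the granularity here is my assumption.
    base_config = Int8DynamicActivationIntxWeightConfig(
        weight_dtype=torch.int4,
        weight_granularity=PerGroup(group_size),
    )

    # 1) QAT convert: swap fake-quantized modules for real quantized ones.
    quantize_(prepared_model, QATConfig(base_config, step="convert"))

    # 2) Export the converted model to an ExportedProgram.
    prepared_model.eval()
    exported = torch.export.export(prepared_model, example_inputs)

    # 3) Lower to the XNNPACK backend and serialize to a .pte file.
    program = to_edge_transform_and_lower(
        exported, partitioner=[XnnpackPartitioner()]
    ).to_executorch()
    with open(pte_path, "wb") as f:
        f.write(program.buffer)
    return pte_path
```

I would call it as `convert_and_lower_to_xnnpack(model, (example_tensor,))`, where `example_tensor` has the input shape my model expects.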
Here it is said that "XNNPACK backend also supports quantizing models with the torchao quantize_ API", so what I have in mind seems achievable. However, there is no concrete example of how to do the actual lowering, and the link in the file is stale and leads to a 404 page. The missing link likely refers to this page here, but that torchao tutorial only contains a partial example, where the conversion happens through some off-the-shelf scripts for a specific standard model.
This is not directly applicable to my custom model.
So my questions are:
Thanks,
Lorenzo Mazza