Qualcomm AI Engine Direct - AMD backend error#18098

Open
winskuo-quic wants to merge 2 commits intopytorch:mainfrom
CodeLinaro:dev1/winskuo/amd_cpu_fix

Conversation


@winskuo-quic winskuo-quic commented Mar 11, 2026

Summary

We noticed that when performing inference on an AMD CPU, we run into a Floating point exception (core dumped).
This can be easily reproduced with the following lines of code:

```python
import torch
import torch.nn as nn

w2_conv = nn.Conv2d(1536, 32, 1, bias=False)
x = torch.randn(1, 1536, 1, 32)
w2_conv(x)
```

A temporary workaround is to disable the oneDNN (mkldnn) backend:

```python
torch.backends.mkldnn.enabled = False
```
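For reference, a minimal sketch of how the workaround interacts with the repro above. The global flag is what this PR uses; the scoped `torch.backends.mkldnn.flags` context manager is an alternative I am assuming here (it restores the previous flag value on exit, which avoids leaking the setting into unrelated code):

```python
import torch
import torch.nn as nn

w2_conv = nn.Conv2d(1536, 32, 1, bias=False)
x = torch.randn(1, 1536, 1, 32)

# Global workaround, as in this PR: force the conv onto the
# reference CPU kernel instead of the oneDNN path.
torch.backends.mkldnn.enabled = False
out = w2_conv(x)

# Scoped alternative (assumption, not what the PR does): disable
# oneDNN only for this block, then restore the previous setting.
with torch.backends.mkldnn.flags(enabled=False):
    out = w2_conv(x)
```

A 1x1 conv over a `(1, 1536, 1, 32)` input with 32 output channels yields a `(1, 32, 1, 32)` tensor in both cases; only the kernel dispatch changes.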

Test plan

NA

cc @cccclai @cbilgin

@pytorch-bot

pytorch-bot bot commented Mar 11, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18098

Note: Links to docs will display an error until the docs builds have been completed.

❌ 20 New Failures

As of commit 6455008 with merge base fde943a:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 11, 2026
@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@winskuo-quic
Collaborator Author

Hi @cccclai, @abhinaykukkadapu,
We have noticed that on AMD CPUs, AOT will run into the error: Floating point exception (core dumped). This happens during inference, including plain nn.Module calls.
There is a sample in the summary section to reproduce it.
This PR is a quick workaround for the issue, but since this may be an AMD or Torch issue, placing this logic under QNN probably isn't the best option.
Please have a look.
Thanks

Contributor

@digantdesai digantdesai left a comment


I assume this is during eager model runs?

@digantdesai digantdesai added the module: qnn Issues related to Qualcomm's QNN delegate and code under backends/qualcomm/ label Mar 11, 2026
@digantdesai
Contributor

Would you mind creating a ticket on PyTorch/PyTorch?

@winskuo-quic
Collaborator Author

> I assume this is during eager model runs?

Hi @digantdesai,
This issue happens in both eager mode and the exported program when we are calibrating the model.
I have created an issue ticket under pytorch/pytorch: pytorch/pytorch#177227
