
Conversation

@pggPL (Collaborator) commented Jan 14, 2026

Description

  1. Fixes an incorrect implementation of the no_torch_dynamo decorator, which caused errors with the newest PyTorch releases: the decorator's Dynamo-disabling behavior was not skipped during ONNX export as intended.
  2. Adds support for FP8 attention export.

Fixes #2588

Type of change

  • Documentation change (change only to the documentation, either a fix or new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

  • Fixed the no_torch_dynamo decorator to check ONNX export mode at call time rather than at decoration time
  • Added ONNX export support for FP8 attention

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

pggPL and others added 7 commits January 14, 2026 19:46
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
@pggPL pggPL marked this pull request as ready for review January 27, 2026 17:09
greptile-apps bot (Contributor) commented Jan 27, 2026

Greptile Summary

Fixed ONNX export for FP8 attention by addressing two key issues:

Critical Bug Fix: The no_torch_dynamo decorator was incorrectly checking ONNX export mode at decorator definition time rather than at runtime. The old lambda-based implementation evaluated is_in_onnx_export_mode() when the decorator was applied, not when the function was called. This caused torch._dynamo to be disabled during ONNX export in newer PyTorch versions, breaking the export process. The new implementation properly wraps the function and checks export mode at runtime, allowing Dynamo to remain enabled during export.
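The difference between checking at decoration time and at call time can be sketched in plain Python. This is an illustration of the pattern, not Transformer Engine's exact code: `torch._dynamo.disable` and the real `is_in_onnx_export_mode` are replaced with stubs so the sketch runs standalone.

```python
import functools

# Stand-ins so the sketch runs without torch: in Transformer Engine the real
# check is is_in_onnx_export_mode() from the ONNX export utilities, and the
# real disabling call is torch._dynamo.disable.
_ONNX_EXPORT_MODE = False

def is_in_onnx_export_mode() -> bool:
    return _ONNX_EXPORT_MODE

def dynamo_disable(func):
    # Stub for torch._dynamo.disable: just tags the function as disabled.
    func.dynamo_disabled = True
    return func

def no_torch_dynamo_buggy(func):
    # Buggy pattern: the export-mode check runs once, when the decorator is
    # applied at import time, so flipping export mode later has no effect.
    return func if is_in_onnx_export_mode() else dynamo_disable(func)

def no_torch_dynamo(func):
    # Fixed pattern: wrap the function and defer the check to call time.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if is_in_onnx_export_mode():
            return func(*args, **kwargs)  # leave Dynamo enabled for export
        return dynamo_disable(func)(*args, **kwargs)
    return wrapper

@no_torch_dynamo_buggy
def f_buggy(x):
    return x + 1

@no_torch_dynamo
def f_fixed(x):
    return x + 1

_ONNX_EXPORT_MODE = True  # enter "ONNX export mode" after import
```

With the flag flipped after import, `f_buggy` has already been wrapped in the Dynamo-disabling stub and can never recover, while `f_fixed` re-evaluates the flag on every call and skips the disabling path.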

FP8 Attention Export Support: Added ONNX export capability for FP8 attention by:

  • Implementing onnx_forward() method in FP8EmulationFunc that uses ONNX-compatible quantization operations
  • Enabling FP8 emulation during ONNX export mode in attention backend selection logic
  • Adding comprehensive test coverage with FP8 recipes (DelayedScaling and Float8CurrentScaling)
  • Setting NVTE_UnfusedDPA_Emulate_FP8=1 in test script to enable emulation when native backend unavailable
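The core idea of the emulation path is that attention runs in high precision, but Q/K/V are round-tripped through quantize→dequantize so the exported graph contains only plain ONNX-expressible ops rather than fused FP8 kernels. A rough NumPy sketch of that idea follows; the scale handling and integer-grid rounding here are deliberate simplifications, not TE's actual quantizer arithmetic (real FP8 E4M3 has nonuniformly spaced values):

```python
import numpy as np

F8_E4M3_MAX = 448.0  # largest finite magnitude representable in FP8 E4M3

def onnx_quantize(x: np.ndarray, scale: float) -> np.ndarray:
    # Simplified FP8 emulation: scale, round to a uniform grid, clamp to
    # the E4M3 range. (A crude stand-in for round-to-nearest-FP8.)
    return np.clip(np.rint(x * scale), -F8_E4M3_MAX, F8_E4M3_MAX)

def onnx_dequantize(xq: np.ndarray, scale: float) -> np.ndarray:
    return xq / scale

def fp8_emulated_attention(q, k, v, scale=16.0):
    # Round-trip inputs through quantize/dequantize to emulate FP8 precision.
    q = onnx_dequantize(onnx_quantize(q, scale), scale)
    k = onnx_dequantize(onnx_quantize(k, scale), scale)
    v = onnx_dequantize(onnx_quantize(v, scale), scale)
    # Unfused scaled-dot-product attention on the emulated-FP8 inputs.
    logits = q @ k.T / np.sqrt(q.shape[-1])
    logits -= logits.max(axis=-1, keepdims=True)  # numerically stable softmax
    p = np.exp(logits)
    p /= p.sum(axis=-1, keepdims=True)
    return p @ v
```

The quantize/dequantize round trip is what introduces the approximation error that motivates the looser FP8 test tolerance noted below.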

Test Changes:

  • Added parameterization for FP8 recipes in test_export_core_attention
  • Removed attention_dropout=0.5 parameter (ensures deterministic ONNX export)
  • Increased tolerance for FP8 tests (atol=1.5e-1) due to quantization approximation
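The shape of that test-matrix change can be sketched as below. The recipe names follow the summary above (the real objects live in Transformer Engine's recipe module), and the dummy body stands in for the actual build-export-compare logic, so treat this as an assumed outline rather than the test file's contents:

```python
import pytest

# Hypothetical sketch of the parameterization added to
# tests/pytorch/test_onnx_export.py; None means no FP8 recipe (plain export).
FP8_RECIPES = [None, "DelayedScaling", "Float8CurrentScaling"]

def tolerance_for(fp8_recipe):
    # FP8 quantization is lossy, so FP8 variants compare against the
    # reference output with a looser absolute tolerance.
    return 1.5e-1 if fp8_recipe is not None else 1e-5

@pytest.mark.parametrize("fp8_recipe", FP8_RECIPES)
def test_export_core_attention(fp8_recipe):
    atol = tolerance_for(fp8_recipe)
    # ... build the attention module, export to ONNX, run both, and
    # compare outputs within atol ...
    assert atol > 0
```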

The implementation correctly separates ONNX export path from the training/inference path, using the quantizer's onnx_quantize/onnx_dequantize methods for export compatibility.

Confidence Score: 4/5

  • This PR is safe to merge with minor verification recommended
  • The changes are well-structured and address real bugs. The no_torch_dynamo fix is a critical correctness improvement that properly handles ONNX export mode. The FP8 attention export implementation follows established patterns and includes proper test coverage. Score is 4 (not 5) due to one minor concern: removal of attention_dropout=0.5 in tests should be verified as intentional.
  • tests/pytorch/test_onnx_export.py - verify dropout parameter removal was intentional

Important Files Changed

| Filename | Overview |
| --- | --- |
| transformer_engine/pytorch/jit.py | Fixed the no_torch_dynamo decorator to work during ONNX export by checking export mode at runtime instead of at decorator definition time |
| transformer_engine/pytorch/attention/dot_product_attention/backends.py | Added an ONNX-compatible forward path for FP8 emulation that uses the quantizer's ONNX methods for export |
| tests/pytorch/test_onnx_export.py | Added test parameterization for FP8 attention export with different FP8 recipes and adjusted tolerance for FP8 tests |

Sequence Diagram

sequenceDiagram
    participant User
    participant DotProductAttention
    participant no_torch_dynamo
    participant FP8EmulationFunc
    participant Quantizer
    participant ONNX

    User->>DotProductAttention: forward() with FP8 recipe
    DotProductAttention->>no_torch_dynamo: Check if in ONNX export mode
    
    alt ONNX Export Mode
        no_torch_dynamo->>DotProductAttention: Return original function (Dynamo enabled)
        DotProductAttention->>FP8EmulationFunc: forward(Q, K, V, quantizer)
        FP8EmulationFunc->>FP8EmulationFunc: Detect ONNX export mode
        FP8EmulationFunc->>FP8EmulationFunc: onnx_forward()
        FP8EmulationFunc->>Quantizer: onnx_quantize(tensor)
        Quantizer-->>FP8EmulationFunc: FP8 tensor
        FP8EmulationFunc->>Quantizer: onnx_dequantize(fp8_tensor)
        Quantizer-->>FP8EmulationFunc: Dequantized tensor
        FP8EmulationFunc-->>DotProductAttention: Q', K', V' (emulated FP8)
        DotProductAttention->>ONNX: Export graph with ONNX-compatible ops
        ONNX-->>User: Exported ONNX model
    else Normal Training/Inference
        no_torch_dynamo->>DotProductAttention: Return disabled function (Dynamo disabled)
        DotProductAttention->>FP8EmulationFunc: forward(Q, K, V, quantizer)
        FP8EmulationFunc->>FP8EmulationFunc: Regular forward path
        FP8EmulationFunc->>Quantizer: quantize(tensor) via combine_and_quantize
        Quantizer-->>FP8EmulationFunc: FP8 tensor
        FP8EmulationFunc->>Quantizer: dequantize(fp8_tensor) via combine_and_dequantize
        Quantizer-->>FP8EmulationFunc: Dequantized tensor
        FP8EmulationFunc-->>DotProductAttention: Q', K', V' (emulated FP8)
        DotProductAttention-->>User: Attention output
    end

greptile-apps bot left a comment

No files reviewed, no comments

pggPL and others added 2 commits January 27, 2026 17:39
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
greptile-apps bot left a comment

1 file reviewed, 1 comment

@pggPL (Collaborator, Author) commented Jan 27, 2026

/te-ci pytorch L1



Successfully merging this pull request may close these issues.

Multihead Attention fails fp8 ONNX export
