fix: scope get_full_cu_seqlens cache key by device and inference mode by DmCarpe93 · Pull Request #2728 · NVIDIA/TransformerEngine

DmCarpe93 · 2026-03-03T09:09:39Z

Description

Fixed an issue where the cu_seqlens tensor could be incorrectly retrieved from the cache.

Currently, only (batch_size, max_seqlen) is used as the cache key when retrieving cu_seqlens.
This can cause errors, especially in Knowledge Distillation training, where the teacher and student models can run on the same node.
- When the teacher model runs first, a cu_seqlens tensor is created and cached.
- When the student model then trains on the same node with the same (batch_size, max_seqlen), the cached tensor is returned.
- Because the teacher's cached tensor may have been created on a different device or under a different inference mode, reusing it in the student can fail.

Type of change

Documentation change (change only to the documentation, either a fix or a new content)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Infra/Build change
Code refactoring

Changes

The cache key for retrieving cu_seqlens was extended from (batch_size, max_seqlen) to (batch_size, max_seqlen, device, is_inference), so cached entries are scoped by both device and inference mode.
Added test cases for the cu_seqlens cache.

Checklist:

I have read and followed the contributing guidelines
The functionality is complete
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

Signed-off-by: Dongmin Ra <dongmin.ra@navercorp.com>

for more information, see https://pre-commit.ci

greptile-apps · 2026-03-03T09:15:34Z

Greptile Summary

This PR fixes a cache-key collision bug in get_full_cu_seqlens where only (batch_size, max_seqlen) was used as the lookup key, allowing tensors created on a different device or inside torch.inference_mode() to be incorrectly returned for a different execution context. The fix extends the cache key to (batch_size, max_seqlen, device, is_inference), which is the minimal correct set of dimensions that determine whether a cached tensor is safe to reuse.
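To illustrate why a tensor created under `torch.inference_mode()` should not be handed to a training-mode caller, here is a small CPU-only PyTorch sketch (not taken from the PR). `cached` is a hypothetical stand-in for a teacher-created cache entry; it uses float32 instead of int32 only so it can be multiplied with a parameter that requires grad. Saving an inference tensor for backward is one way such reuse can fail; the sketch does not claim this is the exact error hit in the KD setup.

```python
import torch

with torch.inference_mode():
    # Created "by the teacher" while inference mode is active.
    cached = torch.arange(0, 12, 4, dtype=torch.float32)

# Back in normal mode, the tensor still carries the inference flag.
print(torch.is_inference_mode_enabled(), cached.is_inference())  # False True

weight = torch.ones(3, requires_grad=True)
try:
    # mul must save `cached` to compute the gradient of `weight`,
    # and autograd refuses to save inference tensors for backward.
    loss = (weight * cached).sum()
except RuntimeError as err:
    print("RuntimeError:", err)

# A clone made outside inference mode is a normal tensor again.
print(cached.clone().is_inference())  # False
```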

Key changes:

utils.py: torch.is_inference_mode_enabled() is captured at call time and appended to the cache key alongside the tensor's device, ensuring inference-mode tensors (which carry the inference flag and are incompatible with autograd) are never served to a training-mode caller, and that tensors on cuda:0 are never served to a caller targeting cuda:1.
test_cu_seqlens_cache.py: Two new white-box tests verify device-level isolation and inference-vs-training isolation of the cache. The autouse fixture clears the cache before and after each test to prevent cross-test pollution.
The multi-device test (test_cu_seqlens_cache_isolated_across_devices_for_forward) runs both models under torch.no_grad() rather than torch.inference_mode(), so is_inference=False for both entries. The cross-device-plus-inference-mode combination (e.g. teacher on cuda:0 in inference mode, student on cuda:1 in training mode) is not explicitly exercised, though the two individual dimensions are each covered by a separate test.

Confidence Score: 5/5

This PR is safe to merge — it is a targeted bug fix with no breaking API changes and has accompanying tests.
The change is minimal and surgical (5 lines modified in the production code path). The new composite cache key correctly captures every dimension that affects tensor reusability. All existing callers pass tensor.device (a torch.device object), which is consistent with the test assertions. No existing behavior is broken for single-device, non-KD workloads since the cache still hits for repeated calls with identical parameters.
No files require special attention.

Important Files Changed

Filename	Overview
transformer_engine/pytorch/attention/dot_product_attention/utils.py	Cache key for `get_full_cu_seqlens` extended from `(batch_size, max_seqlen)` to `(batch_size, max_seqlen, device, is_inference)`, correctly preventing cross-device and cross-inference-mode cache collisions.
tests/pytorch/attention/test_cu_seqlens_cache.py	New test file validating device isolation and inference/training mode isolation of the cu_seqlens cache; the multi-device test uses `torch.no_grad()` (not `torch.inference_mode()`), so the cross-device-plus-inference-mode combination is not explicitly covered.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[get_full_cu_seqlens called] --> B{ONNX export mode?}
    B -- Yes --> C[Compute and return without caching]
    B -- No --> D[Capture is_inference_mode_enabled]
    D --> E[Compose cache tuple: batch + seqlen + device + is_inference]
    E --> F{Tuple in cache dict?}
    F -- No --> G[Compute new tensor and store]
    G --> H[Return tensor]
    F -- Yes --> H

Last reviewed commit: 60d491e

greptile-apps

2 files reviewed, no comments

DmCarpe93 · 2026-03-11T01:50:47Z

@cyanguwa When you have a moment, could you please take a look at this PR? Thanks:)

DmCarpe93 and others added 2 commits February 27, 2026 16:27

fix: scope get_full_cu_seqlens cache key by device and inference mode

c91cd35

Signed-off-by: Dongmin Ra <dongmin.ra@navercorp.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

02fbe60

for more information, see https://pre-commit.ci

greptile-apps bot reviewed Mar 3, 2026


Merge branch 'main' into fix/get_full_cu_seqlens_cache_key_error

86151e8

ptrendx requested a review from cyanguwa March 3, 2026 18:54

Merge branch 'main' into fix/get_full_cu_seqlens_cache_key_error

60d491e

DmCarpe93 wants to merge 4 commits into NVIDIA:main from DmCarpe93:fix/get_full_cu_seqlens_cache_key_error
