[Pytorch] Add get_backward_dw_params api for TE module #2614
base: main
Conversation
Greptile Overview

Greptile Summary

This PR fixes a bug where weight gradient computation hooks were being discarded when CUDA graphs are used with Megatron-LM, causing parameters to skip gradient reduction.

Changes: The implementation follows the same pattern that was previously reverted (commit d04c008), but with a cleaner function signature that leverages closure scope access.

Confidence Score: 5/5
Important Files Changed
Sequence Diagram

sequenceDiagram
participant MegatronLM
participant GraphedCallable
participant backward_dw
participant TEModule
participant Hooks
Note over MegatronLM,Hooks: Wgrad CUDA Graph Execution Flow
MegatronLM->>GraphedCallable: call backward_dw()
GraphedCallable->>backward_dw: execute backward_dw()
alt need_bwd_dw_graph[graph_idx] == True
backward_dw->>backward_dw: replay wgrad graph
loop for each module in te_modules
backward_dw->>TEModule: check need_backward_dw()
alt module needs backward_dw
backward_dw->>TEModule: _trigger_wgrad_accumulation_and_reduce_hooks()
loop for each hook in wgrad_accumulation_and_reduce_hooks
TEModule->>Hooks: execute hook()
Note over Hooks: Performs grad accumulation<br/>and reduction
end
end
end
end
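A minimal Python sketch of the flow in the diagram above, for orientation only. It reuses the names shown in the diagram (need_bwd_dw_graph, te_modules, need_backward_dw, _trigger_wgrad_accumulation_and_reduce_hooks); the wgrad_graphs list and the function signature are assumptions, not the actual TransformerEngine graphed-callable implementation.

```python
from typing import List, Sequence

import torch


def backward_dw(
    graph_idx: int,
    need_bwd_dw_graph: List[bool],
    wgrad_graphs: List[torch.cuda.CUDAGraph],  # assumed: one captured wgrad graph per index
    te_modules: Sequence[torch.nn.Module],
) -> None:
    """Replay the captured wgrad graph, then fire each module's wgrad hooks."""
    if not need_bwd_dw_graph[graph_idx]:
        return
    # Replaying the graph recomputes weight gradients outside of autograd...
    wgrad_graphs[graph_idx].replay()
    # ...so the grad accumulation/reduce hooks must be triggered manually
    # for every TE module that still needs its backward_dw pass.
    for module in te_modules:
        if module.need_backward_dw():
            module._trigger_wgrad_accumulation_and_reduce_hooks()
```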
1 file reviewed, 1 comment
Get the parameters for the backward weight gradient computation.
"""
params = []
params.append(noop_cat(self._get_weight_tensors()))
logic: in backward_dw() (lines 1520-1522), weight tensors are only accessed when not self.fuse_wgrad_accumulation, but this method unconditionally returns weight parameters. Depending on Megatron-LM's usage, this could cause hooks to be registered on parameters that shouldn't have them when fuse_wgrad_accumulation=True.
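For illustration, a guarded variant along the lines this comment suggests could look like the sketch below. noop_cat, _get_weight_tensors, and fuse_wgrad_accumulation are taken from the snippet and comment above; the guard itself is an assumption, not the code merged in this PR.

```python
def get_backward_dw_params(self):
    """Get the parameters for the backward weight gradient computation."""
    params = []
    # Assumed guard: mirror backward_dw(), which only touches the weight
    # tensors when wgrad accumulation is not fused into the GEMM.
    if not self.fuse_wgrad_accumulation:
        params.append(noop_cat(self._get_weight_tensors()))
    return params
```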
commit content reverted.
No files reviewed, no comments
No files reviewed, no comments
No files reviewed, no comments
2 files reviewed, no comments
Description
This PR adds get_backward_dw_params for TE modules, which helps manage the hooks of parameters. For Megatron-LM, get_backward_dw_params will be called once the wgrad CUDA graph is executed. Currently the backward_post_hook of the wgrad computation is discarded, which causes parameters to skip gradient reduction.

Type of change
Changes
Please list the changes introduced in this PR:

- Add a get_backward_dw_params API to TE modules that returns the parameters involved in the backward weight-gradient computation, so Megatron-LM can manage their gradient accumulation/reduce hooks when the wgrad CUDA graph is replayed (see the sketch after this list).
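A hedged illustration of the intended consumer-side usage referenced above: after the wgrad CUDA graph replays, a caller such as Megatron-LM could walk get_backward_dw_params() and re-run each parameter's hook. run_hook is a hypothetical callback standing in for the framework's actual grad accumulation/reduce hook; nothing here is taken from Megatron-LM's real code.

```python
from typing import Callable, Iterable

import torch


def run_wgrad_hooks_after_graph_replay(
    te_modules: Iterable[torch.nn.Module],
    run_hook: Callable[[torch.Tensor], None],  # hypothetical accumulation/reduce hook
) -> None:
    """Run wgrad hooks for parameters whose backward_post_hook the CUDA graph discarded."""
    for module in te_modules:
        # get_backward_dw_params() is the API added by this PR; it returns the
        # (noop-concatenated) tensors written by the backward wgrad computation.
        for param in module.get_backward_dw_params():
            run_hook(param)
```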
Checklist: