Pull requests: NVIDIA/TransformerEngine

Pull requests list

Improve TE Group MLP CPU Overhead
#2991 opened May 14, 2026 by zhongbozhu Collaborator Draft
13 tasks
Add codex/agents to .gitignore
#2990 opened May 14, 2026 by yaox12 Member
13 tasks
ci: declare contents:read on Lint workflow
#2989 opened May 14, 2026 by arpitjain099
Remove epel-release package from wheel Dockerfiles
#2987 opened May 13, 2026 by ksivaman Member
8 of 13 tasks
[JAX] Support for cuDNN-backed flex attention 2.16.0
#2985 opened May 13, 2026 by vcherepanov-nv Collaborator
4 of 13 tasks
[PyTorch] Support for cuDNN-backed flex attention 2.16.0
#2984 opened May 13, 2026 by vcherepanov-nv Collaborator
4 of 13 tasks
GGEMM+srelu kernels for MxFP8 Nemotron
#2981 opened May 12, 2026 by sraman-rgb
8 of 13 tasks
[Common, PyTorch] Improve mHC to match DeepSeek's implementation
#2978 opened May 12, 2026 by kainzhong Collaborator
9 of 13 tasks
[JAX] Improve JAX tutorial documentation 2.16.0
#2976 opened May 11, 2026 by jberchtold-nvidia Collaborator
8 of 13 tasks
[JAX] Size autotuned Triton grids per config
#2975 opened May 11, 2026 by tdophung Collaborator
6 of 13 tasks
Implement 4over6 NVFP4 recipe community-contribution fp4
#2972 opened May 9, 2026 by zianglih Contributor
8 of 13 tasks
[common] Grouped gemm update - nvfp4 for blackwell and fp8 blockwise hopper 2.16.0
#2971 opened May 8, 2026 by pggPL Collaborator
9 of 13 tasks
[PyTorch] Batch CP attention tests in single torchrun to amortize NCC… 2.16.0
#2965 opened May 6, 2026 by sudhakarsingh27 Collaborator
7 of 8 tasks
[All] Refactor nvte_get_fused_attn_backend with cudnn-frontend calls
#2964 opened May 6, 2026 by cyanguwa Collaborator
10 of 13 tasks
Draft:Extended Tensor Parallelism
#2960 opened May 5, 2026 by jiemingz Draft
13 tasks
[Common, PyTorch] Add Triton MLA attention kernels for SM80 community-contribution
#2950 opened Apr 30, 2026 by bzantium
Add NVFP4 1x64 Local Encode Recipe
#2941 opened Apr 29, 2026 by cael-ling Contributor
1 of 13 tasks
[Common/PyTorch/JAX] make offset of ClampedSwiGLU configurable
#2938 opened Apr 28, 2026 by hxbai Contributor
13 tasks
Fix CUDA graph parameter grad lifetime
#2937 opened Apr 28, 2026 by buptzyb Contributor
[PyTorch] Enable head dim 256 for FA4
#2932 opened Apr 27, 2026 by yaox12 Member
1 of 13 tasks