[CI] Refactor CI build on GitHub #2723

Open
ptrendx wants to merge 26 commits into NVIDIA:main from ptrendx:pr_github_build

Conversation

@ptrendx
Member

@ptrendx ptrendx commented Mar 2, 2026

Description

Run the build in a minimal container and use pip-wheel-based CUDA to minimize the disk space used.

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

  • Change A
  • Change B

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

ptrendx and others added 24 commits February 24, 2026 15:16
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
@ptrendx ptrendx marked this pull request as ready for review March 10, 2026 18:02
@greptile-apps
Contributor

greptile-apps bot commented Mar 10, 2026

Greptile Summary

This PR refactors the GitHub CI build pipeline to use pre-built minimal containers (ghcr.io/ptrendx/te_gha_pytorch:latest and ghcr.io/ptrendx/te_gha_all:latest) and pip-wheel-based CUDA instead of spinning up a full DEVEL container at runtime. It also replaces sccache with standard ccache backed by actions/cache@v4, removes the disk-space maximization steps, and adds a get_cuda_library_dirs() utility for wheel-based CUDA library discovery.

Key changes:

  • build.yml: All four jobs (core, pytorch, jax, all) now use actions/cache@v4 + ccache; pytorch and all jobs use new pre-built containers and set CUDA_PATH/CUDNN_PATH from the nvidia pip namespace.
  • New Dockerfiles (Dockerfile.pytorch, Dockerfile.all): Define the new minimal CI containers with CUDA 13.x pip wheels; create unversioned .so symlinks for cudart, cublas, and nccl — but omit a corresponding libcudnn.so symlink.
  • logging.h: Moves #include "nccl.h" inside #ifdef NVTE_WITH_CUBLASMP, but the NVTE_CHECK_NCCL macro (which references ncclResult_t etc.) remains defined unconditionally — a latent compilation hazard for any consumer that doesn't separately include nccl.h.
  • utils.py: Adds get_cuda_library_dirs() and a fallback for the nvidia-cuda-runtime (non-cu12-suffixed) package in cuda_version().
  • jax.py: Wires get_cuda_library_dirs() into the JAX extension build.
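
The wheel-based library discovery mentioned above can be sketched roughly as follows. This is a hypothetical approximation, not the PR's actual implementation: the real get_cuda_library_dirs() in build_tools/utils.py mirrors get_cuda_include_dirs() and may walk the nvidia namespace package differently.

```python
from pathlib import Path


def get_cuda_library_dirs() -> list[Path]:
    """Collect lib/ directories shipped by the nvidia CUDA pip wheels.

    Hypothetical sketch: returns an empty list when no nvidia wheels
    are installed, otherwise every lib/ directory found under the
    nvidia namespace package (cudnn, nccl, cu13 subpackages, etc.).
    """
    try:
        import nvidia  # namespace package installed by nvidia-* wheels
    except ImportError:
        return []
    lib_dirs: list[Path] = []
    for base in nvidia.__path__:
        for lib in Path(base).glob("**/lib"):
            if lib.is_dir():
                lib_dirs.append(lib)
    return lib_dirs
```

The result would then be passed as extra library search paths to the extension build, the way build_tools/jax.py wires it in.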

Confidence Score: 3/5

  • PR is mostly safe but has one latent C++ compilation issue and a potential link-time failure in the new Dockerfiles.
  • The CI workflow refactor is well-structured, but moving nccl.h inside #ifdef NVTE_WITH_CUBLASMP while leaving NVTE_CHECK_NCCL unconditionally defined in logging.h creates a broken public API contract that will break compilation for future consumers. Additionally, the new Dockerfiles create .so symlinks for cudart, cublas, and nccl but not for cudnn, which could produce link errors depending on what the cudnn pip wheel ships.
  • transformer_engine/common/util/logging.h (NVTE_CHECK_NCCL / nccl.h guard mismatch) and build_tools/ci/Dockerfile.pytorch / build_tools/ci/Dockerfile.all (missing libcudnn.so symlink).

Important Files Changed

Filename Overview
.github/workflows/build.yml Major CI refactor: replaces manual Docker container spin-up and disk-space maximization steps with pre-built containers and native ccache via actions/cache@v4. Switches from sccache to standard ccache. Adds CUDA_PATH/CUDNN_PATH exports derived from nvidia pip wheels.
build_tools/ci/Dockerfile.pytorch New Dockerfile for the pytorch CI container; installs CUDA 13.x nvidia wheels and creates symlinks for cudart, cublas, and nccl — but is missing a libcudnn.so unversioned symlink which could cause link failures.
build_tools/ci/Dockerfile.all New Dockerfile for the all-framework CI container; adds JAX and PyTorch with CUDA 13 nvidia wheels, but also missing a libcudnn.so unversioned symlink.
build_tools/utils.py Adds get_cuda_library_dirs() mirroring get_cuda_include_dirs(), and adds a fallback for the generic nvidia-cuda-runtime package in cuda_version(). The return type annotation (Tuple[str, str]) is copy-pasted incorrectly from the include-dir counterpart but this is a pre-existing style issue.
build_tools/jax.py Imports and passes get_cuda_library_dirs() to the JAX extension build; clean, straightforward addition.
transformer_engine/common/util/logging.h Moves nccl.h include inside #ifdef NVTE_WITH_CUBLASMP, but NVTE_CHECK_NCCL macro (outside the guard) still references NCCL types, creating a latent compilation failure for consumers that don't separately include nccl.h.
transformer_engine/jax/csrc/extensions/cgemm_helper.h Adds direct #include <nccl.h> to compensate for the removal from logging.h; correct for this file but doesn't fix the broader logging.h API contract issue.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[GitHub Actions Trigger] --> B{Job}
    B --> C[core job - ubuntu-latest runner]
    B --> D[pytorch job]
    B --> E[jax job]
    B --> F[all job]

    C --> C1[apt install ccache + cuda libs]
    C1 --> C2[actions/cache for /root/.ccache]
    C2 --> C3[pip install TE - NVTE_FRAMEWORK=none]
    C3 --> C4[Sanity check]

    D --> D1[Container: te_gha_pytorch - Dockerfile.pytorch]
    D1 --> D2[actions/cache for /root/.ccache]
    D2 --> D3[Set CUDA_PATH and CUDNN_PATH from nvidia wheels]
    D3 --> D4[pip install TE - NVTE_FRAMEWORK=pytorch - MAX_JOBS=2]
    D4 --> D5[Sanity check]

    E --> E1[Container: ghcr.io/nvidia/jax:jax]
    E1 --> E2[apt install ccache]
    E2 --> E3[actions/cache for /root/.ccache]
    E3 --> E4[pip install TE - NVTE_FRAMEWORK=jax]
    E4 --> E5[Sanity check]

    F --> F1[Container: te_gha_all - Dockerfile.all]
    F1 --> F2[actions/cache for /root/.ccache]
    F2 --> F3[Set CUDA_PATH and CUDNN_PATH from nvidia wheels]
    F3 --> F4[pip install TE - NVTE_FRAMEWORK=all - MAX_JOBS=2]
    F4 --> F5[Sanity check PyTorch]
    F5 --> F6[Sanity check JAX]

Comments Outside Diff (1)

  1. transformer_engine/common/util/logging.h, lines 109–115

    NVTE_CHECK_NCCL macro missing nccl.h include

    NVTE_CHECK_NCCL (lines 109–115) references ncclResult_t, ncclSuccess, and ncclGetErrorString, all of which are declared in nccl.h. However, nccl.h is now only included inside the #ifdef NVTE_WITH_CUBLASMP guard (lines 15–19). Since NVTE_CHECK_NCCL is defined unconditionally, any translation unit that:

    • includes logging.h (directly or transitively), AND
    • expands NVTE_CHECK_NCCL, AND
    • does NOT separately #include <nccl.h> before use

    …will fail to compile with "unknown type ncclResult_t" or similar errors when NVTE_WITH_CUBLASMP is not defined.

    The current codebase avoids this only because cgemm_helper.h (the sole non-logging user) now adds its own #include <nccl.h>. But NVTE_CHECK_NCCL is a public macro in a general-purpose utility header, so this is a fragile and surprising contract for future consumers.

    The fix is to either move NVTE_CHECK_NCCL inside the same #ifdef NVTE_WITH_CUBLASMP block, or add an unconditional #include "nccl.h" back at the top of the file alongside the other unconditional CUDA library includes.

Last reviewed commit: 42c7836

Comment on lines +322 to +323
except:
    pass
Contributor

Bare except: swallows all exceptions

The bare except: clause silently catches everything including KeyboardInterrupt, SystemExit, and MemoryError, making debugging harder if the failure isn't PackageNotFoundError. It should be narrowed to the expected exception type:

Suggested change
-except:
-    pass
+except importlib.metadata.PackageNotFoundError:
+    pass
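
The narrowed handler composes naturally into a small lookup helper. try_version() below is a hypothetical illustration of the pattern, not code from this PR:

```python
import importlib.metadata


def try_version(package: str):
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        # Only the expected "package not installed" case is swallowed;
        # KeyboardInterrupt, SystemExit, and MemoryError still propagate.
        return None


print(try_version("surely-not-a-real-package"))  # → None
```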

if not force_wheels and cuda_toolkit_include_path() is not None:
    return []

# Use pip wheels to include all headers.
Contributor

Copy-paste comment refers to "headers" instead of "libraries"

This comment was copied from get_cuda_include_dirs and still says "headers" instead of "libraries":

Suggested change
-# Use pip wheels to include all headers.
+# Use pip wheels to find all library directories.

ptrendx added 2 commits March 10, 2026 21:42
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
Comment on lines +22 to +28

# Create symlinks for CUDA libraries
RUN CUDA_PATH=$(python3 -c "import nvidia; print(list(nvidia.__path__)[0] + '/cu13')") && \
    ln -s $CUDA_PATH/lib/libcudart.so.13 $CUDA_PATH/lib/libcudart.so && \
    ln -s $CUDA_PATH/lib/libcublas.so.13 $CUDA_PATH/lib/libcublas.so && \
    ln -s $CUDA_PATH/../nccl/lib/libnccl.so.2 $CUDA_PATH/../nccl/lib/libnccl.so && \
    ln -s $CUDA_PATH/lib $CUDA_PATH/lib64
Contributor

Missing libcudnn.so unversioned symlink

The symlink block creates unversioned aliases for libcudart, libcublas, and libnccl (needed for -lcudart, -lcublas, -lnccl at link time), but no equivalent symlink for libcudnn.so. The logging.h header unconditionally includes <cudnn.h>, and the build links against cudnn. If the cudnn pip wheel (typically pulled in as a torch dependency) only ships a versioned filename such as libcudnn.so.9, the linker will fail to resolve -lcudnn without an unversioned symlink.

Consider adding a similar symlink for cudnn, e.g.:

CUDNN_LIB=$(python3 -c "import nvidia; print(list(nvidia.__path__)[0] + '/cudnn/lib')") && \
    ln -s $CUDNN_LIB/libcudnn.so.9 $CUDNN_LIB/libcudnn.so

The same applies to Dockerfile.all.
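
The reviewer's proposed fix generalizes to any versioned shared library. The helper below is a hypothetical Python sketch of creating unversioned .so aliases; the Dockerfiles in this PR use plain shell ln -s, not this code:

```python
from pathlib import Path


def ensure_unversioned_symlinks(lib_dir: Path) -> list[Path]:
    """For every libfoo.so.N in lib_dir, create a libfoo.so symlink.

    Mirrors the effect of `ln -s libfoo.so.N libfoo.so` for each
    versioned library, skipping names that already exist.
    """
    created: list[Path] = []
    for versioned in sorted(lib_dir.glob("lib*.so.*")):
        # Strip everything after ".so" to get the unversioned name,
        # e.g. libcudnn.so.9 -> libcudnn.so.
        unversioned = lib_dir / (versioned.name.split(".so")[0] + ".so")
        if not unversioned.exists():
            unversioned.symlink_to(versioned.name)  # relative, like ln -s
            created.append(unversioned)
    return created
```

Running this once against the cudnn wheel's lib/ directory before the build step would cover the missing -lcudnn case the comment describes.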
