docs/site_specific_config/gpu.md
EESSI supports running CUDA-enabled software. All CUDA-enabled modules are marked with the `(gpu)` feature,
which is visible in the output produced by `module avail`.

### Configuring runtime support {: #nvidia_drivers}

For CUDA-enabled software to run, it needs to be able to find the **NVIDIA GPU drivers** of the host system.
The challenge here is that the NVIDIA GPU drivers are not _always_ in a standard system location, and that we
cannot install the GPU drivers in EESSI (since they are too closely tied to the client OS and GPU hardware).

#### Enabling runtime support for a native EESSI installation (using the helper script) {: #nvidia_eessi_native }

To get runtime support, we need to ensure that the EESSI runtime linker can find the drivers. To do this, we symlink the drivers
in a predictable location that is searched by the EESSI runtime linker.

*Step 1:* [initialize a version of EESSI](../using_eessi/setting_up_environment.md).

*Step 2 (EESSI 2025.06 and newer, mandatory):* define the `EESSI_NVIDIA_OVERRIDE_DEFAULT` variable in your local CernVM-FS configuration to point to a directory where you want
to store the symlinks to the drivers. For example, to store these under `/opt/eessi/nvidia`, one would run:
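A minimal sketch of that configuration; the config file path `/etc/cvmfs/domain.d/eessi.io.local` is an assumption, so use whichever file your CernVM-FS client reads for the `eessi.io` domain:

```
# Create the target directory for the driver symlinks (path from the example above).
sudo mkdir -p /opt/eessi/nvidia

# Point the variant symlink variable at it (the config file path is an assumption).
echo 'EESSI_NVIDIA_OVERRIDE_DEFAULT=/opt/eessi/nvidia' | sudo tee -a /etc/cvmfs/domain.d/eessi.io.local

# Reload the CernVM-FS client configuration so the change takes effect.
sudo cvmfs_config reload
```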

*Step 2 (EESSI 2023.06, optional):* change the location in which the symlinks will end up by configuring `EESSI_HOST_INJECTIONS` explicitly (default: `/opt/eessi`):
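Analogously, a sketch of that configuration; the config file path is an assumption, and `/shared/eessi` is a hypothetical alternative location:

```
# /shared/eessi is a hypothetical alternative to the default /opt/eessi.
echo 'EESSI_HOST_INJECTIONS=/shared/eessi' | sudo tee -a /etc/cvmfs/domain.d/eessi.io.local
sudo cvmfs_config reload
```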

*Step 3:* run the `link_nvidia_host_libraries.sh` script that is included in EESSI.
This script uses `ldconfig` on your host system to locate your GPU drivers, and creates symbolic links to them in the location configured above. It also stores the CUDA version supported by the driver that the symlinks were created for.
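For example (the script path below is an assumption, inferred from the location of EESSI's other GPU support scripts; verify it for your EESSI version):

```
# Run the driver-linking helper after initializing EESSI (so EESSI_VERSION is set).
/cvmfs/software.eessi.io/versions/${EESSI_VERSION}/scripts/gpu_support/nvidia/link_nvidia_host_libraries.sh
```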

!!! tip "Rerun script after each driver update"
    You should re-run this script every time you update the NVIDIA GPU drivers on the host system, as it may expose libraries that are new to your driver version.
    Note that it is safe to re-run the script even if no driver updates were done: the script should detect that the current version of the drivers was already symlinked.

!!! tip "Maintaining different driver versions for each EESSI version"
    The standard approach for EESSI >= 2025.06 means that the drivers may be found by any EESSI version. If you prefer to create one set of symlinks per EESSI
    version, then instead of defining a single location through `EESSI_NVIDIA_OVERRIDE_DEFAULT`, you can define one per EESSI version by setting `EESSI_<VERSION>_NVIDIA_OVERRIDE`.

!!! note "How does EESSI find the linked drivers?"
    The runtime linker provided by the EESSI [compatibility layer](../compatibility_layer.md) is configured to search an
    additional directory (run `ld.so --help | grep -A 10 "Shared library search path"` after initializing EESSI to see it).
    For `EESSI/2025.06` and later, that is `/cvmfs/software.eessi.io/versions/<EESSI_VERSION>/compat/<OS>/<ARCH>/lib/nvidia`.
    This directory is special, since it is a CernVM-FS [Variant Symlink](https://cvmfs.readthedocs.io/en/stable/cpt-repo.html#variant-symlinks).
    The target of this symlink is what you configure in your local CernVM-FS configuration.

#### Enabling runtime support for a native EESSI installation (using manual symlinking)

If, for some reason, the helper script is unable to locate the drivers on your system, you _can_ link them manually.
To do so, grab the list of libraries that need to be symlinked from [here](https://raw.githubusercontent.com/apptainer/apptainer/main/etc/nvliblist.conf).
Then, change to the correct directory:

- For EESSI 2025.06 and later: `/cvmfs/software.eessi.io/versions/${EESSI_VERSION}/compat/${EESSI_OS_TYPE}/${EESSI_CPU_FAMILY}/lib/nvidia`
- For EESSI 2023.06: `/cvmfs/software.eessi.io/host_injections/${EESSI_VERSION}/compat/${EESSI_OS_TYPE}/${EESSI_CPU_FAMILY}/lib`

Finally, manually create a symlink in that directory for each of the files in the aforementioned list (if they exist on your system).
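The steps above can be sketched as follows. This is a minimal sketch, assuming EESSI 2025.06+ has been initialized so the `EESSI_*` variables are set; the two library names are just examples from the list, and resolving host paths via `ldconfig -p` is one possible approach:

```
# Change to the directory searched by the EESSI runtime linker (2025.06 and later).
cd /cvmfs/software.eessi.io/versions/${EESSI_VERSION}/compat/${EESSI_OS_TYPE}/${EESSI_CPU_FAMILY}/lib/nvidia

# For each library name from nvliblist.conf (subset shown), symlink the host copy if it exists.
for lib in libcuda.so libnvidia-ml.so; do
    host_path=$(ldconfig -p | grep -m 1 "${lib}" | awk '{print $NF}')
    if [ -n "${host_path}" ]; then
        sudo ln -sf "${host_path}" .
    fi
done
```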

If you are running your own [Apptainer](https://apptainer.org/)/[Singularity](https://sylabs.io/singularity) container,
it is sufficient to use the [`--nv` option](https://apptainer.org/docs/user/latest/gpu.html#nvidia-gpus-cuda-standard)
to enable access to GPUs from within the container. This will ensure the container runtime exposes the drivers through
`$LD_LIBRARY_PATH`.

If you are using the [EESSI container](../getting_access/eessi_container.md) to access the EESSI software,
simply pass `--nvidia run` or `--nvidia all` to enable NVIDIA GPU runtime support.
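For example (`my_container.sif` is a placeholder image, and `eessi_container.sh` refers to the script described on the linked EESSI container page):

```
# Your own Apptainer/Singularity container: --nv exposes the host GPU drivers.
apptainer shell --nv my_container.sif

# The EESSI container script: enable NVIDIA GPU runtime support.
./eessi_container.sh --nvidia all
```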

### Configuring compile time support {: #cuda_sdk }

To compile new CUDA software using dependencies from EESSI, additional configuration is needed.

The [CUDA license](https://docs.nvidia.com/cuda/eula/index.html) and [cuDNN license](https://docs.nvidia.com/deeplearning/cudnn/latest/reference/eula.html)
only allow redistribution of their _runtime_ libraries. Thus, the installations of CUDA and cuDNN that come with EESSI have been stripped down to contain
only the runtime libraries. A local installation of CUDA and cuDNN is required to compile new software.

!!! note "A full CUDA SDK or cuDNN SDK is only needed to *compile* CUDA or cuDNN software"
    Without a full CUDA SDK or cuDNN SDK on the host system, you will still
    be able to *run* CUDA-enabled or cuDNN-enabled software from the EESSI stack
    (provided the required configuration for runtime support was done, see above);
    you just won't be able to *compile* additional CUDA or cuDNN software.

First, [initialize a version of EESSI](../using_eessi/setting_up_environment.md).

Second, (optionally) define the `EESSI_HOST_INJECTIONS` variable in your local CernVM-FS configuration to point to a directory where you want to
store the local installations of CUDA and cuDNN (the default location is `/opt/eessi`):
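A sketch of that configuration; as before, the CernVM-FS config file path is an assumption, and `/shared/eessi` is a hypothetical alternative location:

```
# /shared/eessi is a hypothetical alternative to the default /opt/eessi.
echo 'EESSI_HOST_INJECTIONS=/shared/eessi' | sudo tee -a /etc/cvmfs/domain.d/eessi.io.local
sudo cvmfs_config reload
```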