kmod-6.18-nvidia-r580: add gdrcopy subpackage #462

Draft: mgsharm wants to merge 1 commit into bottlerocket-os:develop from mgsharm:gdrcopy

@mgsharm commented Jun 10, 2026

Adds gdrdrv (v2.5.2) for open-gpu and tesla driver paths. Includes a misc_register patch so /dev/gdrdrv is created automatically.

Description of changes:

  • Adds gdrcopy (the gdrdrv kernel module, v2.5.2) to kmod-6.18-nvidia-r580 as a new subpackage. gdrdrv is built for both the open-gpu and tesla driver paths, and the matching flavor is loaded at boot based on which driver ghostdog matches. A small misc_register patch to gdrdrv.c makes /dev/gdrdrv appear automatically.
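The boot-time flavor selection described above can be sketched as follows. This is a minimal illustration, not the PR's implementation: it assumes the preferred driver name is already known (in the test session, `ghostdog match-nvidia-driver <flavor>` exits 0 for the preferred driver and 1 otherwise), and the `gdrcopy/<flavor>/gdrdrv.ko` directory layout is a hypothetical reading of the glob used in the testing commands. `Flavor` and `gdrdrv_module_path` are hypothetical names.

```rust
use std::path::PathBuf;

/// gdrdrv flavors built by this PR, named as `ghostdog match-nvidia-driver` expects.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Flavor {
    OpenGpu,
    Tesla,
}

impl Flavor {
    /// Map a driver name ("open-gpu" or "tesla") to a flavor.
    fn from_name(name: &str) -> Option<Flavor> {
        match name {
            "open-gpu" => Some(Flavor::OpenGpu),
            "tesla" => Some(Flavor::Tesla),
            _ => None,
        }
    }

    /// Directory name used for this flavor (hypothetical layout).
    fn dir_name(self) -> &'static str {
        match self {
            Flavor::OpenGpu => "open-gpu",
            Flavor::Tesla => "tesla",
        }
    }
}

/// Path of the gdrdrv.ko to load, given the running kernel release and the
/// driver flavor ghostdog reports as preferred. Returns None for an unknown
/// flavor so the caller can skip loading gdrdrv entirely.
fn gdrdrv_module_path(kernel_release: &str, preferred: &str) -> Option<PathBuf> {
    let flavor = Flavor::from_name(preferred)?;
    let mut path = PathBuf::from("/lib/modules");
    path.push(kernel_release);
    path.push("kernel/drivers/extra/video/nvidia/gdrcopy");
    path.push(flavor.dir_name());
    path.push("gdrdrv.ko");
    Some(path)
}

fn main() {
    // Values from the test session: kernel 6.18.33, open-gpu preferred.
    match gdrdrv_module_path("6.18.33", "open-gpu") {
        Some(path) => println!("would load {}", path.display()),
        None => println!("no gdrdrv flavor for this driver"),
    }
}
```

Returning `None` for an unrecognized driver keeps the decision explicit: if neither flavor matches, nothing is loaded.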

Testing done:

  • Built the kit, baked an AMI, and tested on g4dn (open driver: gdrdrv loads cleanly) and g5 (proprietary driver: gdrdrv fails to load due to the license taint from nvidia.ko). Open question: should we keep the tesla flavor or drop it?
Details
bash-5.2# ls -l /dev/gdrdrv
crw-rw-rw-. 1 root root 10, 259 Jun 10 17:50 /dev/gdrdrv
bash-5.2#   dmesg | grep -iE 'gdrdrv|nvidia.*p2p'
[   23.535718] gdrdrv:gdrdrv_init:loading gdrdrv version 2.5 built for opensource NVIDIA driver
[   23.535722] gdrdrv:gdrdrv_init:registered as misc device, minor 259
[   23.535723] gdrdrv:gdrdrv_init:dbg traces disabled, info traces disabled
[   23.535724] gdrdrv:gdrdrv_init:Persistent mapping will be used
bash-5.2#   ls -l /sys/class/misc/gdrdrv
lrwxrwxrwx. 1 root root 0 Jun 10 17:53 /sys/class/misc/gdrdrv -> ../../devices/virtual/misc/gdrdrv
bash-5.2#   cat /sys/class/misc/gdrdrv/dev
10:259
bash-5.2#   lsmod | grep -E 'nvidia|gdrdrv|nv_p2p'
gdrdrv                 28672  0
nvidia_uvm           1990656  0
nvidia_modeset       1753088  0
nvidia              13991936  159 nvidia_uvm,gdrdrv,nvidia_modeset
video                  81920  1 nvidia_modeset
drm                   794624  1 nvidia
i2c_core              122880  4 nvidia,i2c_smbus,i2c_piix4,drm
backlight              28672  3 video,drm,nvidia_modeset
fips140              1064960  9 nvidia,ghash_clmulni_intel
bash-5.2#   /usr/bin/ghostdog match-nvidia-driver open-gpu; echo "open=$?"
open=0
bash-5.2#   /usr/bin/ghostdog match-nvidia-driver tesla; echo "tesla=$?"
Error: tesla is not preferred driver: open-gpu

tesla=1
bash-5.2#   systemctl is-active copy-gdrcopy-open-gpu-kernel-module.service load-gdrcopy-open-gpu-kernel-module.service
active
active
bash-5.2#   modinfo /lib/modules/$(uname -r)/kernel/drivers/extra/video/nvidia/gdrcopy/*/gdrdrv.ko | grep -E 'vermagic|depends|description'
description:    GDRCopy kernel-mode driver built for opensource NVIDIA driver
depends:        nv-p2p-dummy
vermagic:       6.18.33 SMP preempt mod_unload modversions
bash-5.2#   nvidia-smi
Wed Jun 10 17:53:42 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.159.03             Driver Version: 580.159.03     CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100-SXM4-40GB          On  |   00000000:10:1C.0 Off |                    0 |
| N/A   32C    P0             53W /  400W |       0MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A100-SXM4-40GB          On  |   00000000:10:1D.0 Off |                    0 |
| N/A   30C    P0             51W /  400W |       0MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA A100-SXM4-40GB          On  |   00000000:20:1C.0 Off |                    0 |
| N/A   31C    P0             51W /  400W |       0MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA A100-SXM4-40GB          On  |   00000000:20:1D.0 Off |                    0 |
| N/A   30C    P0             52W /  400W |       0MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA A100-SXM4-40GB          On  |   00000000:90:1C.0 Off |                    0 |
| N/A   31C    P0             52W /  400W |       0MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   5  NVIDIA A100-SXM4-40GB          On  |   00000000:90:1D.0 Off |                    0 |
| N/A   30C    P0             51W /  400W |       0MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   6  NVIDIA A100-SXM4-40GB          On  |   00000000:A0:1C.0 Off |                    0 |
| N/A   31C    P0             54W /  400W |       0MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   7  NVIDIA A100-SXM4-40GB          On  |   00000000:A0:1D.0 Off |                    0 |
| N/A   30C    P0             52W /  400W |       0MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
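The device-node check in the session above (ls -l shows 10, 259 and /sys/class/misc/gdrdrv/dev reads 10:259) could be automated in a smoke test. A minimal sketch, not part of the PR: it parses the sysfs dev attribute and confirms major 10, the fixed major the kernel uses for all misc devices, which is what the misc_register patch should produce. `parse_dev` and `is_misc_device` are hypothetical helper names.

```rust
/// Fixed major number for misc devices (MISC_MAJOR in include/linux/major.h).
const MISC_MAJOR: u32 = 10;

/// Parse a sysfs `dev` attribute such as /sys/class/misc/gdrdrv/dev,
/// whose contents look like "10:259\n".
fn parse_dev(contents: &str) -> Option<(u32, u32)> {
    let (major, minor) = contents.trim().split_once(':')?;
    Some((major.parse::<u32>().ok()?, minor.parse::<u32>().ok()?))
}

/// True if the numbers belong to a misc device, i.e. the node was created
/// through misc_register rather than a separately allocated char major.
fn is_misc_device(contents: &str) -> bool {
    matches!(parse_dev(contents), Some((MISC_MAJOR, _)))
}

fn main() {
    // Contents captured with `cat /sys/class/misc/gdrdrv/dev` in the session.
    let sample = "10:259\n";
    println!("{:?} misc={}", parse_dev(sample), is_misc_device(sample));
}
```

On a real host the sample string would come from reading the sysfs file; the minor (259 here) is assigned dynamically, so only the major is checked.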

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

Adds gdrdrv (v2.5.2) for open-gpu and tesla driver paths. Includes a
misc_register patch so /dev/gdrdrv is created automatically.

Signed-off-by: Gaurav Sharma <mgsharm@amazon.com>
@mgsharm requested a review from arnaldo2792 on June 10, 2026 01:01
[[package.metadata.build-package.external-files]]
url = "https://github.com/NVIDIA/gdrcopy/archive/refs/tags/v2.5.2.tar.gz"
sha512 = "c717f118eff8cd5a8dc35613c3881818f8b71dc493461dd0151ce7c882f8e2c2d852e22733fab4e2bec57219e10eec874c11b4fad90dd4815ae572840ed19d28"
force-upstream = true

Review comment (Contributor):

This is not correct. This kernel module is MIT-licensed, so we can distribute its sources.

@@ -0,0 +1,84 @@
From: Bottlerocket Kernel Kit <noreply@amazon.com>

Review comment (Contributor):

I understand where you are coming from with this patch, but I'd prefer a different approach so that we don't have to carry a patch.

I think we can extend ghostdog to create the device node and run it after we load the kernel module, as an additional ExecStart call. It would do essentially what the insmod script in the GDRCopy repo does.
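A rough sketch of the suggested ghostdog step, under stated assumptions: without the misc_register patch, gdrdrv registers an ordinary character device with a dynamically assigned major, and (as we understand it) GDRCopy's insmod script looks that major up in /proc/devices and then runs mknod for /dev/gdrdrv with minor 0 and world read/write permissions. Everything below is illustrative: it is not ghostdog's API, `find_char_major`, `DeviceNode`, and `gdrdrv_node` are hypothetical names, and the mknod(2) call itself is omitted to keep the example dependency-free.

```rust
/// Look up the major number that a character device driver registered
/// under `name`, given the contents of /proc/devices.
fn find_char_major(proc_devices: &str, name: &str) -> Option<u32> {
    let mut in_char_section = false;
    for line in proc_devices.lines() {
        let line = line.trim();
        if line == "Character devices:" {
            in_char_section = true;
            continue;
        }
        if line == "Block devices:" {
            in_char_section = false;
            continue;
        }
        if !in_char_section {
            continue;
        }
        // Entries look like "239 gdrdrv": major number, then driver name.
        let mut fields = line.split_whitespace();
        if let (Some(major), Some(driver)) = (fields.next(), fields.next()) {
            if driver == name {
                return major.parse::<u32>().ok();
            }
        }
    }
    None
}

/// Hypothetical description of the node to create. The mknod(2) call itself
/// is omitted; it would typically go through the libc or nix crates.
#[derive(Debug, PartialEq)]
struct DeviceNode {
    path: &'static str,
    major: u32,
    minor: u32,
    mode: u32,
}

/// Compute what /dev/gdrdrv should look like: a character device with the
/// dynamically allocated major, minor 0, and world read/write permissions.
fn gdrdrv_node(proc_devices: &str) -> Option<DeviceNode> {
    let major = find_char_major(proc_devices, "gdrdrv")?;
    Some(DeviceNode { path: "/dev/gdrdrv", major, minor: 0, mode: 0o666 })
}

fn main() {
    // Illustrative /proc/devices excerpt; the real gdrdrv major is assigned
    // at load time, so 239 is only an example value.
    let sample = "Character devices:\n  1 mem\n 10 misc\n239 gdrdrv\n\nBlock devices:\n  7 loop\n";
    match gdrdrv_node(sample) {
        Some(node) => println!("{:?}", node),
        None => println!("gdrdrv is not loaded"),
    }
}
```

Mode 0o666 matches the crw-rw-rw- permissions seen in the test session, so the node would behave the same as the one the patch creates today.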

Source20: https://developer.download.nvidia.com/compute/cuda/repos/amzn2023/x86_64/nvidia-imex-%{tesla_ver}-1.amzn2023.x86_64.rpm
Source21: https://developer.download.nvidia.com/compute/cuda/repos/amzn2023/sbsa/nvidia-imex-%{tesla_ver}-1.amzn2023.aarch64.rpm

Source100: https://github.com/NVIDIA/gdrcopy/archive/refs/tags/v%{gdrcopy_ver}.tar.gz

Review comment (Contributor):

For consistency with the core kit, fetch the archive from the archive URLs. See what other packages in the core kit do.

Comment on lines +190 to +192
Ships two flavors of gdrdrv: one built against the open NVIDIA driver and one
against the proprietary driver. The right flavor is loaded at boot based on
which driver variant ghostdog matches.

Review comment (Contributor):

This isn't needed; drop it for consistency.

%patch 2 -p1
popd

pushd gdrcopy-%{gdrcopy_ver}/src/gdrdrv

Review comment (Contributor):

Instead of reusing the sources, could you please prepare one copy of the sources per build? I recall we do something similar elsewhere: we prepare two copies of the same sources and configure each one differently.

Comment on lines +309 to +312
NVIDIA_IS_OPENSOURCE=y \
HAVE_VM_FLAGS_SET=y \
HAVE_PROC_OPS=y \
KBUILD_EXTRA_SYMBOLS="%{_builddir}/NVIDIA-Linux-%{_cross_arch}-%{tesla_ver}/kernel-open/Module.symvers" \

Review comment (Contributor):

These can be passed after make, for consistency with the existing make command.

modules

%{_cross_target}-strip -g --strip-unneeded gdrdrv.ko
mv gdrdrv.ko ../../gdrdrv-open-gpu.ko

Review comment (Contributor):

With two copies of the sources, you don't have to move the kernel module; you can refer to it directly in the install section.

make clean

NVIDIA_SRC_DIR="%{_builddir}/NVIDIA-Linux-%{_cross_arch}-%{tesla_ver}/kernel/nvidia" \
NVIDIA_IS_OPENSOURCE=y \

Review comment (Contributor):

This is not the open source version, right?

LD=%{_cross_target}-ld \
modules

%{_cross_target}-strip -g --strip-unneeded gdrdrv.ko

Review comment (Contributor):

Do you need to strip the kernel symbols? Isn't that already done when the kernel module is built?

Labels: None yet
Projects: None yet

2 participants