vLLM CUDA image doesn't run on GPU

When running vLLM-backed models, I expect the runner to use CUDA. However, the PyTorch build packaged in the image does not include `libtorch_cuda.so`.

**Versions**

```
$ docker version
Client:
 Version:           29.5.2
 API version:       1.54
 Go version:        go1.26.3-X:nodwarf5
 Git commit:        79eb04c7d8
 Built:             Mon Jun  1 15:47:11 2026
 OS/Arch:           linux/amd64
 Context:           default

Server:
 Engine:
  Version:          29.5.2
  API version:      1.54 (minimum version 1.40)
  Go version:       go1.26.3-X:nodwarf5
  Git commit:       568f755ebe
  Built:            Mon Jun  1 15:47:11 2026
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v2.3.1
  GitCommit:        64b425cf570b3b8dd1d4cc46da7c1fce65c6651a.m
 runc:
  Version:          1.4.2
  GitCommit:
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
```

```
$ docker model version
Client:
 Version:    dev
 OS/Arch:    linux/amd64

Server:
 Version:    v1.2.1
 Engine:     Docker Engine
```


Reinstalling the vLLM / CUDA image:-
```
$ docker model reinstall-runner --backend vllm --gpu cuda
Removing container docker-model-runner (3c0292cc30e0)...
latest-vllm-cuda: Pulling from docker/model-runner
Digest: sha256:25bec6f13611a055bc75ca5a38aa67a9ef110b33bd6800bdbaf81e8fd0551d95
Status: Image is up to date for docker/model-runner:latest-vllm-cuda
Successfully pulled docker/model-runner:latest-vllm-cuda
Starting model runner container docker-model-runner...
```

Running a simple model:-
```
$ docker model run smollm2-vllm
> background model preload failed: preload failed: status=500 body=unable to load runner: error waiting for runner to be ready: vLLM terminated unexpectedly: vLLM failed:     _current_platform = resolve_obj_by_qualname(platform_cls_qualname)()
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/vllm-env/lib/python3.12/site-packages/vllm/utils/import_utils.py", line 109, in resolve_obj_by_qualname
    module = importlib.import_module(module_name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/importlib/__init__.py", line 90, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/vllm-env/lib/python3.12/site-packages/vllm/platforms/cuda.py", line 21, in <module>
    import vllm._C  # noqa
    ^^^^^^^^^^^^^^
ImportError: libtorch_cuda.so: cannot open shared object file: No such file or directory
hon3.12/site-packages/vllm/config/compilation.py", line 22, in <module>
    from vllm.platforms import current_platform
  File "/opt/vllm-env/lib/python3.12/site-packages/vllm/platforms/__init__.py", line 278, in __getattr__
```
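The missing library can be confirmed without importing `torch` at all, by looking for the file on disk. This is a minimal sketch, assuming the usual wheel layout in which PyTorch ships its native libraries under `torch/lib/`; the helper names `torch_lib_dir` and `has_cuda_lib` are made up for illustration:-

```python
import importlib.util
from pathlib import Path


def torch_lib_dir():
    """Return the torch/lib directory of the installed torch, or None.

    find_spec only locates the top-level package; it does not import torch.
    """
    spec = importlib.util.find_spec("torch")
    if spec is None or not spec.submodule_search_locations:
        return None
    # Assumption: native libraries live in the "lib" subdirectory of the package.
    return Path(list(spec.submodule_search_locations)[0]) / "lib"


def has_cuda_lib(lib_dir, name="libtorch_cuda.so"):
    """Return True if the named shared library exists in lib_dir."""
    return lib_dir is not None and (Path(lib_dir) / name).is_file()


if __name__ == "__main__":
    lib_dir = torch_lib_dir()
    print(f"torch lib dir: {lib_dir}")
    print(f"libtorch_cuda.so present: {has_cuda_lib(lib_dir)}")
```

Run with the interpreter from `/opt/vllm-env`, this should report whether the library named in the `ImportError` is present in the environment.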

Checking the images available:-

```
$ docker images | grep model-runner
docker/model-runner:latest-vllm-cuda                  7c46acda1696       13.1GB             0B   U
```

I can see that the installed `torch` package is the CPU-only build:-

```
$ docker run --rm -it --gpus=all --entrypoint=bash docker/model-runner:latest-vllm-cuda
modelrunner@fa4f0d14b23e:/app$ . /opt/vllm-env/bin/activate
(vllm-env) modelrunner@fa4f0d14b23e:/app$ uv pip list | grep torch
Using Python 3.12.3 environment at: /opt/vllm-env
torch                                    2.11.0+cpu
torch-c-dlpack-ext                       0.1.5
torchaudio                               2.11.0+cpu
torchvision                              0.26.0+cpu
```
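The `+cpu` suffix is what gives the build away. As a rough sketch, a check like the following could classify a version string by its local version suffix. It assumes the naming used by PyTorch's own wheel indexes (`+cpu`, `+cuNNN`, `+rocmX.Y`); `torch_build_variant` is a hypothetical helper, and a version with no suffix is reported as `unknown` rather than guessed:-

```python
def torch_build_variant(version):
    """Classify a PyTorch version string by its local version suffix.

    Assumes suffixes such as "+cpu" or "+cu124"; a version without a
    "+suffix" part is reported as "unknown".
    """
    _, sep, local = version.partition("+")
    if not sep:
        return "unknown"
    if local == "cpu":
        return "cpu"
    if local.startswith("cu"):
        return "cuda"
    if local.startswith("rocm"):
        return "rocm"
    return "unknown"


if __name__ == "__main__":
    for v in ("2.11.0+cpu", "2.6.0+cu124"):
        print(v, "->", torch_build_variant(v))
```

A check of this kind could run at image build time and fail the build when the CUDA variant ends up with a `cpu` wheel.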

Sanity check that `nvidia-smi` works inside the container:-

```
(vllm-env) modelrunner@6faad2682127:/app$ nvidia-smi
Tue Jun  2 16:34:30 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 610.43.02              KMD Version: 610.43.02     CUDA UMD Version: 13.3     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3070        Off |   00000000:07:00.0  On |                  N/A |
|  0%   30C    P8             12W /  220W |     154MiB /   8192MiB |      2%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
```

---

**Workaround??**

I've rebuilt the image as per the [README](https://github.com/docker/model-runner/blob/main/README.md#building-the-vllm-variant), after cloning the repository:-

```
# Build for specific architecture
docker buildx build \
  --platform linux/amd64 \
  --target final-vllm \
  --build-arg BASE_IMAGE=nvidia/cuda:13.0.2-runtime-ubuntu24.04 \
  --build-arg LLAMA_SERVER_VARIANT=cuda \
  --build-arg VLLM_VERSION=0.19.1 \
  -t docker/model-runner:vllm .
```

This did require me to edit the `Dockerfile`, as `python-3.12` is no longer in the apt repositories, so I just targeted `python-3`.

I also tagged the image with `docker/model-runner:latest-vllm-cuda`:-

```
$ docker tag docker/model-runner:vllm docker/model-runner:latest-vllm-cuda
$ docker images | grep model-runner
docker/model-runner:latest-vllm-cuda                  2ae255089a00       19.2GB             0B
docker/model-runner:vllm                              2ae255089a00       19.2GB             0B
```

But, after installing or reinstalling the vLLM runner, the `docker/model-runner:latest-vllm-cuda` tag gets reset to the 13.1GB image, with ID `7c46acda1696`:-

```
$ docker model stop-runner
$ docker model start-runner --backend vllm --gpu cuda
Starting model runner container docker-model-runner...
$ docker images | grep model-runner
docker/model-runner:latest-vllm-cuda                  2ae255089a00       19.2GB             0B   U
docker/model-runner:vllm                              2ae255089a00       19.2GB             0B   U
```

Oh, this didn't overwrite my image tag this time... However, something still isn't working:-

```
(vllm-env) modelrunner@9d5a8ef06f83:/app$ python3 -c 'import torch; torch.cuda.list_gpu_processes()'
/opt/vllm-env/lib/python3.12/site-packages/torch/cuda/__init__.py:180: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at /pytorch/c10/cuda/CUDAFunctions.cpp:119.)
  return torch._C._cuda_getDeviceCount() > 0
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/vllm-env/lib/python3.12/site-packages/torch/cuda/memory.py", line 784, in list_gpu_processes
    device = _get_nvml_device_index(device)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/vllm-env/lib/python3.12/site-packages/torch/cuda/__init__.py", line 1089, in _get_nvml_device_index
    if idx < 0 or idx >= len(visible_devices):
       ^^^^^^^
TypeError: '<' not supported between instances of 'NoneType' and 'int'
```
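The `TypeError` above is a side effect: the real failure is the CUDA initialization warning. A smaller probe that separates "torch was built without CUDA" from "CUDA could not be initialised" might look like this. It is only a sketch; `cuda_report` is an illustrative name, and it uses just `torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()` and `torch.cuda.device_count()`:-

```python
def cuda_report():
    """Collect basic facts about the installed torch build and CUDA runtime."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # None for CPU-only builds, otherwise the CUDA version torch was built against.
        "built_with_cuda": torch.version.cuda,
        # False if the build lacks CUDA or the driver/devices cannot be initialised.
        "cuda_available": torch.cuda.is_available(),
    }
    # Only query the device count when CUDA initialised successfully.
    report["device_count"] = torch.cuda.device_count() if report["cuda_available"] else 0
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `built_with_cuda` is set but `cuda_available` is `False`, the wheel itself is fine and the problem is more likely in how the container sees the driver or devices.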

Could the image fetched by `docker model install-runner --backend vllm --gpu cuda` be built with a functioning, CUDA-enabled PyTorch? I suspect that installing with `--torch-backend auto` picks the CPU-only PyTorch wheels in your existing build environment, presumably because no GPU is visible there at build time.






VLLM CUDA image doesn't run on GPU #952
