When running vLLM-backed models, I expect the runner to use CUDA. However, the PyTorch packaged in the image does not include libtorch_cuda.so.
Versions
$ docker version
Client:
Version: 29.5.2
API version: 1.54
Go version: go1.26.3-X:nodwarf5
Git commit: 79eb04c7d8
Built: Mon Jun 1 15:47:11 2026
OS/Arch: linux/amd64
Context: default
Server:
Engine:
Version: 29.5.2
API version: 1.54 (minimum version 1.40)
Go version: go1.26.3-X:nodwarf5
Git commit: 568f755ebe
Built: Mon Jun 1 15:47:11 2026
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: v2.3.1
GitCommit: 64b425cf570b3b8dd1d4cc46da7c1fce65c6651a.m
runc:
Version: 1.4.2
GitCommit:
docker-init:
Version: 0.19.0
GitCommit: de40ad0
$ docker model version
Client:
Version: dev
OS/Arch: linux/amd64
Server:
Version: v1.2.1
Engine: Docker Engine
Re-installing the VLLM / CUDA image:-
$ docker model reinstall-runner --backend vllm --gpu cuda
Removing container docker-model-runner (3c0292cc30e0)...
latest-vllm-cuda: Pulling from docker/model-runner
Digest: sha256:25bec6f13611a055bc75ca5a38aa67a9ef110b33bd6800bdbaf81e8fd0551d95
Status: Image is up to date for docker/model-runner:latest-vllm-cuda
Successfully pulled docker/model-runner:latest-vllm-cuda
Starting model runner container docker-model-runner...
Running a simple model:-
$ docker model run smollm2-vllm
> background model preload failed: preload failed: status=500 body=unable to load runner: error waiting for runner to be ready: vLLM terminated unexpectedly: vLLM failed: _current_platform = resolve_obj_by_qualname(platform_cls_qualname)()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/vllm-env/lib/python3.12/site-packages/vllm/utils/import_utils.py", line 109, in resolve_obj_by_qualname
module = importlib.import_module(module_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/importlib/__init__.py", line 90, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/vllm-env/lib/python3.12/site-packages/vllm/platforms/cuda.py", line 21, in <module>
import vllm._C # noqa
^^^^^^^^^^^^^^
ImportError: libtorch_cuda.so: cannot open shared object file: No such file or directory
hon3.12/site-packages/vllm/config/compilation.py", line 22, in <module>
from vllm.platforms import current_platform
File "/opt/vllm-env/lib/python3.12/site-packages/vllm/platforms/__init__.py", line 278, in __getattr__
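To confirm the ImportError without going through vLLM, the check below looks for libtorch_cuda.so in the installed torch package. This is only a sketch: the helper names torch_shared_libs and has_cuda_lib are my own, and it assumes the usual wheel layout where the shared libraries live under torch/lib. It locates the package with importlib so torch itself is never imported.

```python
import importlib.util
from pathlib import Path


def torch_shared_libs(package_dir):
    """Return the sorted names of libtorch*.so* files under <package_dir>/lib."""
    lib_dir = Path(package_dir) / "lib"
    if not lib_dir.is_dir():
        return []
    return sorted(p.name for p in lib_dir.glob("libtorch*.so*"))


def has_cuda_lib(package_dir):
    """True if libtorch_cuda.so is shipped with this torch install."""
    return "libtorch_cuda.so" in torch_shared_libs(package_dir)


if __name__ == "__main__":
    # find_spec locates the package on disk without importing torch.
    spec = importlib.util.find_spec("torch")
    if spec is None or not spec.submodule_search_locations:
        print("torch is not installed in this environment")
    else:
        pkg = list(spec.submodule_search_locations)[0]
        print("torch package:", pkg)
        print("libs:", torch_shared_libs(pkg))
        print("libtorch_cuda.so present:", has_cuda_lib(pkg))
```

Run inside the container with the /opt/vllm-env environment activated, this should report whether the library that vllm._C is trying to load is present at all.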
Checking the images available:-
$ docker images | grep model-runner
docker/model-runner:latest-vllm-cuda 7c46acda1696 13.1GB 0B U
I can see that the installed PyTorch build is CPU-only:-
$ docker run --rm -it --gpus=all --entrypoint=bash docker/model-runner:latest-vllm-cuda
modelrunner@fa4f0d14b23e:/app$ . /opt/vllm-env/bin/activate
(vllm-env) modelrunner@fa4f0d14b23e:/app$ uv pip list | grep torch
Using Python 3.12.3 environment at: /opt/vllm-env
torch 2.11.0+cpu
torch-c-dlpack-ext 0.1.5
torchaudio 2.11.0+cpu
torchvision 0.26.0+cpu
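The +cpu suffix in the listing above is the local build tag of the wheel. A small sketch of reading that tag programmatically follows; torch_build_variant and is_cpu_only are hypothetical helper names, and a version with no suffix is treated as "unknown" rather than guessed at.

```python
def torch_build_variant(version):
    """Return the local build tag of a version string, e.g. 'cpu' or 'cu128'.

    Returns None when the version has no '+<tag>' suffix, in which case
    the string alone does not say which backend the wheel targets.
    """
    _, sep, local = version.partition("+")
    return local if sep else None


def is_cpu_only(version):
    """True only when the version string is explicitly tagged '+cpu'."""
    return torch_build_variant(version) == "cpu"


if __name__ == "__main__":
    try:
        import torch
    except ImportError as exc:
        print("could not import torch:", exc)
    else:
        print("version:", torch.__version__)
        print("build tag:", torch_build_variant(torch.__version__))
        # torch.version.cuda is None on builds compiled without CUDA.
        print("compiled against CUDA:", torch.version.cuda)
```

For the image above, the version string 2.11.0+cpu would be classified as CPU-only, which matches the missing libtorch_cuda.so.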
Sanity check that nvidia-smi is in the container:-
(vllm-env) modelrunner@6faad2682127:/app$ nvidia-smi
Tue Jun 2 16:34:30 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 610.43.02 KMD Version: 610.43.02 CUDA UMD Version: 13.3 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3070 Off | 00000000:07:00.0 On | N/A |
| 0% 30C P8 12W / 220W | 154MiB / 8192MiB | 2% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
Workaround??
I've rebuilt the image as per the README, after cloning the repository:-
# Build for specific architecture
docker buildx build \
--platform linux/amd64 \
--target final-vllm \
--build-arg BASE_IMAGE=nvidia/cuda:13.0.2-runtime-ubuntu24.04 \
--build-arg LLAMA_SERVER_VARIANT=cuda \
--build-arg VLLM_VERSION=0.19.1 \
-t docker/model-runner:vllm .
This required editing the Dockerfile: python-3.12 is no longer in the apt repositories, so I targeted python-3 instead.
I also tagged the image with docker/model-runner:latest-vllm-cuda:-
$ docker tag docker/model-runner:vllm docker/model-runner:latest-vllm-cuda
$ docker images | grep model-runner
docker/model-runner:latest-vllm-cuda 2ae255089a00 19.2GB 0B
docker/model-runner:vllm 2ae255089a00 19.2GB 0B
But after installing or reinstalling the vLLM runner, docker/model-runner:latest-vllm-cuda gets reset to the 13.1GB image, with ID 7c46acda1696:-
$ docker model stop-runner
$ docker model start-runner --backend vllm --gpu cuda
Starting model runner container docker-model-runner...
$ docker images | grep model-runner
docker/model-runner:latest-vllm-cuda 2ae255089a00 19.2GB 0B U
docker/model-runner:vllm 2ae255089a00 19.2GB 0B U
Oh, this didn't overwrite my image tag this time... However, something still isn't working:-
(vllm-env) modelrunner@9d5a8ef06f83:/app$ python3 -c 'import torch; torch.cuda.list_gpu_processes()'
/opt/vllm-env/lib/python3.12/site-packages/torch/cuda/__init__.py:180: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at /pytorch/c10/cuda/CUDAFunctions.cpp:119.)
return torch._C._cuda_getDeviceCount() > 0
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/opt/vllm-env/lib/python3.12/site-packages/torch/cuda/memory.py", line 784, in list_gpu_processes
device = _get_nvml_device_index(device)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/vllm-env/lib/python3.12/site-packages/torch/cuda/__init__.py", line 1089, in _get_nvml_device_index
if idx < 0 or idx >= len(visible_devices):
^^^^^^^
TypeError: '<' not supported between instances of 'NoneType' and 'int'
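The TypeError above comes after CUDA initialization has already failed, so it hides the real problem. A guarded check along the following lines separates "torch was built without CUDA" from "torch has CUDA but cannot initialize it". This is a sketch, not a fix: cuda_summary is a hypothetical helper, and it takes the torch module as an argument so that it can be exercised without a GPU.

```python
def cuda_summary(torch_module):
    """Summarize CUDA support using only basic, widely available torch calls."""
    built_with = torch_module.version.cuda  # None on CPU-only builds
    available = torch_module.cuda.is_available()
    # Only ask for a device count once initialization is known to work.
    count = torch_module.cuda.device_count() if available else 0
    return {
        "built_with_cuda": built_with,
        "available": available,
        "device_count": count,
    }


if __name__ == "__main__":
    try:
        import torch
    except ImportError as exc:
        print("could not import torch:", exc)
    else:
        for key, value in cuda_summary(torch).items():
            print(f"{key}: {value}")
```

In the rebuilt image I would expect built_with_cuda to be set while available is False, which points at the container's CUDA environment rather than at the wheel.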
Could the image fetched by docker model install-runner --backend vllm --gpu cuda be built with a functioning, CUDA-enabled PyTorch? I suspect that building with --torch-backend auto pulls the CPU-only PyTorch wheels in your existing build environment...