
fix: vllm-omni build pipeline improvements#774

Open
lilyz-ai wants to merge 2 commits into main from lilyzhu/vllm-omni-build-fixes

Conversation


@lilyz-ai lilyz-ai commented Mar 7, 2026

Summary

  • Add VLLM_PIP_VERSION build arg to handle vllm versions containing + (invalid in Docker image tags)
  • Add --vllm-pip-version CLI flag to build_and_upload_image.sh
  • Fix Dockerfile inline Python heredoc for nvidia lib path detection (adds None guard for __file__)
  • Switch Docker entrypoint from /bin/env to entrypoint.sh which prepends pip-installed NVIDIA CUDA lib paths to LD_LIBRARY_PATH
  • Add transformers>=5.0.0 to requirements.txt
  • Fix vllm_server.py import from relative to absolute path
  • Remove hardcoded torch==2.10.0 requirement for vllm 0.16.0

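Since the change hinges on Docker rejecting + in image tags, here is a minimal sketch of the kind of split the new build arg enables. Variable names are illustrative, not necessarily those used in build_and_upload_image.sh:

```shell
# Docker image tags may only contain [A-Za-z0-9_.-], so a PEP 440 local
# version like 0.16.0+cu128 is valid for pip but invalid as a tag component.
# Illustrative names; the actual script's variables may differ.
VLLM_PIP_VERSION="0.16.0+cu128"   # what 'pip install vllm==...' receives
VLLM_TAG_VERSION="$(printf '%s' "$VLLM_PIP_VERSION" | tr '+' '-')"  # tag-safe
echo "pip=${VLLM_PIP_VERSION} tag=${VLLM_TAG_VERSION}"
```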
Test plan

  • Build vllm-omni Docker image using build_and_upload_image.sh with --vllm-pip-version flag
  • Verify entrypoint correctly sets LD_LIBRARY_PATH with pip-installed NVIDIA libs
  • Verify vllm server starts successfully with updated import path

🤖 Generated with Claude Code

Greptile Summary

This PR delivers a set of build-pipeline and runtime fixes for the vllm-omni Docker image. It introduces a VLLM_PIP_VERSION build arg to decouple the pip install version (which may contain +) from the Docker image tag, replaces the bare /bin/env entrypoint with a dedicated entrypoint.sh that injects pip-installed NVIDIA CUDA lib paths into LD_LIBRARY_PATH, and fixes a Python heredoc in the Dockerfile to guard against __file__ being None. On the Python side, relative imports in vllm_omni_server.py and vllm_server.py are corrected to absolute imports (required since these modules run as __main__), a latent bug in with_startup_metrics, where a Namespace was incorrectly unpacked with *args, is fixed, and the hardcoded torch==2.10.0 constraint for vllm 0.16.0 is removed.

Key changes:

  • Dockerfile.vllm: Adds VLLM_PIP_VERSION ARG, fixes Python heredoc None guard, switches ENTRYPOINT to entrypoint.sh
  • entrypoint.sh (new): Prepends pip-installed NVIDIA lib paths to LD_LIBRARY_PATH before exec "$@"
  • build_and_upload_image.sh: Adds --vllm-pip-version flag; removes VLLM_OMNI_VERSION from image tag (⚠️ potential tag collision — see inline comment)
  • startup_telemetry.py: Fixes func(*args) → func(args) for the non-iterable Namespace argument
  • vllm_omni_server.py / vllm_server.py: Corrects relative → absolute imports; adds --init-timeout and --stage-init-timeout args
  • requirements.txt: Adds a transformers>=5.0.0 lower bound
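For context, the entrypoint change can be sketched as follows. This is a hypothetical reconstruction from the PR description, assuming the build stage records the lib paths in the file named by NVIDIA_PIP_LIB_PATH_FILE (defaulting to /tmp/nvidia_lib_path.txt, per the review below); the actual entrypoint.sh may differ:

```shell
# Hypothetical sketch of the entrypoint.sh logic; the real script may differ.
# It prepends pip-installed NVIDIA CUDA lib paths (recorded at build time in
# the file named by NVIDIA_PIP_LIB_PATH_FILE) to LD_LIBRARY_PATH; the real
# entrypoint would then hand off to the container CMD with: exec "$@"
prepend_nvidia_libs() {
    path_file="${NVIDIA_PIP_LIB_PATH_FILE:-/tmp/nvidia_lib_path.txt}"
    if [ -f "$path_file" ]; then
        export LD_LIBRARY_PATH="$(cat "$path_file")${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    fi
}

# Demonstration with a temp file standing in for the build-time artifact;
# the lib path below is an example, not the image's actual layout.
NVIDIA_PIP_LIB_PATH_FILE="$(mktemp)"
printf '%s' "/usr/local/lib/python3/dist-packages/nvidia/cuda_runtime/lib" \
    > "$NVIDIA_PIP_LIB_PATH_FILE"
LD_LIBRARY_PATH="/usr/lib"
prepend_nvidia_libs
echo "$LD_LIBRARY_PATH"
rm -f "$NVIDIA_PIP_LIB_PATH_FILE"
```

Doing this in an entrypoint rather than a baked-in ENV means the paths reflect whatever the pip-installed NVIDIA packages actually provide at runtime.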

Confidence Score: 4/5

  • This PR is safe to merge with one low-risk concern around image tag uniqueness after removing VLLM_OMNI_VERSION from the tag format.
  • The changes are well-scoped bug fixes and build improvements. The critical func(*args) → func(args) fix corrects a TypeError that would have caused with_startup_metrics to fail whenever called. The import path corrections are necessary for __main__ execution. The entrypoint.sh is clean and correct. The one open question is the removal of VLLM_OMNI_VERSION from the vllm_omni image tag, which could cause tag collisions if the same VLLM version is built with different omni versions under the same USER_TAG — this is a process concern rather than a code defect.
  • Pay attention to build_and_upload_image.sh around the updated image tag format (line 258), since removing VLLM_OMNI_VERSION from the tag may cause images to overwrite each other in certain CI workflows.

Important Files Changed

  • model-engine/model_engine_server/inference/vllm/utils/startup_telemetry.py: Fixes a critical bug where func(*args) tried to unpack a non-iterable Namespace object; changed to func(args), which correctly passes the Namespace as a single positional argument.
  • model-engine/model_engine_server/inference/vllm/Dockerfile.vllm: Adds the VLLM_PIP_VERSION build arg to decouple the Docker tag version from the pip version, fixes the Python heredoc with a None guard for __file__, and switches ENTRYPOINT to entrypoint.sh for proper LD_LIBRARY_PATH injection.
  • model-engine/model_engine_server/inference/vllm/build_and_upload_image.sh: Adds the --vllm-pip-version CLI flag and removes VLLM_OMNI_VERSION from the image tag, which may cause tag collisions between builds of the same VLLM version with different omni versions.
  • model-engine/model_engine_server/inference/vllm/entrypoint.sh: New entrypoint script that prepends pip-installed NVIDIA lib paths to LD_LIBRARY_PATH before exec-ing the CMD, replacing the previous bare /bin/env entrypoint.
  • model-engine/model_engine_server/inference/vllm/vllm_omni_server.py: Fixes relative imports to absolute imports (matching how the module is executed as __main__), and adds --init-timeout and --stage-init-timeout CLI arguments for the vllm-omni server.
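The relative-to-absolute import change can be reproduced in isolation: a module run directly as a script (rather than via python -m package.module) has an empty __package__, so relative imports fail. A throwaway sketch, with invented file names, assuming python3 is on PATH:

```shell
# Reproduces why relative imports break when a module runs as a script:
# __package__ is empty, so "from .helper import ..." has no parent package.
workdir="$(mktemp -d)"
printf 'VALUE = 1\n' > "$workdir/helper.py"
printf 'from .helper import VALUE\nprint(VALUE)\n' > "$workdir/server_rel.py"
printf 'from helper import VALUE\nprint(VALUE)\n'  > "$workdir/server_abs.py"

# Relative import: "attempted relative import with no known parent package"
if python3 "$workdir/server_rel.py" >/dev/null 2>&1; then rel=ok; else rel=fail; fi
# Absolute import: works, since the script's own directory is on sys.path
if python3 "$workdir/server_abs.py" >/dev/null 2>&1; then abs=ok; else abs=fail; fi

echo "relative=$rel absolute=$abs"
rm -rf "$workdir"
```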

Sequence Diagram

sequenceDiagram
    participant Docker as Docker Runtime
    participant EP as entrypoint.sh
    participant Env as LD_LIBRARY_PATH
    participant Server as vllm_omni_server.py
    participant Metrics as with_startup_metrics()
    participant OmniSrv as omni_run_server(args)

    Docker->>EP: exec entrypoint.sh [CMD args]
    EP->>Env: prepend pip NVIDIA lib paths\n(from NVIDIA_PIP_LIB_PATH_FILE)
    EP->>Server: exec python vllm_omni_server.py [args]
    Server->>Server: parse_args() → Namespace
    Server->>Metrics: with_startup_metrics(omni_run_server, args, t0)
    alt ENABLE_STARTUP_METRICS=true
        Metrics->>Metrics: init VLLMStartupMetrics
        Metrics->>Metrics: record_python_init()
        Metrics->>Metrics: start health_check_loop thread
    end
    Metrics->>OmniSrv: await func(args)  ← fixed from func(*args)
    OmniSrv-->>Metrics: server running
    Metrics-->>Server: return

Comments Outside Diff (2)

  1. model-engine/model_engine_server/inference/vllm/Dockerfile.vllm, lines 130-139

    The PR title is "fix: vllm-omni build pipeline improvements" and the PR description explicitly calls out switching the entrypoint from /bin/env to entrypoint.sh to correctly prepend pip-installed NVIDIA CUDA lib paths to LD_LIBRARY_PATH. However, the vllm_omni stage was not updated—it still uses ENTRYPOINT ["/bin/env"] and never copies entrypoint.sh into the image.

    The vllm_base stage writes NVIDIA_PIP_LIB_PATH_FILE=/tmp/nvidia_lib_path.txt, but this environment variable is only acted upon by entrypoint.sh. Since vllm_omni doesn't use entrypoint.sh, the LD_LIBRARY_PATH fix will have no effect for this build target, defeating the stated purpose of the PR.

    The vllm_omni stage needs the same treatment as the vllm stage:

    # vLLM-specific startup instrumentation
    COPY model-engine/model_engine_server/inference/vllm/vllm_startup_wrapper.py /workspace/vllm_startup_wrapper.py
    COPY model-engine/model_engine_server/inference/vllm/entrypoint.sh /workspace/entrypoint.sh
    RUN chmod +x /workspace/entrypoint.sh
    
    # Temp vllm_omni override for fix
    COPY model-engine/model_engine_server/inference/vllm/vllm_overrides/omni/config/model.py /usr/local/lib/python3.12/dist-packages/vllm_omni/config/model.py
    
    # Need to override entrypoint from parent image
    ENTRYPOINT ["/workspace/entrypoint.sh"]
  2. model-engine/model_engine_server/inference/vllm/build_and_upload_image.sh, lines 256-258

    Image tag collision risk after removing VLLM_OMNI_VERSION

    Removing VLLM_OMNI_VERSION from the image tag means two builds with the same VLLM_VERSION and USER_TAG but different VLLM_OMNI_VERSION values will produce the exact same tag, silently overwriting each other. For example:

    • Build A: VLLM_VERSION=0.7.0, VLLM_OMNI_VERSION=0.16.0, USER_TAG=prod → tag 0.7.0-omni-prod
    • Build B: VLLM_VERSION=0.7.0, VLLM_OMNI_VERSION=0.17.0, USER_TAG=prod → tag 0.7.0-omni-prod

    These produce identical tags even though the images contain different vllm-omni versions. If USER_TAG is expected to encode the omni version (e.g., always passing --user-tag v0.16.0), this is fine — but worth making explicit in the script's usage comments or documentation. Otherwise this creates ambiguity when debugging or rolling back images.
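The collision described above is easy to see with the simplified tag format {VLLM_VERSION}-omni-{USER_TAG} from this PR. A sketch; the real script's tag assembly may differ:

```shell
# With VLLM_OMNI_VERSION no longer part of the tag, builds that differ only
# in omni version collide. Tag format per this PR: {VLLM_VERSION}-omni-{USER_TAG}.
make_tag() { printf '%s-omni-%s' "$1" "$2"; }  # args: VLLM_VERSION, USER_TAG

tag_a="$(make_tag 0.7.0 prod)"  # built with VLLM_OMNI_VERSION=0.16.0
tag_b="$(make_tag 0.7.0 prod)"  # built with VLLM_OMNI_VERSION=0.17.0
echo "$tag_a $tag_b"            # identical tags: the later push overwrites the earlier
```

Encoding the omni version in USER_TAG (e.g. --user-tag v0.16.0), as the reviewer suggests, avoids the overwrite.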

Last reviewed commit: fdbe2d8

- Add VLLM_PIP_VERSION build arg to handle versions with '+' invalid in Docker tags
- Add --vllm-pip-version flag to build_and_upload_image.sh
- Fix Dockerfile inline Python heredoc for nvidia lib path detection
- Switch Docker entrypoint to entrypoint.sh for LD_LIBRARY_PATH setup
- Add transformers>=5.0.0 to requirements.txt
- Fix vllm_server.py import from relative to absolute path
- Remove hardcoded torch 2.10.0 requirement for vllm 0.16.0

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

socket-security bot commented Mar 7, 2026

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Diff: Added transformers@5.3.0 (Supply Chain Security 77, Vulnerability 100, Quality 100, Maintenance 100, License 100)

- Fix SETUPTOOLS_SCM_PRETEND_VERSION for local vllm-omni source builds
- Simplify omni image tag format to {VLLM_VERSION}-omni-{USER_TAG}
- Fix with_startup_metrics to pass args object correctly (func(args) not func(*args))
- Fix relative imports in vllm_omni_server.py to absolute imports
- Add --init-timeout (600s) and --stage-init-timeout (300s) args for torch.compile
- Add task_type field to OmniModelConfig override

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
