
fix: vllm-omni build pipeline improvements#774

Open
lilyz-ai wants to merge 2 commits into main from lilyzhu/vllm-omni-build-fixes

Conversation


@lilyz-ai lilyz-ai commented Mar 7, 2026

Summary

  • Add VLLM_PIP_VERSION build arg to handle vllm versions containing + (invalid in Docker image tags)
  • Add --vllm-pip-version CLI flag to build_and_upload_image.sh
  • Fix Dockerfile inline Python heredoc for nvidia lib path detection (adds None guard for __file__)
  • Switch Docker entrypoint from /bin/env to entrypoint.sh which prepends pip-installed NVIDIA CUDA lib paths to LD_LIBRARY_PATH
  • Add transformers>=5.0.0 to requirements.txt
  • Fix vllm_server.py import from relative to absolute path
  • Remove hardcoded torch==2.10.0 requirement for vllm 0.16.0

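Since the change hinges on Docker rejecting + in image tags, here is a minimal sketch of the kind of split the new build arg enables. Variable names are illustrative, not necessarily those used in build_and_upload_image.sh:

```shell
# Docker image tags may only contain [A-Za-z0-9_.-], so a PEP 440 local
# version like 0.16.0+cu128 is valid for pip but invalid as a tag component.
# Illustrative names; the actual script's variables may differ.
VLLM_PIP_VERSION="0.16.0+cu128"   # what 'pip install vllm==...' receives
VLLM_TAG_VERSION="$(printf '%s' "$VLLM_PIP_VERSION" | tr '+' '-')"  # tag-safe
echo "pip=${VLLM_PIP_VERSION} tag=${VLLM_TAG_VERSION}"
```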
Test plan

  • Build vllm-omni Docker image using build_and_upload_image.sh with --vllm-pip-version flag
  • Verify entrypoint correctly sets LD_LIBRARY_PATH with pip-installed NVIDIA libs
  • Verify vllm server starts successfully with updated import path

🤖 Generated with Claude Code

Greptile Summary

This PR delivers a set of build-pipeline and runtime fixes for the vllm-omni Docker image. It introduces a VLLM_PIP_VERSION build arg to decouple the pip install version (which may contain +) from the Docker image tag, replaces the bare /bin/env entrypoint with a dedicated entrypoint.sh that injects pip-installed NVIDIA CUDA lib paths into LD_LIBRARY_PATH, and fixes a Python heredoc in the Dockerfile to guard against __file__ being None. On the Python side, relative imports in vllm_omni_server.py and vllm_server.py are corrected to absolute imports (required since these modules run as __main__), a latent bug in with_startup_metrics, where a Namespace was incorrectly unpacked with *args, is fixed, and the hardcoded torch==2.10.0 constraint for vllm 0.16.0 is removed.

Key changes:

  • Dockerfile.vllm: Adds VLLM_PIP_VERSION ARG, fixes Python heredoc None guard, switches ENTRYPOINT to entrypoint.sh
  • entrypoint.sh (new): Prepends pip-installed NVIDIA lib paths to LD_LIBRARY_PATH before exec "$@"
  • build_and_upload_image.sh: Adds --vllm-pip-version flag; removes VLLM_OMNI_VERSION from image tag (⚠️ potential tag collision — see inline comment)
  • startup_telemetry.py: Fixes func(*args) → func(args) for the non-iterable Namespace argument
  • vllm_omni_server.py / vllm_server.py: Corrects relative → absolute imports; adds --init-timeout and --stage-init-timeout args
  • requirements.txt: Adds a transformers>=5.0.0 lower bound
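For context, the entrypoint change can be sketched as follows. This is a hypothetical reconstruction from the PR description, assuming the build stage records the lib paths in the file named by NVIDIA_PIP_LIB_PATH_FILE (defaulting to /tmp/nvidia_lib_path.txt, per the review below); the actual entrypoint.sh may differ:

```shell
# Hypothetical sketch of the entrypoint.sh logic; the real script may differ.
# It prepends pip-installed NVIDIA CUDA lib paths (recorded at build time in
# the file named by NVIDIA_PIP_LIB_PATH_FILE) to LD_LIBRARY_PATH; the real
# entrypoint would then hand off to the container CMD with: exec "$@"
prepend_nvidia_libs() {
    path_file="${NVIDIA_PIP_LIB_PATH_FILE:-/tmp/nvidia_lib_path.txt}"
    if [ -f "$path_file" ]; then
        export LD_LIBRARY_PATH="$(cat "$path_file")${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    fi
}

# Demonstration with a temp file standing in for the build-time artifact;
# the lib path below is an example, not the image's actual layout.
NVIDIA_PIP_LIB_PATH_FILE="$(mktemp)"
printf '%s' "/usr/local/lib/python3/dist-packages/nvidia/cuda_runtime/lib" \
    > "$NVIDIA_PIP_LIB_PATH_FILE"
LD_LIBRARY_PATH="/usr/lib"
prepend_nvidia_libs
echo "$LD_LIBRARY_PATH"
rm -f "$NVIDIA_PIP_LIB_PATH_FILE"
```

Doing this in an entrypoint rather than a baked-in ENV means the paths reflect whatever the pip-installed NVIDIA packages actually provide at runtime.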

Confidence Score: 4/5

  • This PR is safe to merge with one low-risk concern around image tag uniqueness after removing VLLM_OMNI_VERSION from the tag format.
  • The changes are well-scoped bug fixes and build improvements. The critical func(*args) → func(args) fix corrects a TypeError that would have caused with_startup_metrics to fail whenever called. The import path corrections are necessary for __main__ execution. The entrypoint.sh is clean and correct. The one open question is the removal of VLLM_OMNI_VERSION from the vllm_omni image tag, which could cause tag collisions if the same VLLM version is built with different omni versions under the same USER_TAG — this is a process concern rather than a code defect.
  • Pay attention to build_and_upload_image.sh around the updated image tag format (line 258), since removing VLLM_OMNI_VERSION from the tag may cause images to overwrite each other in certain CI workflows.

Important Files Changed

  • model-engine/model_engine_server/inference/vllm/utils/startup_telemetry.py: Fixes a critical bug where func(*args) tried to unpack a non-iterable Namespace object; changed to func(args), which correctly passes the Namespace as a single positional argument.
  • model-engine/model_engine_server/inference/vllm/Dockerfile.vllm: Adds the VLLM_PIP_VERSION build arg to decouple the Docker tag version from the pip version, fixes the Python heredoc with a None guard for __file__, and switches ENTRYPOINT to entrypoint.sh for proper LD_LIBRARY_PATH injection.
  • model-engine/model_engine_server/inference/vllm/build_and_upload_image.sh: Adds the --vllm-pip-version CLI flag and removes VLLM_OMNI_VERSION from the image tag, which may cause tag collisions between builds of the same VLLM version with different omni versions.
  • model-engine/model_engine_server/inference/vllm/entrypoint.sh: New entrypoint script that prepends pip-installed NVIDIA lib paths to LD_LIBRARY_PATH before exec-ing the CMD, replacing the previous bare /bin/env entrypoint.
  • model-engine/model_engine_server/inference/vllm/vllm_omni_server.py: Fixes relative imports to absolute imports (matching how the module is executed as __main__), and adds --init-timeout and --stage-init-timeout CLI arguments for the vllm-omni server.
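The relative-to-absolute import change can be reproduced in isolation: a module run directly as a script (rather than via python -m package.module) has an empty __package__, so relative imports fail. A throwaway sketch, with invented file names, assuming python3 is on PATH:

```shell
# Reproduces why relative imports break when a module runs as a script:
# __package__ is empty, so "from .helper import ..." has no parent package.
workdir="$(mktemp -d)"
printf 'VALUE = 1\n' > "$workdir/helper.py"
printf 'from .helper import VALUE\nprint(VALUE)\n' > "$workdir/server_rel.py"
printf 'from helper import VALUE\nprint(VALUE)\n'  > "$workdir/server_abs.py"

# Relative import: "attempted relative import with no known parent package"
if python3 "$workdir/server_rel.py" >/dev/null 2>&1; then rel=ok; else rel=fail; fi
# Absolute import: works, since the script's own directory is on sys.path
if python3 "$workdir/server_abs.py" >/dev/null 2>&1; then abs=ok; else abs=fail; fi

echo "relative=$rel absolute=$abs"
rm -rf "$workdir"
```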

Sequence Diagram

sequenceDiagram
    participant Docker as Docker Runtime
    participant EP as entrypoint.sh
    participant Env as LD_LIBRARY_PATH
    participant Server as vllm_omni_server.py
    participant Metrics as with_startup_metrics()
    participant OmniSrv as omni_run_server(args)

    Docker->>EP: exec entrypoint.sh [CMD args]
    EP->>Env: prepend pip NVIDIA lib paths\n(from NVIDIA_PIP_LIB_PATH_FILE)
    EP->>Server: exec python vllm_omni_server.py [args]
    Server->>Server: parse_args() → Namespace
    Server->>Metrics: with_startup_metrics(omni_run_server, args, t0)
    alt ENABLE_STARTUP_METRICS=true
        Metrics->>Metrics: init VLLMStartupMetrics
        Metrics->>Metrics: record_python_init()
        Metrics->>Metrics: start health_check_loop thread
    end
    Metrics->>OmniSrv: await func(args)  ← fixed from func(*args)
    OmniSrv-->>Metrics: server running
    Metrics-->>Server: return

Comments Outside Diff (2)

  1. model-engine/model_engine_server/inference/vllm/Dockerfile.vllm, lines 130-139

    The PR title is "fix: vllm-omni build pipeline improvements" and the PR description explicitly calls out switching the entrypoint from /bin/env to entrypoint.sh to correctly prepend pip-installed NVIDIA CUDA lib paths to LD_LIBRARY_PATH. However, the vllm_omni stage was not updated—it still uses ENTRYPOINT ["/bin/env"] and never copies entrypoint.sh into the image.

    The vllm_base stage writes NVIDIA_PIP_LIB_PATH_FILE=/tmp/nvidia_lib_path.txt, but this environment variable is only acted upon by entrypoint.sh. Since vllm_omni doesn't use entrypoint.sh, the LD_LIBRARY_PATH fix will have no effect for this build target, defeating the stated purpose of the PR.

    The vllm_omni stage needs the same treatment as the vllm stage:

    # vLLM-specific startup instrumentation
    COPY model-engine/model_engine_server/inference/vllm/vllm_startup_wrapper.py /workspace/vllm_startup_wrapper.py
    COPY model-engine/model_engine_server/inference/vllm/entrypoint.sh /workspace/entrypoint.sh
    RUN chmod +x /workspace/entrypoint.sh
    
    # Temp vllm_omni override for fix
    COPY model-engine/model_engine_server/inference/vllm/vllm_overrides/omni/config/model.py /usr/local/lib/python3.12/dist-packages/vllm_omni/config/model.py
    
    # Need to override entrypoint from parent image
    ENTRYPOINT ["/workspace/entrypoint.sh"]
  2. model-engine/model_engine_server/inference/vllm/build_and_upload_image.sh, lines 256-258

    Image tag collision risk after removing VLLM_OMNI_VERSION

    Removing VLLM_OMNI_VERSION from the image tag means two builds with the same VLLM_VERSION and USER_TAG but different VLLM_OMNI_VERSION values will produce the exact same tag, silently overwriting each other. For example:

    • Build A: VLLM_VERSION=0.7.0, VLLM_OMNI_VERSION=0.16.0, USER_TAG=prod → tag 0.7.0-omni-prod
    • Build B: VLLM_VERSION=0.7.0, VLLM_OMNI_VERSION=0.17.0, USER_TAG=prod → tag 0.7.0-omni-prod

    These produce identical tags even though the images contain different vllm-omni versions. If USER_TAG is expected to encode the omni version (e.g., always passing --user-tag v0.16.0), this is fine — but worth making explicit in the script's usage comments or documentation. Otherwise this creates ambiguity when debugging or rolling back images.
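The collision described above is easy to see with the simplified tag format {VLLM_VERSION}-omni-{USER_TAG} from this PR. A sketch; the real script's tag assembly may differ:

```shell
# With VLLM_OMNI_VERSION no longer part of the tag, builds that differ only
# in omni version collide. Tag format per this PR: {VLLM_VERSION}-omni-{USER_TAG}.
make_tag() { printf '%s-omni-%s' "$1" "$2"; }  # args: VLLM_VERSION, USER_TAG

tag_a="$(make_tag 0.7.0 prod)"  # built with VLLM_OMNI_VERSION=0.16.0
tag_b="$(make_tag 0.7.0 prod)"  # built with VLLM_OMNI_VERSION=0.17.0
echo "$tag_a $tag_b"            # identical tags: the later push overwrites the earlier
```

Encoding the omni version in USER_TAG (e.g. --user-tag v0.16.0), as the reviewer suggests, avoids the overwrite.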

Last reviewed commit: fdbe2d8

- Add VLLM_PIP_VERSION build arg to handle versions with '+' invalid in Docker tags
- Add --vllm-pip-version flag to build_and_upload_image.sh
- Fix Dockerfile inline Python heredoc for nvidia lib path detection
- Switch Docker entrypoint to entrypoint.sh for LD_LIBRARY_PATH setup
- Add transformers>=5.0.0 to requirements.txt
- Fix vllm_server.py import from relative to absolute path
- Remove hardcoded torch 2.10.0 requirement for vllm 0.16.0

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

socket-security bot commented Mar 7, 2026

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Diff: Added transformers@5.3.0 (Supply Chain Security 77, Vulnerability 100, Quality 100, Maintenance 100, License 100)

- Fix SETUPTOOLS_SCM_PRETEND_VERSION for local vllm-omni source builds
- Simplify omni image tag format to {VLLM_VERSION}-omni-{USER_TAG}
- Fix with_startup_metrics to pass args object correctly (func(args) not func(*args))
- Fix relative imports in vllm_omni_server.py to absolute imports
- Add --init-timeout (600s) and --stage-init-timeout (300s) args for torch.compile
- Add task_type field to OmniModelConfig override

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
