- Add `VLLM_PIP_VERSION` build arg to handle versions with `+` invalid in Docker tags
- Add `--vllm-pip-version` flag to `build_and_upload_image.sh`
- Fix Dockerfile inline Python heredoc for nvidia lib path detection
- Switch Docker entrypoint to `entrypoint.sh` for `LD_LIBRARY_PATH` setup
- Add `transformers>=5.0.0` to `requirements.txt`
- Fix `vllm_server.py` import from relative to absolute path
- Remove hardcoded torch 2.10.0 requirement for vllm 0.16.0

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix `SETUPTOOLS_SCM_PRETEND_VERSION` for local vllm-omni source builds
- Simplify omni image tag format to `{VLLM_VERSION}-omni-{USER_TAG}`
- Fix `with_startup_metrics` to pass the args object correctly (`func(args)`, not `func(*args)`)
- Fix relative imports in `vllm_omni_server.py` to absolute imports
- Add `--init-timeout` (600s) and `--stage-init-timeout` (300s) args for torch.compile
- Add `task_type` field to `OmniModelConfig` override
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
- Add `VLLM_PIP_VERSION` build arg to handle vllm versions containing `+` (invalid in Docker image tags)
- Add `--vllm-pip-version` CLI flag to `build_and_upload_image.sh`
- Add `None` guard for `__file__` in the Dockerfile's inline Python heredoc
- Switch the Docker entrypoint from `/bin/env` to `entrypoint.sh`, which prepends pip-installed NVIDIA CUDA lib paths to `LD_LIBRARY_PATH`
- Add `transformers>=5.0.0` to `requirements.txt`
- Fix `vllm_server.py` import from relative to absolute path
- Remove hardcoded `torch==2.10.0` requirement for vllm `0.16.0`

Test plan

- Run `build_and_upload_image.sh` with the `--vllm-pip-version` flag
- Verify `LD_LIBRARY_PATH` includes pip-installed NVIDIA libs

🤖 Generated with Claude Code
Greptile Summary
This PR delivers a set of build pipeline and runtime fixes for the `vllm-omni` Docker image: it introduces a `VLLM_PIP_VERSION` build arg to decouple the pip install version (which may contain `+`) from the Docker image tag, replaces the bare `/bin/env` entrypoint with a dedicated `entrypoint.sh` that correctly injects pip-installed NVIDIA CUDA lib paths into `LD_LIBRARY_PATH`, and fixes a Python heredoc in the Dockerfile to guard against `__file__ is None`. On the Python side, relative imports in `vllm_omni_server.py` and `vllm_server.py` are corrected to absolute imports (required since these modules run as `__main__`), a latent bug in `with_startup_metrics` where a `Namespace` was incorrectly unpacked with `*args` is fixed, and the hardcoded `torch==2.10.0` constraint for vllm `0.16.0` is removed.

Key changes:

- `Dockerfile.vllm`: Adds `VLLM_PIP_VERSION` ARG, fixes the Python heredoc's `None` guard, switches `ENTRYPOINT` to `entrypoint.sh`
- `entrypoint.sh` (new): Prepends pip-installed NVIDIA lib paths to `LD_LIBRARY_PATH` before `exec "$@"`
- `build_and_upload_image.sh`: Adds `--vllm-pip-version` flag; removes `VLLM_OMNI_VERSION` from the image tag
- `startup_telemetry.py`: Fixes `func(*args)` → `func(args)` for the non-iterable `Namespace` argument
- `vllm_omni_server.py` / `vllm_server.py`: Corrects relative → absolute imports; adds `--init-timeout` and `--stage-init-timeout` args
- `requirements.txt`: Pins `transformers>=5.0.0`

Confidence Score: 4/5
The `func(*args)` → `func(args)` fix corrects a TypeError that would have caused `with_startup_metrics` to fail whenever called. The import path corrections are necessary for `__main__` execution. The `entrypoint.sh` is clean and correct. The one open question is the removal of `VLLM_OMNI_VERSION` from the vllm_omni image tag, which could cause tag collisions if the same VLLM version is built with different omni versions under the same `USER_TAG` — this is a process concern rather than a code defect. Pay close attention to `build_and_upload_image.sh` around the updated image tag format (line 258), since removing `VLLM_OMNI_VERSION` from the tag may cause images to overwrite each other in certain CI workflows.

Important Files Changed
`startup_telemetry.py`: `func(*args)` tried to unpack a non-iterable `Namespace` object; changed to `func(args)`, which correctly passes the `Namespace` as a single positional argument.

Sequence Diagram
```mermaid
sequenceDiagram
    participant Docker as Docker Runtime
    participant EP as entrypoint.sh
    participant Env as LD_LIBRARY_PATH
    participant Server as vllm_omni_server.py
    participant Metrics as with_startup_metrics()
    participant OmniSrv as omni_run_server(args)
    Docker->>EP: exec entrypoint.sh [CMD args]
    EP->>Env: prepend pip NVIDIA lib paths\n(from NVIDIA_PIP_LIB_PATH_FILE)
    EP->>Server: exec python vllm_omni_server.py [args]
    Server->>Server: parse_args() → Namespace
    Server->>Metrics: with_startup_metrics(omni_run_server, args, t0)
    alt ENABLE_STARTUP_METRICS=true
        Metrics->>Metrics: init VLLMStartupMetrics
        Metrics->>Metrics: record_python_init()
        Metrics->>Metrics: start health_check_loop thread
    end
    Metrics->>OmniSrv: await func(args) ← fixed from func(*args)
    OmniSrv-->>Metrics: server running
    Metrics-->>Server: return
```

Comments Outside Diff (2)
`model-engine/model_engine_server/inference/vllm/Dockerfile.vllm`, line 130-139 (link)

The PR title is "fix: vllm-omni build pipeline improvements" and the PR description explicitly calls out switching the entrypoint from `/bin/env` to `entrypoint.sh` to correctly prepend pip-installed NVIDIA CUDA lib paths to `LD_LIBRARY_PATH`. However, the `vllm_omni` stage was not updated — it still uses `ENTRYPOINT ["/bin/env"]` and never copies `entrypoint.sh` into the image.

The `vllm_base` stage writes `NVIDIA_PIP_LIB_PATH_FILE=/tmp/nvidia_lib_path.txt`, but this environment variable is only acted upon by `entrypoint.sh`. Since `vllm_omni` doesn't use `entrypoint.sh`, the `LD_LIBRARY_PATH` fix will have no effect for this build target, defeating the stated purpose of the PR.

The `vllm_omni` stage needs the same treatment as the `vllm` stage:

`model-engine/model_engine_server/inference/vllm/build_and_upload_image.sh`, line 256-258 (link)

Image tag collision risk after removing VLLM_OMNI_VERSION
Removing `VLLM_OMNI_VERSION` from the image tag means two builds with the same `VLLM_VERSION` and `USER_TAG` but different `VLLM_OMNI_VERSION` values will produce the exact same tag, silently overwriting each other. For example:

- `VLLM_VERSION=0.7.0`, `VLLM_OMNI_VERSION=0.16.0`, `USER_TAG=prod` → `0.7.0-omni-prod`
- `VLLM_VERSION=0.7.0`, `VLLM_OMNI_VERSION=0.17.0`, `USER_TAG=prod` → `0.7.0-omni-prod`

These produce identical tags even though the images contain different `vllm-omni` versions. If `USER_TAG` is expected to encode the omni version (e.g., always passing `--user-tag v0.16.0`), this is fine — but worth making explicit in the script's usage comments or documentation. Otherwise this creates ambiguity when debugging or rolling back images.

Last reviewed commit: fdbe2d8