
build: Upgrade vLLM to 0.17.0 #610

Open

vivekkalyan wants to merge 1 commit into main from build/upgrade-vllm-0.17.0

Conversation


@vivekkalyan vivekkalyan commented Mar 11, 2026

Summary

  • upgrade vllm from 0.15.1 to 0.17.0

Compatibility evidence

Upstream v0.17.0 still exposes the patch points ART uses (ChatCompletionRequest, DeltaMessage, ToolParserManager, LoRA request path, and OpenAI CLI entrypoints), so no ART integration code changes were required for this bump.
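A compatibility claim like this can be smoke-checked mechanically. A minimal sketch, assuming module paths based on vLLM's usual layout (the exact paths and the helper itself are illustrative, not ART code):

```python
import importlib

# Symbols ART patches, as (module, attribute) pairs. The module paths
# below are assumptions and may differ across vLLM versions.
PATCH_POINTS = [
    ("vllm.entrypoints.openai.protocol", "ChatCompletionRequest"),
    ("vllm.entrypoints.openai.protocol", "DeltaMessage"),
    ("vllm.entrypoints.openai.tool_parsers", "ToolParserManager"),
    ("vllm.lora.request", "LoRARequest"),
]


def missing_patch_points(points=PATCH_POINTS):
    """Return the (module, attribute) pairs that can no longer be imported."""
    missing = []
    for module_name, attr in points:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            missing.append((module_name, attr))
            continue
        if not hasattr(module, attr):
            missing.append((module_name, attr))
    return missing
```

An empty return value means every patch point still resolves against the installed vLLM; anything else flags a symbol that moved or was removed in the upgrade.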

One upstream behavior change to keep in mind: reasoning_content was removed from the vLLM chat-completion protocol in favor of reasoning.

Local sanity

  • uv run pytest -q tests/unit/test_auto_trajectory.py tests/unit/test_yield_trajectory.py
  • result: 3 passed

Sky GPU contract run

A fresh Sky cluster/VM was provisioned for each run and torn down after validation:


  • cluster: art-vllm017-contract-0310
  • infra: kubernetes/cks-wb3
  • GPU: H200:1
  • resolved runtime: torch 2.10.0+cu128, vllm 0.17.0
  • command: uv run --extra backend pytest -q -rs tests/unit/test_vllm_patches_contract.py tests/unit/test_dedicated_server.py tests/integration/test_vllm_contract.py
  • result: 7 passed, 34 warnings in 99.46s (0:01:39)

ART-E

ART-E benchmark using the same harness as the prior vLLM upgrade comparison.

OpenPipe/Qwen3-14B-Instruct

  • 0.15.1 -> 0.17.0
  • latency mean: 0.38435s -> 0.38594s (+0.41%)
  • latency p95: 0.87768s -> 0.89198s (+1.63%)
  • completion tok/s: 335.37 -> 332.61 (-0.82%)
  • answer correct: 0.15 -> 0.15
  • source correct: 1.00 -> 1.00
  • failed format validation: 0.00 -> 0.00
  • avg turns: 3.0 -> 3.0

Qwen/Qwen3-30B-A3B-Instruct-2507

  • 0.15.1 -> 0.17.0
  • latency mean: 0.30666s -> 0.30330s (-1.09%)
  • latency p95: 0.74073s -> 0.73615s (-0.62%)
  • completion tok/s: 428.84 -> 449.28 (+4.77%)
  • answer correct: 0.15 -> 0.15
  • source correct: 0.90 -> 0.95
  • failed format validation: 0.10 -> 0.05
  • avg turns: 2.8 -> 2.9
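The percentage deltas above follow the standard relative-change formula; a quick sketch to reproduce them from the reported numbers:

```python
def pct_delta(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Reproduce two of the reported deltas:
# Qwen3-14B latency mean: 0.38435s -> 0.38594s
print(f"{pct_delta(0.38435, 0.38594):+.2f}%")  # +0.41%
# Qwen3-30B completion tok/s: 428.84 -> 449.28
print(f"{pct_delta(428.84, 449.28):+.2f}%")    # +4.77%
```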

@vivekkalyan vivekkalyan marked this pull request as ready for review March 11, 2026 05:01
