
feat: upgrade vLLM to 0.16.0 #584

Closed
vivekkalyan wants to merge 1 commit into main from feat/vllm-0.16.0

Conversation


vivekkalyan (Collaborator) commented on Feb 27, 2026:

Summary

Upgrades vLLM from 0.15.1 to 0.16.0.

Testing Scope

  • Models evaluated:
    • OpenPipe/Qwen3-14B-Instruct
    • Qwen/Qwen3-30B-A3B-Instruct-2507
  • Modes evaluated:
    • inference-only
    • strict replay
    • ART-E
  • Experiment discipline:
    • H200 single-GPU
    • one fresh VM/cluster per run

Inference-only (single GPU, c=8)

  • 14B:
    • throughput: 621.88 -> 620.69 tok/s (-0.19%)
    • latency avg: 1.2766s -> 1.2775s (+0.07%)
  • 30B-A3B:
    • throughput: 618.06 -> 620.38 tok/s (+0.38%)
    • latency avg: 1.1585s -> 1.1545s (-0.34%)
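The percentage deltas above are plain relative changes from the 0.15.1 value to the 0.16.0 value. A minimal helper (not from this repo; the numbers are the ones reported in this PR) reproduces them:

```python
def pct_delta(old: float, new: float) -> float:
    """Percent change from the 0.15.1 measurement to the 0.16.0 measurement."""
    return (new - old) / old * 100

# 14B throughput: 621.88 -> 620.69 tok/s
print(round(pct_delta(621.88, 620.69), 2))  # -0.19
# 30B-A3B throughput: 618.06 -> 620.38 tok/s
print(round(pct_delta(618.06, 620.38), 2))  # 0.38
```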

ART-E

Task-quality metrics stayed stable across both models; 30B-A3B ART-E showed slight latency/throughput improvement on 0.16.0.

14B:

  • latency_mean: 0.384353 -> 0.384383 (+0.0078%)
  • latency_p95: 0.877676 -> 0.877771 (+0.0108%)
  • completion_tokens_per_sec: 335.3655 -> 335.0151 (-0.10%)

30B-A3B:

  • latency_mean: 0.306657 -> 0.300918 (-1.87%)
  • latency_p95: 0.740726 -> 0.733300 (-1.00%)
  • completion_tokens_per_sec: 428.8369 -> 442.5670 (+3.20%)
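For context on the latency_mean/latency_p95 pair: a p95 is the 95th percentile of per-request latencies. A small sketch of one common way to compute it (linear interpolation between order statistics; ART's actual metrics pipeline may differ, and the sample latencies below are made up for illustration):

```python
def p95(samples: list[float]) -> float:
    """95th percentile using linear interpolation between sorted samples."""
    s = sorted(samples)
    k = 0.95 * (len(s) - 1)          # fractional rank of the 95th percentile
    lo = int(k)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (k - lo) * (s[hi] - s[lo])

# Hypothetical per-request latencies in seconds:
lats = [0.21, 0.25, 0.30, 0.31, 0.33, 0.35, 0.40, 0.52, 0.71, 0.88]
print(sum(lats) / len(lats))  # mean latency
print(p95(lats))              # tail latency
```

The mean tracks typical requests while the p95 surfaces tail behavior, which is why both moved in the 30B-A3B results above.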

Compatibility notes

  • Upstream protocol paths changed around reasoning_content; thinking-model flows should be exercised for ART paths that still emit it.
  • Tinker renderer paths are Tinker API-specific and not the primary local vLLM path.
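Since the upstream change touches how reasoning_content is surfaced, callers that parse chat-completion messages should tolerate its presence or absence. A minimal defensive sketch (the dict shape mirrors one choice's message from an OpenAI-compatible chat completion; the helper name is ours, not from this repo):

```python
def split_message(message: dict) -> tuple[str, str]:
    """Return (reasoning, answer), tolerating responses that omit reasoning_content."""
    reasoning = message.get("reasoning_content") or ""
    answer = message.get("content") or ""
    return reasoning, answer

# Thinking-model style message:
msg = {"role": "assistant", "reasoning_content": "step by step...", "content": "42"}
print(split_message(msg))

# Non-thinking message with no reasoning field:
print(split_message({"role": "assistant", "content": "42"}))
```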

vivekkalyan changed the title from "feat(vllm): upgrade to 0.16.0 with single-GPU validation" to "feat: upgrade vLLM to 0.16.0" on Feb 27, 2026.

Closed by #610.
