[WIP] Update Dsv4 B300 configs #1656
base: main
Commits: ee16194, e8ecb38, 8df7b13, 41630f2, bb079db, 170000c
```diff
@@ -3531,3 +3531,9 @@
     - "The Rust frontend replaces only the Python serving/API layer (HTTP, tokenization, scheduling glue, detokenization) and spawns the same Python EngineCore, so GPU kernels/attention/MoE GEMM/KV cache are untouched"
     - "A/B sweep (28 single-node points, 1k1k + 8k1k, TP 1/2/4) vs the Python-frontend baseline (run 26696260751): throughput Pareto-neutral (peak tok/s/GPU within <1.5%, frontiers coincident) and TPOT flat (+-0.5%); TTFT improves ~8% at 1k1k and ~22% at 8k1k (every point), the expected signature of lower frontend CPU latency before first token, scaling with input length"
   pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1634
+
+- config-keys:
+    - dsv4-fp4-b300-vllm
+  description:
+    - "Update B300 dsv4 image to nvfp4"
```
Contributor

🟡 WARNING: Misleading description: says "image" but only the …

Why it matters: The perf-changelog is the canonical record for tracing config changes. "image" in this file consistently refers to the Docker container image (e.g., …

Fix: …
```diff
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1656
```
DP-attn megamoe backend disabled
Medium Severity
With `DP_ATTENTION=true`, the script no longer passes `--moe-backend deep_gemm_mega_moe`, but `dsv4-fp4-b300-vllm` still schedules high-concurrency `dp-attn`/`ep` points. That diverges from the prior B300 Pareto recipe and from the `dsv4_fp4_b300_vllm_mtp.sh` / B200 vLLM siblings, so those runs may not match the intended serving path.

Reviewed by Cursor Bugbot for commit 41630f2.
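The finding above is about a flag that should stay gated on `DP_ATTENTION`. A minimal sketch of that gating, assuming the launch script collects extra server arguments in a Bash array; `build_moe_args` and `MOE_ARGS` are hypothetical names, not taken from the actual script. Only `DP_ATTENTION` and `--moe-backend deep_gemm_mega_moe` come from the review.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the real dsv4 B300 launch script.
# build_moe_args and MOE_ARGS are illustrative names (assumptions);
# DP_ATTENTION and the --moe-backend flag come from the review comment.

build_moe_args() {
  local dp_attention="${1:-false}"
  MOE_ARGS=()  # global array, reset on every call
  if [[ "$dp_attention" == "true" ]]; then
    # DP-attention (dp-attn/ep) points keep the mega-MoE backend,
    # matching the prior B300 recipe and the sibling scripts.
    MOE_ARGS+=(--moe-backend deep_gemm_mega_moe)
  fi
}

build_moe_args "${DP_ATTENTION:-false}"
# The array would then be spliced into the server command line as
# "${MOE_ARGS[@]}"; the command itself is left unspecified here.
echo "extra MoE args: ${MOE_ARGS[*]:-<none>}"
```

Keeping the flag behind a single conditional makes it harder for a later refactor to drop it silently for the `dp-attn`/`ep` points while the config still schedules them.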