-
Notifications
You must be signed in to change notification settings - Fork 189
Use official TRT-LLM image (1.3.0rc15.post1) for DSv4 B300 TRT (non-MTP + MTP) #1636
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
4bcefb7
6b7558c
bd3c94c
f441f9f
4bc5592
1b0afeb
14a1bb3
242ab88
e23a541
6118a76
5adfeb3
ad529fb
c2381b7
b09619e
65bddbb
285b79a
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -3387,6 +3387,13 @@ | |
| - "Add MTP speculative-decoding sibling for dsv4-fp4-mi355x-vllm (model: deepseek-ai/DeepSeek-V4-Pro) on vllm/vllm-openai-rocm:v0.22.0, per vllm-project/vllm#43385" | ||
| pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1630 | ||
|
|
||
| - config-keys: | ||
| - dsv4-fp4-b300-trt | ||
| - dsv4-fp4-b300-trt-mtp | ||
| description: | ||
| - "Point the B300 TensorRT-LLM DeepSeek-V4-Pro configs (non-MTP dsv4-fp4-b300-trt and MTP dsv4-fp4-b300-trt-mtp) at the official NVIDIA release image nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc15.post1, replacing the custom ghcr.io semianalysis feat/deepseek_v4 builds (2dd03e6 and 9aa3715 respectively), to evaluate the official RC for DeepSeek-V4-Pro. Also drops the TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE=0 launcher workaround (specific to the custom build) so the official image runs with its native behavior. B200 TRT is unchanged (stays on feat-deepseek_v4-9aa3715)." | ||
| pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1636 | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Duplicate PR changelog entriesLow Severity This commit adds two separate Additional Locations (1)Reviewed by Cursor Bugbot for commit c2381b7. Configure here.
cursor[bot] marked this conversation as resolved.
Comment on lines
+3390
to
+3395
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟡 Duplicate and contradictory perf-changelog entries for PR #1636: two entries cover the same config-keys ( Extended reasoning...What the bug is
Why entry A is wrongThe same PR diff adds the export to both launcher scripts with default # benchmarks/single_node/fixed_seq_len/dsv4_fp4_b300_trt.sh:62-65
export TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE="${TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE:-0}"
echo "TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE: $TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE"
# benchmarks/single_node/fixed_seq_len/dsv4_fp4_b300_trt_mtp.sh:61-64 (identical)Because the parameter expansion ImpactDoc-only — the runtime behavior is unambiguously the one described in Entry B (the workaround is on). But the changelog is the project's audit trail: any human or tool that consumes it to ask "what did PR #1636 ship for FixDelete Entry A (perf-changelog.yaml:3389-3395). Entry B already covers the same configs with the accurate description. Independently flagged by Cursor Bugbot in two comments on this PR: Duplicate PR changelog entries (Low) and Changelog contradicts launcher workaround (Medium). Step-by-step proof
|
||
|
|
||
| - config-keys: | ||
| - dsv4-fp4-mi355x-sglang-mtp | ||
| description: | ||
|
|
@@ -3445,6 +3452,13 @@ | |
| - "Add 1k1k/8k1k minimax recipe set under benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m2.5/" | ||
| pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1641 | ||
|
|
||
| - config-keys: | ||
| - dsv4-fp4-b300-trt | ||
| - dsv4-fp4-b300-trt-mtp | ||
| description: | ||
| - "Switch B300 DSv4 TRT (non-MTP + MTP) to official rc15.post1 image with TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSE=0 to test whether DPA crash is same SWA-scratch bug" | ||
| pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1636 | ||
|
|
||
| - config-keys: | ||
| - dsv4-fp4-b200-vllm | ||
| description: | ||
|
|
||


There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SWA default contradicts disable comment
Medium Severity
The new launcher comments say DSv4 SWA scratch reuse is disabled to isolate the rc15.post1 DPA crash, but
TRTLLM_DSV4_ENABLE_SWA_SCRATCH_REUSEdefaults to1when unset, which keeps scratch reuse enabled. Sweep runs without an override therefore do not match the stated experiment.Additional Locations (1)
benchmarks/single_node/fixed_seq_len/dsv4_fp4_b300_trt_mtp.sh#L60-L63Reviewed by Cursor Bugbot for commit 65bddbb. Configure here.