[AMD][MI35X] Qwen3.5-fp4 SGLang single-node benchmark#1680

Open
1am9trash wants to merge 4 commits into main from qwen3.5-fp4-mi355x-perf-flags

Conversation

@1am9trash
Collaborator

@1am9trash 1am9trash commented Jun 8, 2026

Created this branch from @yichiche's branch.


Note

Low Risk
Benchmark and CI config only; no application auth or production serving paths. Sweep reduction avoids known OOM on high-concurrency MTP runs.

Overview
Updates Qwen3.5 FP4 MI355X single-node SGLang benchmarks (qwen3.5-fp4-mi355x-sglang and -mtp) to a newer ROCm image (lmsysorg/sglang-rocm:v0.5.12.post1-rocm720-mi35x-20260604) and aligns the launch recipes with AITER perf flags.

The base and MTP shell scripts now set SGLANG_USE_AITER_UNIFIED_ATTN=1 and pass --enable-aiter-allreduce-fusion, --max-running-requests $CONC, and --page-size 16 alongside the existing --attention-backend aiter. For MTP, the tp: 2 sweep caps concurrency at 128 instead of 256 to avoid OOM. A perf-changelog.yaml entry documents the image bump, backend/flag changes, and sweep trim.
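As a rough illustration of the flag set described above, the sketch below assembles the launch command and prints it instead of starting a server. The `MODEL` and `CONC` defaults are placeholders invented for this example, and the remaining arguments of the real scripts (such as `--watchdog-timeout` and `--disable-radix-cache`) are left out for brevity; it is not a copy of the scripts in this PR.

```shell
#!/bin/sh
# Sketch only: builds and prints the launch command, so it runs without
# a GPU or an SGLang install.
MODEL="${MODEL:-placeholder/model}"   # hypothetical default, not from the PR
CONC="${CONC:-64}"                    # hypothetical default concurrency

# Environment toggle named in the PR description.
export SGLANG_USE_AITER_UNIFIED_ATTN=1

# Flags named in the PR description; --max-running-requests follows the
# benchmark concurrency.
launch_cmd="python3 -m sglang.launch_server --model-path=$MODEL --trust-remote-code \
--attention-backend aiter \
--enable-aiter-allreduce-fusion --max-running-requests $CONC \
--page-size 16"

echo "SGLANG_USE_AITER_UNIFIED_ATTN=$SGLANG_USE_AITER_UNIFIED_ATTN $launch_cmd"
```

Tying `--max-running-requests` to `$CONC` means the server admits no more requests than the benchmark client issues at that sweep point.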

Reviewed by Cursor Bugbot for commit c8804e2. Bugbot is set up for automated code reviews on this repo.

@github-actions
Contributor

github-actions Bot commented Jun 8, 2026

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR there first; we can then merge your single-node PR into the master branch. Let's make sure the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that all GitHub Actions jobs fully pass after merging. Failures are often just flakes, and re-running the failed jobs will usually fix them. If failed jobs are re-run, PR authors are responsible for ensuring that they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get approval from their respective company's CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

Comment thread benchmarks/single_node/fixed_seq_len/qwen3.5_fp4_mi355x.sh
Comment on lines 21 to +42
@@ -38,6 +39,8 @@ python3 -m sglang.launch_server --model-path=$MODEL --trust-remote-code \
--model-loader-extra-config '{"enable_multithread_load": true}' \
--watchdog-timeout 1200 \
--disable-radix-cache \
--enable-aiter-allreduce-fusion --max-running-requests $CONC \
Collaborator

Thanks for contributing, @1am9trash! Can you add this to the cookbook too?

3 participants