[AMD][MI35X] Qwen3.5-fp4 SGLang single-node benchmark by 1am9trash · Pull Request #1680 · SemiAnalysisAI/InferenceX

1am9trash · 2026-06-08T08:15:41Z

Note

Low Risk
Benchmark and CI config only; no application auth or production serving paths. Sweep reduction avoids known OOM on high-concurrency MTP runs.

Overview
Updates Qwen3.5 FP4 MI355X single-node SGLang benchmarks (qwen3.5-fp4-mi355x-sglang and -mtp) to a newer ROCm image (lmsysorg/sglang-rocm:v0.5.12.post1-rocm720-mi35x-20260604) and aligns the launch recipes with AITER perf flags.

The base and MTP shell scripts now set SGLANG_USE_AITER_UNIFIED_ATTN=1 and pass --enable-aiter-allreduce-fusion, --max-running-requests $CONC, and --page-size 16 alongside the existing --attention-backend aiter. For MTP, the tp: 2 sweep caps concurrency at 128 instead of 256 to avoid OOM. A perf-changelog.yaml entry documents the image bump, backend/flag changes, and sweep trim.
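As a rough illustration of the flag set described above, the following dry-run sketch assembles the launch command and prints it instead of starting a server. The environment variable and flags are the ones named in this description; the `MODEL` default and the `CONC` default of 64 are placeholders assumed here for illustration, and the real scripts pass further options not shown.

```shell
#!/bin/sh
# Dry-run sketch: build the SGLang launch command with the flags named in this PR.
# MODEL and CONC are normally supplied by the benchmark harness; the defaults
# below are hypothetical placeholders, not values taken from the PR.
MODEL="${MODEL:-/path/to/qwen3.5-fp4-model}"
CONC="${CONC:-64}"

# Environment toggle set by both the base and MTP scripts.
export SGLANG_USE_AITER_UNIFIED_ATTN=1

# Keep the command in a string so it can be inspected before anything is launched.
LAUNCH_CMD="python3 -m sglang.launch_server --model-path=$MODEL --trust-remote-code \
--attention-backend aiter \
--enable-aiter-allreduce-fusion --max-running-requests $CONC \
--page-size 16"

# Print rather than execute; run the printed command by hand on a prepared MI355X node.
echo "$LAUNCH_CMD"
```

Tying `--max-running-requests` to `$CONC` means the server admits at most as many requests as the benchmark client issues concurrently for that sweep point.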

Reviewed by Cursor Bugbot for commit c8804e2. Bugbot is set up for automated code reviews on this repo.

github-actions · 2026-06-08T08:15:51Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from their respective company's CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

github-actions · 2026-06-08T09:47:26Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27124776305
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27124776305

github-actions · 2026-06-08T11:42:20Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27130906063
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27130906063

functionstackx · 2026-06-08T15:51:12Z

@@ -38,6 +39,8 @@ python3 -m sglang.launch_server --model-path=$MODEL --trust-remote-code \
 --model-loader-extra-config '{"enable_multithread_load": true}' \
 --watchdog-timeout 1200  \
 --disable-radix-cache \
+--enable-aiter-allreduce-fusion --max-running-requests $CONC \


Thanks for contributing, @1am9trash. Can you add this to the cookbook too?

yichiche and others added 2 commits June 8, 2026 07:48

[AMD][MI355X] Qwen3.5-fp4: add aiter unified attn, allreduce fusion, and page-size 16

08809b9

Update config

60da2b8

1am9trash requested a review from a team June 8, 2026 08:15

1am9trash requested review from billishyahao, chunfangamd, seungrokj and yctseng0211 as code owners June 8, 2026 08:15

github-project-automation Bot added this to InferenceMAX Board Jun 8, 2026

Update change log

59ba6f5

1am9trash added AMD full-sweep-enabled labels Jun 8, 2026

claude Bot reviewed Jun 8, 2026

Comment thread benchmarks/single_node/fixed_seq_len/qwen3.5_fp4_mi355x.sh

Update change log and remove OOM config

c8804e2

functionstackx reviewed Jun 8, 2026

1am9trash wants to merge 4 commits into main from qwen3.5-fp4-mi355x-perf-flags
