Your current environment
The output of python collect_env.py
Your output of `python collect_env.py` here
🐛 Describe the bug
This regression occurs only on B200 machines.
The regression is caused by #29624.
I use commit 75eb302a as the baseline.
At commit 75eb302:
============ Serving Benchmark Result ============
Successful requests: 2560
Benchmark duration (s): 1597.51
Total input tokens: 2621440
Total generated tokens: 20971520
Request throughput (req/s): 1.60
Output token throughput (tok/s): 13127.61
Total Token throughput (tok/s): 14768.56
---------------Time to First Token----------------
Mean TTFT (ms): 902.42
Median TTFT (ms): 230.73
P99 TTFT (ms): 6494.69
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms): 36.55
Median TPOT (ms): 37.21
P99 TPOT (ms): 51.62
---------------Inter-token Latency----------------
Mean ITL (ms): 749.75
Median ITL (ms): 780.64
P99 ITL (ms): 1314.17
At commit 75eb302 with #29624 reverted:
============ Serving Benchmark Result ============
Successful requests: 2560
Benchmark duration (s): 1268.58
Total input tokens: 2621440
Total generated tokens: 20971520
Request throughput (req/s): 2.02
Output token throughput (tok/s): 16531.44
Total Token throughput (tok/s): 18597.87
---------------Time to First Token----------------
Mean TTFT (ms): 935.92
Median TTFT (ms): 229.24
P99 TTFT (ms): 6617.85
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms): 28.76
Median TPOT (ms): 29.07
P99 TPOT (ms): 43.97
---------------Inter-token Latency----------------
Mean ITL (ms): 599.24
Median ITL (ms): 577.93
P99 ITL (ms): 1128.54
With #29624, output token throughput drops from 16531 to 13127 tok/s (roughly 20%).
Repro commands
export VLLM_USE_FLASHINFER_MOE_MXFP4_MXFP8=1
server-side:
python3 -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 8087 --model openai/gpt-oss-120b --tokenizer openai/gpt-oss-120b --dtype auto --kv-cache-dtype fp8 --tensor-parallel-size 1 --pipeline-parallel-size 1 --data-parallel-size 1 --swap-space 16 --max-num-seqs 1024 --trust-remote-code --max-model-len 9226 --gpu-memory-utilization 0.9 --max-num-batched-tokens 8192 --no-enable-prefix-caching --async-scheduling --stream-interval 20 --compilation_config.pass_config.fuse_allreduce_rms true --compilation_config.pass_config.eliminate_noops true --compilation_config.max_cudagraph_capture_size 2048 --speculative_config.method eagle3 --speculative_config.model nvidia/gpt-oss-120b-Eagle3-v2 --speculative_config.num_speculative_tokens 3
client-side:
python3 benchmark_serving.py --backend vllm --host 0.0.0.0 --port 8087 --model openai/gpt-oss-120b --num-prompts 2560 --trust-remote-code --ignore-eos --max-concurrency 512 --random-input-len 1024 --random-output-len 8192 --random-range-ratio 1.0 --use-chat-template --dataset-name random --save-result --result-filename benchmark_serving_results.json
Note: benchmark_serving.py comes from the following repo:
git clone https://github.com/kimbochen/bench_serving.git
pip install pandas datasets --break-system-packages
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.