
Error when using sglang to accelerate inference for the Qwen2.5-VL model #7113

@chenyili502

Description

Describe the bug
What the bug is and how to reproduce it, ideally with screenshots.
My code is:
from swift.llm import SglangEngine
engine = SglangEngine(model_path)
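
For context, the TypeError below is raised not when constructing the engine but by the subsequent multimodal inference call (engine.infer in the traceback). A minimal sketch of the full reproduction, assuming a local Qwen2.5-VL checkpoint and a sample image; the paths, prompt, and sampling settings are placeholders:

from swift.llm import SglangEngine, InferRequest, RequestConfig

model_path = '/path/to/Qwen2.5-VL-7B-Instruct'  # placeholder checkpoint path
engine = SglangEngine(model_path)

# One request carrying an image, mirroring the engine.infer(infer_reqs, request_config)
# call that fails in the traceback below.
infer_reqs = [
    InferRequest(
        messages=[{'role': 'user', 'content': '<image>Parse this page.'}],
        images=['page_0.png'],  # placeholder image file
    )
]
request_config = RequestConfig(max_tokens=512, temperature=0.0)

resp_list = engine.infer(infer_reqs, request_config)
print(resp_list[0].choices[0].message.content)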

The error output:
[INFO:swift] Setting ROOT_IMAGE_DIR: None. You can adjust this hyperparameter through the environment variable: ROOT_IMAGE_DIR.
[INFO:swift] Setting QWENVL_BBOX_FORMAT: legacy. You can adjust this hyperparameter through the environment variable: QWENVL_BBOX_FORMAT.
INFO 12-18 14:04:04 __init__.py:207] Automatically detected platform cuda.
INFO 12-18 14:04:04 __init__.py:207] Automatically detected platform cuda.
WARNING:sglang.srt.server_args:Attention backend not explicitly specified. Use flashinfer backend by default.
INFO 12-18 14:04:07 __init__.py:207] Automatically detected platform cuda.
INFO 12-18 14:04:07 __init__.py:207] Automatically detected platform cuda.
INFO 12-18 14:04:07 __init__.py:207] Automatically detected platform cuda.
INFO 12-18 14:04:07 __init__.py:207] Automatically detected platform cuda.
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
Loading safetensors checkpoint shards: 0% Completed | 0/2 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 50% Completed | 1/2 [00:00<00:00, 1.87it/s]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:00<00:00, 2.63it/s]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:00<00:00, 2.48it/s]

Capturing batches (bs=1 avail_mem=3.94 GB): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00, 12.86it/s]
Running FULL PARSE ...
0%| | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/chenboyang/Document/OCRFlux-main/OCRFlux-main-new/ocrflux/all_swift.py", line 164, in parse
resp_list = engine.infer(infer_reqs, request_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/conda_envs/sft_sglang/lib/python3.11/site-packages/swift/llm/infer/infer_engine/sglang_engine.py", line 183, in infer
return super().infer(infer_requests, request_config, metrics, template=template, use_tqdm=use_tqdm)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/conda_envs/sft_sglang/lib/python3.11/site-packages/swift/llm/infer/infer_engine/infer_engine.py", line 191, in infer
return self._batch_infer_stream(tasks, request_config.stream, use_tqdm, metrics)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/conda_envs/sft_sglang/lib/python3.11/site-packages/swift/llm/infer/infer_engine/infer_engine.py", line 147, in _batch_infer_stream
return loop.run_until_complete(self.batch_run(new_tasks))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
File "/data/conda_envs/sft_sglang/lib/python3.11/site-packages/swift/llm/infer/infer_engine/infer_engine.py", line 115, in batch_run
return await asyncio.gather(*tasks)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/conda_envs/sft_sglang/lib/python3.11/site-packages/swift/llm/infer/infer_engine/infer_engine.py", line 132, in _new_run
res = await task
^^^^^^^^^^
File "/data/conda_envs/sft_sglang/lib/python3.11/site-packages/swift/llm/infer/infer_engine/sglang_engine.py", line 219, in infer_async
return await self._infer_full_async(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/conda_envs/sft_sglang/lib/python3.11/site-packages/swift/llm/infer/infer_engine/sglang_engine.py", line 237, in _infer_full_async
output = await self.engine.async_generate(**engine_inputs, sampling_params=generation_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: Engine.async_generate() got an unexpected keyword argument 'images'
0%| | 0/2 [00:00<?, ?it/s]
None

FULL PARSE DONE
/data/conda_envs/sft_sglang/lib/python3.11/multiprocessing/resource_tracker.py:123: UserWarning: resource_tracker: process died unexpectedly, relaunching. Some resources might leak.
warnings.warn('resource_tracker: process died unexpectedly, '
Traceback (most recent call last):
File "/data/conda_envs/sft_sglang/lib/python3.11/multiprocessing/resource_tracker.py", line 239, in main
cache[rtype].remove(name)
KeyError: '/mp-340rzebv'
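
The TypeError shows that swift's SglangEngine forwards an images keyword to sglang's Engine.async_generate, which the installed sglang build does not accept; this usually points to a version mismatch between ms-swift and sglang. A quick way to check which keywords the installed sglang actually exposes (a stdlib-only sketch, assuming the offline engine is importable as sglang.Engine as in the sglang docs):

import inspect
import sglang

# List the parameter names accepted by async_generate in the installed sglang build,
# to see whether an 'images' keyword (or a differently named equivalent) exists.
sig = inspect.signature(sglang.Engine.async_generate)
print(sorted(sig.parameters))

If 'images' is not in the printed list, aligning the ms-swift and sglang versions so their interfaces match is the usual fix.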

Your hardware and system info
Write your system info here, e.g. CUDA version, OS, GPU model, and torch version.

On an RTX 3090:
sglang 0.5.6.post2, triton 2.1.0, nvidia-cudnn-cu12 9.16.0.29, torch 2.9.1, ms-swift 3.11.1

Additional context
Add any other context about the problem here.
