feat: add dedicated inference data parallel support#581

Open
vivekkalyan wants to merge 1 commit into main from feat/dedicated-unsloth-inference-dp

Conversation

@vivekkalyan (Collaborator) commented Feb 25, 2026

Summary

Enable dedicated Unsloth inference data parallelism across multiple inference GPUs.

This PR keeps the dedicated architecture as a single API frontend process with runtime LoRA reloading, and adds data-parallel (DP) sizing and guardrails for multi-GPU inference.

What changed

  • Allow multi-GPU inference_gpu_ids in dedicated config validation.
  • Add dedicated validation for:
    • engine_args.data_parallel_size == len(inference_gpu_ids) (if provided)
    • engine_args.data_parallel_size_local == len(inference_gpu_ids) (if provided)
  • In the dedicated vLLM subprocess startup:
    • default data_parallel_size and data_parallel_size_local to the inference GPU count when more than one inference GPU is configured
    • default distributed_executor_backend to "mp" when more than one inference GPU is configured
    • reject api_server_count != 1
  • Add/expand unit tests for dedicated config validation contracts.
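
The validation and defaulting rules above can be sketched roughly as follows. This is a hypothetical illustration, not the PR's actual code: the function names, the dict-based engine_args, and the error messages are assumptions; only the contracts (sizes must match the GPU count, defaults applied when more than one GPU, api_server_count must be 1) come from the description above.

```python
# Hypothetical sketch of the dedicated-config contracts described in this PR.
# Names and shapes are illustrative; the real project API may differ.

def validate_dedicated_config(inference_gpu_ids, engine_args):
    """If DP sizes are provided, they must match the inference GPU count."""
    n = len(inference_gpu_ids)
    for key in ("data_parallel_size", "data_parallel_size_local"):
        value = engine_args.get(key)
        if value is not None and value != n:
            raise ValueError(
                f"{key}={value} must equal len(inference_gpu_ids)={n}"
            )

def apply_dedicated_defaults(inference_gpu_ids, engine_args):
    """Fill in DP defaults at vLLM subprocess startup for multi-GPU inference."""
    if engine_args.get("api_server_count", 1) != 1:
        # Only a single API frontend process is supported in dedicated mode.
        raise ValueError("api_server_count != 1 is not supported")
    n = len(inference_gpu_ids)
    if n > 1:
        # Default DP sizing to one replica per inference GPU.
        engine_args.setdefault("data_parallel_size", n)
        engine_args.setdefault("data_parallel_size_local", n)
        # Multi-GPU single-node inference defaults to the multiprocessing backend.
        engine_args.setdefault("distributed_executor_backend", "mp")
    return engine_args
```

Explicitly provided values that already match the GPU count pass validation unchanged; only mismatches are rejected.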

Testing

ART-E with 2 inference GPUs

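
A minimal sketch of what a two-inference-GPU setup like the one tested above might look like. The field names (inference_gpu_ids, engine_args) follow this PR's description, not a confirmed config schema, and the GPU indices are made up:

```python
# Hypothetical dedicated-inference config for a 2-GPU run.
config = {
    "inference_gpu_ids": [1, 2],  # two GPUs reserved for inference (indices assumed)
    "engine_args": {
        # Optional: if provided, must equal len(inference_gpu_ids);
        # if omitted, the PR defaults it to the GPU count at startup.
        "data_parallel_size": 2,
    },
}

# The guardrail this PR enforces at validation time:
assert config["engine_args"]["data_parallel_size"] == len(config["inference_gpu_ids"])
```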

