Add Multi-GPU Support#62

Merged
LovelyBuggies merged 21 commits intomainfrom
codex/trun
Feb 17, 2026

Conversation

@LovelyBuggies
Member

No description provided.

@LovelyBuggies
Member Author

LovelyBuggies commented Feb 27, 2026

Version 1.3.7 Results

Tests on TLDR

python train_magrpo.py --config configs/magrpo_tldr_config.yaml --override magrpo.parallel_training=none wandb.project=homo_tldr wandb.name=magrpo_tldr_1p7b_1p7b_1gpu

CUDA_VISIBLE_DEVICES=0,1 python train_magrpo.py --config configs/magrpo_tldr_config.yaml --override magrpo.parallel_training=mp magrpo.agent_devices='["cuda:0","cuda:1"]' wandb.project=homo_tldr wandb.name=magrpo_tldr_1p7b_1p7b_2gpu

Training 8000 steps takes 12 hours on 1xH100 with about 47 GB VRAM usage, or 12 hours on 2xH100 with about 27 GB VRAM per GPU.
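For context on the `--override key=value` pairs used throughout: they presumably map dotted keys onto the nested YAML config, with list-valued strings parsed into Python lists. A minimal sketch of that mechanism (the `apply_overrides` helper is hypothetical, not from this repo):

```python
import ast

def apply_overrides(config: dict, overrides: list[str]) -> dict:
    # Each override is "dotted.key=value"; values that look like Python
    # literals (lists, numbers) are parsed, everything else stays a string.
    for item in overrides:
        key, _, raw = item.partition("=")
        try:
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            value = raw
        node = config
        parts = key.split(".")
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return config

cfg = apply_overrides({}, [
    "magrpo.parallel_training=mp",
    'magrpo.agent_devices=["cuda:0","cuda:1"]',
    "wandb.project=homo_tldr",
])
print(cfg)
```

The single quotes around the list values in the commands above keep the shell from splitting them before they reach the parser.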

python train_maac.py --config configs/maac_tldr_config.yaml --override maac.parallel_training=None wandb.project=homo_tldr wandb.name=maac_tldr_1p7b_1p7b_1gpu

CUDA_VISIBLE_DEVICES=0,1 python train_maac.py --config configs/maac_tldr_config.yaml --override maac.parallel_training=mp maac.agent_devices='["cuda:0","cuda:1"]' maac.critic_devices='["cuda:0"]' wandb.project=homo_tldr wandb.name=maac_tldr_1p7b_1p7b_2gpu

Training 4000 steps takes 20 hours on either 1xH100 or 2xH100, with 71 GB VRAM on the single GPU versus 51/28 GB across the two (51 GB because GPU 0 also holds the centralized critic).
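The uneven 51/28 GB split follows from the placement in the command above: GPU 0 hosts agent 0 plus the centralized critic, while GPU 1 hosts agent 1 alone. A small sketch tallying occupants per device from the two override lists (the `occupants` helper is illustrative, not from this repo):

```python
from collections import defaultdict

def occupants(agent_devices: list[str], critic_devices: list[str]) -> dict:
    # Map each CUDA device to the modules placed on it, mirroring the
    # maac.agent_devices / maac.critic_devices overrides.
    placement = defaultdict(list)
    for i, dev in enumerate(agent_devices):
        placement[dev].append(f"agent{i}")
    for i, dev in enumerate(critic_devices):
        placement[dev].append(f"critic{i}")
    return dict(placement)

layout = occupants(["cuda:0", "cuda:1"], ["cuda:0"])
print(layout)  # cuda:0 carries the critic on top of agent 0
```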

python train_iac.py --config configs/iac_tldr_config.yaml --override iac.parallel_training=none wandb.project=homo_tldr wandb.name=iac_tldr_1p7b_1p7b_1gpu

CUDA_VISIBLE_DEVICES=0,1 python train_iac.py --config configs/iac_tldr_config.yaml --override iac.parallel_training=mp iac.agent_devices='["cuda:0","cuda:1"]' iac.critic_devices='["cuda:0","cuda:1"]' wandb.project=homo_tldr wandb.name=iac_tldr_1p7b_1p7b_2gpu

Training 3500 steps takes 20 hours on either 1xH100 or 2xH100, with 80 GB and 50 GB VRAM usage, respectively (separate actor-critic).
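One thing worth noting about these two-GPU commands: with `CUDA_VISIBLE_DEVICES=0,1` set, the `cuda:0`/`cuda:1` strings in the device lists are logical indices into the visible set, not physical GPU IDs. A quick sketch of the remapping (the helper is illustrative):

```python
def physical_gpu(device: str, visible: str) -> int:
    # "cuda:N" indexes into the CUDA_VISIBLE_DEVICES list, so "cuda:1"
    # with CUDA_VISIBLE_DEVICES=2,3 actually lands on physical GPU 3.
    logical = int(device.split(":")[1])
    return [int(g) for g in visible.split(",")][logical]

print(physical_gpu("cuda:1", "0,1"))  # -> 1
print(physical_gpu("cuda:1", "2,3"))  # -> 3
```

This is why the same `agent_devices='["cuda:0","cuda:1"]'` override works regardless of which two physical GPUs are exposed.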

Tests on CHE

python train_magrpo.py --config configs/magrpo_che_config.yaml --override magrpo.parallel_training=None agent_model.name=None agents='["Qwen/Qwen2.5-Coder-3B","Qwen/Qwen3-4B-Instruct-2507"]' magrpo.agent_devices='["cuda:0","cuda:1"]' critic_model.name=None critics=None wandb.project=hetero_che wandb.name=magrpo_che_3b_4b_1gpu

CUDA_VISIBLE_DEVICES=0,1 python train_magrpo.py --config configs/magrpo_che_config.yaml --override magrpo.parallel_training=mp agent_model.name=None agents='["Qwen/Qwen2.5-Coder-3B","Qwen/Qwen3-4B-Instruct-2507"]' magrpo.agent_devices='["cuda:0","cuda:1"]' critic_model.name=None critics=None wandb.project=hetero_che wandb.name=magrpo_che_3b_4b_2gpu

Training 6000 steps takes about 5 hours on 1xH200 with about 100 GB VRAM usage.
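The `parallel_training=mp` path presumably spawns one worker process per agent, each pinned to its own entry in `agent_devices`. A stdlib-only sketch of that shape, with the worker body standing in for the real per-agent training loop (none of these names are from the repo):

```python
import multiprocessing as mp

def agent_worker(agent_id: int, device: str, results) -> None:
    # In the real trainer this would pin the process to its CUDA device
    # and run that agent's rollout/update loop; here it just reports back.
    results.put((agent_id, device))

# "fork" keeps this sketch self-contained (no re-import of the module).
ctx = mp.get_context("fork")
agent_devices = ["cuda:0", "cuda:1"]
queue = ctx.Queue()
workers = [ctx.Process(target=agent_worker, args=(i, dev, queue))
           for i, dev in enumerate(agent_devices)]
for w in workers:
    w.start()
placements = dict(queue.get() for _ in workers)
for w in workers:
    w.join()
print(placements)
```

With `parallel_training=none`/`None`, the agents presumably run sequentially in a single process instead, which explains the higher per-GPU VRAM numbers in the 1-GPU runs.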

python train_maac.py --config configs/maac_che_config.yaml --override maac.parallel_training=none agent_model.name=None agents='["Qwen/Qwen2.5-Coder-3B","Qwen/Qwen3-4B-Instruct-2507"]' critic_model.name=None critics='["Qwen/Qwen2.5-Coder-3B"]' wandb.project=hetero_che wandb.name=maac_che_3b_4b_1gpu

CUDA_VISIBLE_DEVICES=0,1 python train_maac.py --config configs/maac_che_config.yaml --override maac.parallel_training=mp agent_model.name=None agents='["Qwen/Qwen2.5-Coder-3B","Qwen/Qwen3-4B-Instruct-2507"]' maac.agent_devices='["cuda:0","cuda:1"]' maac.critic_devices='["cuda:0"]' critic_model.name=None critics='["Qwen/Qwen2.5-Coder-3B"]' wandb.project=hetero_che wandb.name=maac_che_3b_4b_2gpu

Training 6000 steps takes about 8 hours on 1xH200 with 118 GB VRAM (yellow); training 4000 steps takes 13 hours on 2xH100 with 80/60 GB VRAM usage (the 80 GB GPU also holds the centralized critic).

python train_iac.py --config configs/iac_che_config.yaml --override iac.use_separate_critic=false iac.parallel_training=none agent_model.name=None agents='["Qwen/Qwen2.5-Coder-3B","Qwen/Qwen3-4B-Instruct-2507"]' critic_model.name=None critics=None wandb.project=hetero_che wandb.name=iac_che_3b_4b_share_1gpu

CUDA_VISIBLE_DEVICES=0,1 python train_iac.py --config configs/iac_che_config.yaml --override iac.use_separate_critic=false iac.parallel_training=mp agent_model.name=None agents='["Qwen/Qwen2.5-Coder-3B","Qwen/Qwen3-4B-Instruct-2507"]' iac.agent_devices='["cuda:0","cuda:1"]' critic_model.name=None critics=None wandb.project=hetero_che wandb.name=iac_che_3b_4b_share_2gpu

Training 4000 steps takes about 8 hours on 1xH200 with 140 GB VRAM usage (pink); training the shared actor-critic for 6000 steps takes about 17 hours on 2xH100, with each GPU using about 40-60 GB VRAM.

Tests on Minecraft

Because of differences in per-device seed generation, I only tested Minecraft on 1xH200.

python house_build/train/train_magrpo.py --config house_build/configs/house_build_magrpo_config.yaml --override agents='["Qwen/Qwen2.5-3B-Instruct","Qwen/Qwen3-4B-Instruct-2507"]' magrpo.parallel_training=None agent_model.name=None critics=None critic_model.name=None wandb.project=hetero-mc wandb.name='magrpo_house_3B_4B_1gpu'

Training 2500 steps takes 8 hours on 1xH200 with 110 GB VRAM usage.
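On the seed point above: RNG streams that are derived from the device generally diverge across GPU counts. One common hedge, sketched here, is deriving each agent's seed deterministically from a base seed and the agent index rather than from the device (the `agent_seed` helper and its constants are illustrative, not from this repo):

```python
import random

def agent_seed(base_seed: int, agent_id: int) -> int:
    # Mix the base seed with the agent index so each agent gets a
    # distinct, reproducible stream regardless of its physical GPU.
    return (base_seed * 1_000_003 + agent_id) % (2 ** 31)

rngs = [random.Random(agent_seed(42, i)) for i in range(2)]
print([r.random() for r in rngs])  # distinct but reproducible streams
```

With a scheme like this, 1-GPU and 2-GPU runs would sample identically, making the wall-clock/VRAM comparisons above apples-to-apples.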

python house_build/train/train_maac.py --config house_build/configs/house_build_maac_config.yaml --override agents='["Qwen/Qwen2.5-3B-Instruct","Qwen/Qwen3-4B-Instruct-2507"]' maac.parallel_training=None agent_model.name=None critics='["Qwen/Qwen3-4B-Instruct-2507"]' critic_model.name=None wandb.project=hetero-mc wandb.name='maac_house_3B_4B_1gpu'

Training 1500 steps takes 8 hours on 1xH200 with 140 GB VRAM usage.

python house_build/train/train_maac.py --config house_build/configs/house_build_maac_config.yaml --override agents='["Qwen/Qwen2.5-3B-Instruct","Qwen/Qwen3-4B-Instruct-2507"]' maac.parallel_training=None agent_model.name=None critics='["Qwen/Qwen3-4B-Instruct-2507"]' critic_model.name=None wandb.project=hetero-mc wandb.name='maac_house_3B_4B_1gpu'

Training 2000 steps takes 8 hours on 1xH200 with 130 GB VRAM usage (shared actor-critic).

