Add Multi-GPU Support#62

Merged
LovelyBuggies merged 21 commits intomainfrom
codex/trun
Feb 17, 2026

Conversation

@LovelyBuggies
Member

No description provided.

@LovelyBuggies
Member Author

LovelyBuggies commented Feb 27, 2026

Version 1.3.7 Results

Tests on TLDR

python train_magrpo.py --config configs/magrpo_tldr_config.yaml --override magrpo.parallel_training=none wandb.project=homo_tldr wandb.name=magrpo_tldr_1p7b_1p7b_1gpu

CUDA_VISIBLE_DEVICES=0,1 python train_magrpo.py --config configs/magrpo_tldr_config.yaml --override magrpo.parallel_training=mp magrpo.agent_devices='["cuda:0","cuda:1"]' wandb.project=homo_tldr wandb.name=magrpo_tldr_1p7b_1p7b_2gpu

Training 8000 steps takes 12 hours on 1xH100 with about 47 GB VRAM usage, or 12 hours on 2xH100 with about 27 GB VRAM per GPU.
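For context on the `--override key=value` pairs used throughout: they presumably map dotted keys onto the nested YAML config, with list-valued strings parsed into Python lists. A minimal sketch of that mechanism (the `apply_overrides` helper is hypothetical, not from this repo):

```python
import ast

def apply_overrides(config: dict, overrides: list[str]) -> dict:
    # Each override is "dotted.key=value"; values that look like Python
    # literals (lists, numbers) are parsed, everything else stays a string.
    for item in overrides:
        key, _, raw = item.partition("=")
        try:
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            value = raw
        node = config
        parts = key.split(".")
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return config

cfg = apply_overrides({}, [
    "magrpo.parallel_training=mp",
    'magrpo.agent_devices=["cuda:0","cuda:1"]',
    "wandb.project=homo_tldr",
])
print(cfg)
```

The single quotes around the list values in the commands above keep the shell from splitting them before they reach the parser.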

python train_maac.py --config configs/maac_tldr_config.yaml --override maac.parallel_training=None wandb.project=homo_tldr wandb.name=maac_tldr_1p7b_1p7b_1gpu

CUDA_VISIBLE_DEVICES=0,1 python train_maac.py --config configs/maac_tldr_config.yaml --override maac.parallel_training=mp maac.agent_devices='["cuda:0","cuda:1"]' maac.critic_devices='["cuda:0"]' wandb.project=homo_tldr wandb.name=maac_tldr_1p7b_1p7b_2gpu

Training 4000 steps takes 20 hours on either 1xH100 or 2xH100, with 71 GB VRAM on the single GPU versus 51/28 GB across the two (51 GB because GPU 0 also holds the centralized critic).
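The uneven 51/28 GB split follows from the placement in the command above: GPU 0 hosts agent 0 plus the centralized critic, while GPU 1 hosts agent 1 alone. A small sketch tallying occupants per device from the two override lists (the `occupants` helper is illustrative, not from this repo):

```python
from collections import defaultdict

def occupants(agent_devices: list[str], critic_devices: list[str]) -> dict:
    # Map each CUDA device to the modules placed on it, mirroring the
    # maac.agent_devices / maac.critic_devices overrides.
    placement = defaultdict(list)
    for i, dev in enumerate(agent_devices):
        placement[dev].append(f"agent{i}")
    for i, dev in enumerate(critic_devices):
        placement[dev].append(f"critic{i}")
    return dict(placement)

layout = occupants(["cuda:0", "cuda:1"], ["cuda:0"])
print(layout)  # cuda:0 carries the critic on top of agent 0
```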

python train_iac.py --config configs/iac_tldr_config.yaml --override iac.parallel_training=none wandb.project=homo_tldr wandb.name=iac_tldr_1p7b_1p7b_1gpu

CUDA_VISIBLE_DEVICES=0,1 python train_iac.py --config configs/iac_tldr_config.yaml --override iac.parallel_training=mp iac.agent_devices='["cuda:0","cuda:1"]' iac.critic_devices='["cuda:0","cuda:1"]' wandb.project=homo_tldr wandb.name=iac_tldr_1p7b_1p7b_2gpu

Training 3500 steps takes 20 hours on either 1xH100 or 2xH100, with 80 GB and 50 GB VRAM usage, respectively (separate actor-critic).
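One thing worth noting about these two-GPU commands: with `CUDA_VISIBLE_DEVICES=0,1` set, the `cuda:0`/`cuda:1` strings in the device lists are logical indices into the visible set, not physical GPU IDs. A quick sketch of the remapping (the helper is illustrative):

```python
def physical_gpu(device: str, visible: str) -> int:
    # "cuda:N" indexes into the CUDA_VISIBLE_DEVICES list, so "cuda:1"
    # with CUDA_VISIBLE_DEVICES=2,3 actually lands on physical GPU 3.
    logical = int(device.split(":")[1])
    return [int(g) for g in visible.split(",")][logical]

print(physical_gpu("cuda:1", "0,1"))  # -> 1
print(physical_gpu("cuda:1", "2,3"))  # -> 3
```

This is why the same `agent_devices='["cuda:0","cuda:1"]'` override works regardless of which two physical GPUs are exposed.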

Tests on CHE

python train_magrpo.py --config configs/magrpo_che_config.yaml --override magrpo.parallel_training=None agent_model.name=None agents='["Qwen/Qwen2.5-Coder-3B","Qwen/Qwen3-4B-Instruct-2507"]' magrpo.agent_devices='["cuda:0","cuda:1"]' critic_model.name=None critics=None wandb.project=hetero_che wandb.name=magrpo_che_3b_4b_1gpu

CUDA_VISIBLE_DEVICES=0,1 python train_magrpo.py --config configs/magrpo_che_config.yaml --override magrpo.parallel_training=mp agent_model.name=None agents='["Qwen/Qwen2.5-Coder-3B","Qwen/Qwen3-4B-Instruct-2507"]' magrpo.agent_devices='["cuda:0","cuda:1"]' critic_model.name=None critics=None wandb.project=hetero_che wandb.name=magrpo_che_3b_4b_2gpu

Training 6000 steps takes about 5 hours on 1xH200 with about 100 GB VRAM usage.
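The `parallel_training=mp` path presumably spawns one worker process per agent, each pinned to its own entry in `agent_devices`. A stdlib-only sketch of that shape, with the worker body standing in for the real per-agent training loop (none of these names are from the repo):

```python
import multiprocessing as mp

def agent_worker(agent_id: int, device: str, results) -> None:
    # In the real trainer this would pin the process to its CUDA device
    # and run that agent's rollout/update loop; here it just reports back.
    results.put((agent_id, device))

# "fork" keeps this sketch self-contained (no re-import of the module).
ctx = mp.get_context("fork")
agent_devices = ["cuda:0", "cuda:1"]
queue = ctx.Queue()
workers = [ctx.Process(target=agent_worker, args=(i, dev, queue))
           for i, dev in enumerate(agent_devices)]
for w in workers:
    w.start()
placements = dict(queue.get() for _ in workers)
for w in workers:
    w.join()
print(placements)
```

With `parallel_training=none`/`None`, the agents presumably run sequentially in a single process instead, which explains the higher per-GPU VRAM numbers in the 1-GPU runs.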

python train_maac.py --config configs/maac_che_config.yaml --override maac.parallel_training=none agent_model.name=None agents='["Qwen/Qwen2.5-Coder-3B","Qwen/Qwen3-4B-Instruct-2507"]' critic_model.name=None critics='["Qwen/Qwen2.5-Coder-3B"]' wandb.project=hetero_che wandb.name=maac_che_3b_4b_1gpu

CUDA_VISIBLE_DEVICES=0,1 python train_maac.py --config configs/maac_che_config.yaml --override maac.parallel_training=mp agent_model.name=None agents='["Qwen/Qwen2.5-Coder-3B","Qwen/Qwen3-4B-Instruct-2507"]' maac.agent_devices='["cuda:0","cuda:1"]' maac.critic_devices='["cuda:0"]' critic_model.name=None critics='["Qwen/Qwen2.5-Coder-3B"]' wandb.project=hetero_che wandb.name=maac_che_3b_4b_2gpu

Training 6000 steps takes about 8 hours on 1xH200 with 118 GB VRAM (yellow); training 4000 steps takes 13 hours on 2xH100 with 80/60 GB VRAM usage (the 80 GB GPU also holds the centralized critic).

python train_iac.py --config configs/iac_che_config.yaml --override iac.use_separate_critic=false iac.parallel_training=none agent_model.name=None agents='["Qwen/Qwen2.5-Coder-3B","Qwen/Qwen3-4B-Instruct-2507"]' critic_model.name=None critics=None wandb.project=hetero_che wandb.name=iac_che_3b_4b_share_1gpu

CUDA_VISIBLE_DEVICES=0,1 python train_iac.py --config configs/iac_che_config.yaml --override iac.use_separate_critic=false iac.parallel_training=mp agent_model.name=None agents='["Qwen/Qwen2.5-Coder-3B","Qwen/Qwen3-4B-Instruct-2507"]' iac.agent_devices='["cuda:0","cuda:1"]' critic_model.name=None critics=None wandb.project=hetero_che wandb.name=iac_che_3b_4b_share_2gpu

Training 4000 steps takes about 8 hours on 1xH200 with 140 GB VRAM usage (pink); training the shared actor-critic for 6000 steps takes about 17 hours on 2xH100, with each GPU using about 40-60 GB VRAM.

Tests on Minecraft

Because of differences in per-device seed generation, I only tested Minecraft on 1xH200.

python house_build/train/train_magrpo.py --config house_build/configs/house_build_magrpo_config.yaml --override agents='["Qwen/Qwen2.5-3B-Instruct","Qwen/Qwen3-4B-Instruct-2507"]' magrpo.parallel_training=None agent_model.name=None critics=None critic_model.name=None wandb.project=hetero-mc wandb.name='magrpo_house_3B_4B_1gpu'

Training 2500 steps takes 8 hours on 1xH200 with 110 GB VRAM usage.
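On the seed point above: RNG streams that are derived from the device generally diverge across GPU counts. One common hedge, sketched here, is deriving each agent's seed deterministically from a base seed and the agent index rather than from the device (the `agent_seed` helper and its constants are illustrative, not from this repo):

```python
import random

def agent_seed(base_seed: int, agent_id: int) -> int:
    # Mix the base seed with the agent index so each agent gets a
    # distinct, reproducible stream regardless of its physical GPU.
    return (base_seed * 1_000_003 + agent_id) % (2 ** 31)

rngs = [random.Random(agent_seed(42, i)) for i in range(2)]
print([r.random() for r in rngs])  # distinct but reproducible streams
```

With a scheme like this, 1-GPU and 2-GPU runs would sample identically, making the wall-clock/VRAM comparisons above apples-to-apples.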

python house_build/train/train_maac.py --config house_build/configs/house_build_maac_config.yaml --override agents='["Qwen/Qwen2.5-3B-Instruct","Qwen/Qwen3-4B-Instruct-2507"]' maac.parallel_training=None agent_model.name=None critics='["Qwen/Qwen3-4B-Instruct-2507"]' critic_model.name=None wandb.project=hetero-mc wandb.name='maac_house_3B_4B_1gpu'

Training 1500 steps takes 8 hours on 1xH200 with 140 GB VRAM usage.

python house_build/train/train_maac.py --config house_build/configs/house_build_maac_config.yaml --override agents='["Qwen/Qwen2.5-3B-Instruct","Qwen/Qwen3-4B-Instruct-2507"]' maac.parallel_training=None agent_model.name=None critics='["Qwen/Qwen3-4B-Instruct-2507"]' critic_model.name=None wandb.project=hetero-mc wandb.name='maac_house_3B_4B_1gpu'

Training 2000 steps takes 8 hours on 1xH200 with 130 GB VRAM usage (shared actor-critic).

