Qwen3-235B-A22B GRPO OOM #7125

@qingyuanxingsi

Description

Describe the bug
Running GRPO with Qwen3-235B-A22B hits OOM. Any suggestions on what to adjust?

Your hardware and system info
Write your system info like CUDA version/system/GPU/torch version here:
128× H20 GPUs
CUDA version: 12.9
torch version: 2.8.0+cu129
Training arguments:
```shell
export PYTORCH_CUDA_ALLOC_CONF="expandable_segments:True"

torchrun $DISTRIBUTED_ARGS ${SCRIPT_DIR}/swift/cli/rlhf.py \
    --rlhf_type grpo \
    --model ${MODEL_NAME_OR_PATH} \
    --external_plugins ${RUN_DIR}/custom_reward.py \
    --reward_funcs xxx \
    --use_vllm true \
    --vllm_mode colocate \
    --vllm_gpu_memory_utilization 0.7 \
    --vllm_max_model_len 10240 \
    --train_type lora \
    --lora_rank 128 \
    --torch_dtype bfloat16 \
    --dataset ${DATASET_PATH} \
    --max_completion_length 2048 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --learning_rate 1e-6 \
    --gradient_accumulation_steps 1 \
    --eval_steps 200 \
    --save_steps 200 \
    --report_to wandb \
    --sleep_level 1 \
    --offload_optimizer true \
    --offload_model true \
    --vllm_tensor_parallel_size 8 \
    --vllm_enable_expert_parallel true \
    --save_total_limit 2 \
    --logging_steps 5 \
    --max_length 8192 \
    --async_generate false \
    --output_dir ${OUTPUT_PATH} \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4 \
    --dataset_num_proc 4 \
    --num_generations 8 \
    --temperature 0.9 \
    --deepspeed zero3_offload \
    --move_model_batches 10 \
    --beta 0.0 \
    --log_completions false \
    --importance_sampling_level sequence
```
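A rough back-of-envelope estimate suggests why this colocate setup is memory-tight. The numbers below are assumptions, not facts from the issue: 235B total parameters served in bf16, and 96 GB of HBM per H20 (the H20 also ships in other capacities). The TP size (8) and GPU-memory utilization (0.7) are taken from the command above.

```python
# Hedged back-of-envelope estimate of per-GPU memory pressure for the
# colocated vLLM engine. Assumptions: 235e9 total params in bf16
# (2 bytes each) and 96 GB HBM per H20; TP=8 and utilization=0.7 come
# from --vllm_tensor_parallel_size and --vllm_gpu_memory_utilization.
TOTAL_PARAMS = 235e9          # Qwen3-235B-A22B total parameter count
BYTES_PER_PARAM = 2           # bf16
HBM_GB = 96                   # assumed H20 HBM capacity
TP = 8                        # --vllm_tensor_parallel_size
UTIL = 0.7                    # --vllm_gpu_memory_utilization

weights_gb = TOTAL_PARAMS * BYTES_PER_PARAM / 1e9      # 470 GB full copy
per_gpu_weights_gb = weights_gb / TP                   # weight shard per GPU
vllm_budget_gb = UTIL * HBM_GB                         # memory vLLM may claim
kv_headroom_gb = vllm_budget_gb - per_gpu_weights_gb   # left for KV cache

print(f"per-GPU weight shard: {per_gpu_weights_gb:.1f} GB")
print(f"vLLM budget:          {vllm_budget_gb:.1f} GB")
print(f"KV-cache headroom:    {kv_headroom_gb:.1f} GB")
```

Under these assumptions, roughly 58.8 GB of each GPU's 67.2 GB vLLM budget goes to the weight shard, leaving under 10 GB for the KV cache, before the training side's ZeRO-3 shards, LoRA states, and activations compete for the rest. The natural knobs in the command above are lowering `--vllm_gpu_memory_utilization` or `--vllm_max_model_len`, or raising `--vllm_tensor_parallel_size`.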

Additional context
Add any other context about the problem here.
