Suspected z-image-turbo training sampling-steps bug #1251

@yangguoquan001

Description

After training a LoRA on z-image-turbo, 8-step inference produces very blurry results; only at around 50 steps does the output become barely usable. Below is a before/after training comparison using the same prompt and the same seed:

(before/after comparison images)

Below are the two-stage training scripts:

accelerate launch DiffSynth-Studio/examples/z_image/model_training/train.py \
  --dataset_base_path autodl-tmp/dataset \
  --dataset_metadata_path autodl-tmp/dataset/metadata.csv \
  --max_pixels 1048576 \
  --dataset_repeat 1 \
  --model_paths '[
    [
      "autodl-tmp/Z-Image-Turbo/transformer/diffusion_pytorch_model-00001-of-00003.safetensors",
      "autodl-tmp/Z-Image-Turbo/transformer/diffusion_pytorch_model-00002-of-00003.safetensors",
      "autodl-tmp/Z-Image-Turbo/transformer/diffusion_pytorch_model-00003-of-00003.safetensors"
    ],
    [
      "autodl-tmp/Z-Image-Turbo/text_encoder/model-00001-of-00003.safetensors",
      "autodl-tmp/Z-Image-Turbo/text_encoder/model-00002-of-00003.safetensors",
      "autodl-tmp/Z-Image-Turbo/text_encoder/model-00003-of-00003.safetensors"
    ],
    "autodl-tmp/Z-Image-Turbo/vae/diffusion_pytorch_model.safetensors"
  ]' \
  --learning_rate 1e-4 \
  --num_epochs 5 \
  --remove_prefix_in_ckpt "pipe.dit." \
  --output_path "autodl-tmp/z-image-turbo-cache" \
  --lora_base_model "dit" \
  --lora_target_modules "to_q,to_k,to_v,to_out.0,w1,w2,w3" \
  --lora_rank 32 \
  --use_gradient_checkpointing \
  --dataset_num_workers 8 \
  --task "sft:data_process"

accelerate launch DiffSynth-Studio/examples/z_image/model_training/train.py \
  --dataset_base_path autodl-tmp/z-image-turbo-cache \
  --max_pixels 1048576 \
  --dataset_repeat 1 \
  --model_paths '[
    [
      "autodl-tmp/Z-Image-Turbo/transformer/diffusion_pytorch_model-00001-of-00003.safetensors",
      "autodl-tmp/Z-Image-Turbo/transformer/diffusion_pytorch_model-00002-of-00003.safetensors",
      "autodl-tmp/Z-Image-Turbo/transformer/diffusion_pytorch_model-00003-of-00003.safetensors"
    ]
  ]' \
  --learning_rate 1e-4 \
  --num_epochs 5 \
  --remove_prefix_in_ckpt "pipe.dit." \
  --output_path "autodl-tmp/z-image-turbo-lora" \
  --lora_base_model "dit" \
  --lora_target_modules "to_q,to_k,to_v,to_out.0,w1,w2,w3" \
  --lora_rank 32 \
  --use_gradient_checkpointing \
  --dataset_num_workers 8 \
  --task "sft:train"

I suspect the underlying training script still samples timesteps as if for a large number of inference steps, which would explain this problem and defeat the purpose of a turbo (few-step) model.
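The suspected mismatch can be sketched as follows. This is a minimal illustration under the assumption that the training script samples timesteps uniformly over the full schedule; the function names are hypothetical and are not DiffSynth-Studio APIs:

```python
import random

def sample_training_timestep(num_train_timesteps: int = 1000) -> int:
    # What a generic SFT script would typically do: draw the training
    # timestep uniformly from the whole schedule.
    return random.randrange(num_train_timesteps)

def turbo_inference_timesteps(num_inference_steps: int = 8,
                              num_train_timesteps: int = 1000) -> list:
    # A few-step turbo sampler only ever visits a handful of evenly
    # spaced timesteps, starting from the noisiest end.
    step = num_train_timesteps // num_inference_steps
    return [num_train_timesteps - 1 - i * step for i in range(num_inference_steps)]

if __name__ == "__main__":
    grid = turbo_inference_timesteps()          # 8 timesteps: 999, 874, ..., 124
    t = sample_training_timestep()              # uniform over [0, 1000)
    # With uniform sampling, almost no training timestep coincides with the
    # 8-point inference grid, so the LoRA mostly updates the model at
    # timesteps the turbo sampler never visits.
    print(grid, t, t in grid)
```

If this is indeed the cause, the usual remedy would be to restrict training timesteps to (a neighborhood of) the few-step inference grid, or to match whatever timestep shift the turbo scheduler uses, assuming `train.py` exposes such an option.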
