Commit 10bff22

Glaceon-Hyyakaitsuki-iiqzzz95
authored
supports Qwen-Image (#130)
* supports Qwen-Image
* update docs

---------

Co-authored-by: zhuguoxuan.zgx <zhuguoxuan.zgx@alibaba-inc.com>
Co-authored-by: dujiancong.djc <dujiancong.djc@alibaba-inc.com>
1 parent f1b70c2 commit 10bff22

51 files changed: +913957 -176 lines changed

README.md

Lines changed: 8 additions & 4 deletions
@@ -23,6 +23,7 @@ and offloading strategies, enabling loading of larger diffusion models (e.g., Fl
 
 ## News
 
+- **[v0.4.1](https://github.com/modelscope/DiffSynth-Engine/releases/tag/v0.4.1)** | **August 4, 2025**: 🔥Supports [Qwen-Image](https://www.modelscope.cn/models/Qwen/Qwen-Image), an image generation model excels at complex text rendering and creating images in a wide range of artistic styles.
 - **[v0.4.0](https://github.com/modelscope/DiffSynth-Engine/releases/tag/v0.4.0)** | **August 1, 2025**:
   - 🔥Supports [Wan2.2](https://modelscope.cn/collections/tongyiwanxiang-22--shipinshengcheng-2bb5b1adef2840) video generation model
   - ⚠️[**Breaking Change**] Improved `from_pretrained` method pipeline initialization
@@ -49,21 +50,24 @@ pip3 install -e .
 ### Usage
 Text to image
 ```python
-from diffsynth_engine import fetch_model, FluxImagePipeline
+from diffsynth_engine import fetch_model, FluxImagePipeline, FluxPipelineConfig
 
 model_path = fetch_model("muse/flux-with-vae", path="flux1-dev-with-vae.safetensors")
-pipe = FluxImagePipeline.from_pretrained(model_path, device='cuda:0')
+
+config = FluxPipelineConfig.basic_config(model_path=model_path, device="cuda:0")
+pipe = FluxImagePipeline.from_pretrained(config)
 image = pipe(prompt="a cat")
 image.save("image.png")
 ```
 Text to image with LoRA
 ```python
-from diffsynth_engine import fetch_model, FluxImagePipeline
+from diffsynth_engine import fetch_model, FluxImagePipeline, FluxPipelineConfig
 
 model_path = fetch_model("muse/flux-with-vae", path="flux1-dev-with-vae.safetensors")
 lora_path = fetch_model("DonRat/MAJICFLUS_SuperChinesestyleheongsam", path="麦橘超国风旗袍.safetensors")
 
-pipe = FluxImagePipeline.from_pretrained(model_path, device='cuda:0')
+config = FluxPipelineConfig.basic_config(model_path=model_path, device="cuda:0")
+pipe = FluxImagePipeline.from_pretrained(config)
 pipe.load_lora(path=lora_path, scale=1.0)
 image = pipe(prompt="a girl, qipao")
 image.save("image.png")

diffsynth_engine/__init__.py

Lines changed: 4 additions & 0 deletions
@@ -3,6 +3,7 @@
     SDXLPipelineConfig,
     FluxPipelineConfig,
     WanPipelineConfig,
+    QwenImagePipelineConfig,
     ControlNetParams,
     ControlType,
 )
@@ -11,6 +12,7 @@
     SDXLImagePipeline,
     SDImagePipeline,
     WanVideoPipeline,
+    QwenImagePipeline,
 )
 from .models.flux import FluxControlNet, FluxIPAdapter, FluxRedux
 from .models.sd import SDControlNet
@@ -31,6 +33,7 @@
     "FluxPipelineConfig",
     "WanPipelineConfig",
     "FluxImagePipeline",
+    "QwenImagePipelineConfig",
     "FluxControlNet",
     "FluxIPAdapter",
     "FluxRedux",
@@ -39,6 +42,7 @@
     "SDXLImagePipeline",
     "SDImagePipeline",
     "WanVideoPipeline",
+    "QwenImagePipeline",
     "FluxInpaintingTool",
     "FluxOutpaintingTool",
     "FluxIPAdapterRefTool",
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+{
+  "hidden_size": 3584,
+  "intermediate_size": 18944,
+  "num_hidden_layers": 28,
+  "num_attention_heads": 28,
+  "num_key_value_heads": 4,
+  "mrope_section": [
+    16,
+    24,
+    24
+  ],
+  "rms_norm_eps": 1e-6,
+  "use_cache": true,
+  "use_sliding_window": false,
+  "sliding_window": 32768,
+  "max_window_layers": 28,
+  "vocab_size": 152064,
+  "pad_token_id": 151643,
+  "im_start_token_id": 151644,
+  "im_end_token_id": 151645,
+  "vision_start_token_id": 151652,
+  "vision_end_token_id": 151653,
+  "image_token_id": 151655,
+  "video_token_id": 151656
+}
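Several fields in this config are tied together arithmetically, and the sketch below checks those ties. It assumes the usual Qwen2.5-VL-style conventions, which the diff does not state: the head dimension is `hidden_size / num_attention_heads`, query heads are shared evenly across key/value heads, and `mrope_section` splits half of the head dimension across the temporal, height, and width rotary axes. The helper name `summarize_text_config` is hypothetical and not part of DiffSynth-Engine.

```python
# Only the fields needed for the checks are copied from the config above.
TEXT_CONFIG = {
    "hidden_size": 3584,
    "num_attention_heads": 28,
    "num_key_value_heads": 4,
    "mrope_section": [16, 24, 24],
}


def summarize_text_config(cfg):
    """Derive attention shapes from the config (hypothetical helper)."""
    head_dim, rem = divmod(cfg["hidden_size"], cfg["num_attention_heads"])
    if rem:
        raise ValueError("hidden_size is not divisible by num_attention_heads")

    # Grouped-query attention: each key/value head serves several query heads.
    q_per_kv, rem = divmod(cfg["num_attention_heads"], cfg["num_key_value_heads"])
    if rem:
        raise ValueError("attention heads do not split evenly over KV heads")

    # Assumption: each mrope_section entry counts rotary frequency pairs,
    # so the sections together should cover half of the head dimension.
    rope_pairs = sum(cfg["mrope_section"])
    return {
        "head_dim": head_dim,
        "queries_per_kv_head": q_per_kv,
        "rope_pairs": rope_pairs,
        "mrope_covers_head": rope_pairs * 2 == head_dim,
    }


summary = summarize_text_config(TEXT_CONFIG)
print(summary)
```

Under these assumptions the config is self-consistent: 128-dimensional heads, 7 query heads per key/value head, and 64 rotary pairs.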
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+{
+  "in_channels": 3,
+  "hidden_size": 1280,
+  "intermediate_size": 3420,
+  "out_hidden_size": 3584,
+  "num_heads": 16,
+  "depth": 32,
+  "patch_size": 14,
+  "temporal_patch_size": 2,
+  "spatial_merge_size": 2,
+  "tokens_per_second": 2,
+  "window_size": 112,
+  "fullatt_block_indexes": [
+    7,
+    15,
+    23,
+    31
+  ]
+}
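To make the patch and window fields concrete, here is a rough sketch of how they would translate into token counts. It assumes a Qwen2.5-VL-style vision tower, which this diff does not confirm: the image is cut into `patch_size` pixel patches, each `spatial_merge_size` by `spatial_merge_size` group of patches is merged into one output token, `window_size` is measured in pixels, and only the blocks listed in `fullatt_block_indexes` use full attention. All function names are hypothetical.

```python
# Only the fields needed for the arithmetic are copied from the config above.
VISION_CONFIG = {
    "patch_size": 14,
    "spatial_merge_size": 2,
    "window_size": 112,
    "depth": 32,
    "fullatt_block_indexes": [7, 15, 23, 31],
}


def count_image_tokens(height, width, cfg=VISION_CONFIG):
    """Tokens passed on to the language model for one image (assumed scheme)."""
    unit = cfg["patch_size"] * cfg["spatial_merge_size"]
    if height % unit or width % unit:
        raise ValueError(f"height and width must be multiples of {unit}")
    patches = (height // cfg["patch_size"]) * (width // cfg["patch_size"])
    return patches // cfg["spatial_merge_size"] ** 2


def window_side_in_tokens(cfg=VISION_CONFIG):
    """Side length of one attention window, in merged tokens."""
    return cfg["window_size"] // cfg["patch_size"] // cfg["spatial_merge_size"]


def windowed_blocks(cfg=VISION_CONFIG):
    """Indexes of blocks assumed to use windowed rather than full attention."""
    full = set(cfg["fullatt_block_indexes"])
    return [i for i in range(cfg["depth"]) if i not in full]


print(count_image_tokens(448, 448))  # 32 x 32 patches merged 2 x 2 -> 256
print(window_side_in_tokens())       # 112 px / 14 px / 2 -> 4
print(len(windowed_blocks()))        # 32 blocks minus 4 full-attention -> 28
```

Note also that `out_hidden_size` (3584) matches the `hidden_size` of the text config above, which is what one would expect if the vision tower output is fed directly into the language model's embedding space.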
Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
+{
+  "in_channels": 3,
+  "out_channels": 3,
+  "encoder_dim": 96,
+  "decoder_dim": 96,
+  "z_dim": 16,
+  "dim_mult": [1, 2, 4, 4],
+  "num_res_blocks": 2,
+  "temperal_downsample": [false, true, true],
+  "dropout": 0.0,
+  "patch_size": 1,
+  "mean": [
+    -0.7571,
+    -0.7089,
+    -0.9113,
+    0.1075,
+    -0.1745,
+    0.9653,
+    -0.1517,
+    1.5508,
+    0.4134,
+    -0.0715,
+    0.5517,
+    -0.3632,
+    -0.1922,
+    -0.9497,
+    0.2503,
+    -0.2921
+  ],
+  "std": [
+    2.8184,
+    1.4541,
+    2.3275,
+    2.6558,
+    1.2196,
+    1.7708,
+    2.6052,
+    2.0743,
+    3.2687,
+    2.1526,
+    2.8652,
+    1.5579,
+    1.6382,
+    1.1253,
+    2.8251,
+    1.9160
+  ]
+}
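The `mean` and `std` lists each have 16 entries, one per latent channel (`z_dim`). The sketch below shows how such statistics are typically applied, and what the multiplier fields imply for tensor shapes. It rests on assumptions the diff does not confirm: latents are normalized per channel as `(z - mean) / std`, each step between `dim_mult` stages halves the spatial resolution, and each `true` in `temperal_downsample` (spelled as in the config) halves the temporal resolution. The function names are hypothetical.

```python
# Values copied from the VAE config above.
VAE_CONFIG = {
    "encoder_dim": 96,
    "z_dim": 16,
    "dim_mult": [1, 2, 4, 4],
    "temperal_downsample": [False, True, True],  # key spelled as in the config
    "mean": [-0.7571, -0.7089, -0.9113, 0.1075, -0.1745, 0.9653, -0.1517, 1.5508,
             0.4134, -0.0715, 0.5517, -0.3632, -0.1922, -0.9497, 0.2503, -0.2921],
    "std": [2.8184, 1.4541, 2.3275, 2.6558, 1.2196, 1.7708, 2.6052, 2.0743,
            3.2687, 2.1526, 2.8652, 1.5579, 1.6382, 1.1253, 2.8251, 1.9160],
}


def channel_widths(cfg=VAE_CONFIG):
    """Encoder channel count at each stage: encoder_dim times each multiplier."""
    return [cfg["encoder_dim"] * m for m in cfg["dim_mult"]]


def compression_factors(cfg=VAE_CONFIG):
    """(spatial, temporal) downsampling factors under the stated assumptions."""
    spatial = 2 ** (len(cfg["dim_mult"]) - 1)
    temporal = 2 ** sum(cfg["temperal_downsample"])
    return spatial, temporal


def normalize_latent(z, cfg=VAE_CONFIG):
    """Map one raw latent vector (one value per channel) to normalized space."""
    if len(z) != cfg["z_dim"]:
        raise ValueError(f"expected {cfg['z_dim']} channels, got {len(z)}")
    return [(v - m) / s for v, m, s in zip(z, cfg["mean"], cfg["std"])]


def denormalize_latent(z, cfg=VAE_CONFIG):
    """Inverse of normalize_latent, applied before decoding."""
    if len(z) != cfg["z_dim"]:
        raise ValueError(f"expected {cfg['z_dim']} channels, got {len(z)}")
    return [v * s + m for v, m, s in zip(z, cfg["mean"], cfg["std"])]


print(channel_widths())
print(compression_factors())
```

For single-image generation only the spatial factor matters; the temporal factor would apply if the same VAE layout were used on video frames.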
