21 changes: 21 additions & 0 deletions docs/source/en/api/models/ernie_image_transformer2d.md
@@ -0,0 +1,21 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# ErnieImageTransformer2DModel

A Transformer model for image-like data from [ERNIE-Image](https://huggingface.co/baidu/ERNIE-Image) and [ERNIE-Image-Turbo](https://huggingface.co/baidu/ERNIE-Image-Turbo).

## ErnieImageTransformer2DModel

[[autodoc]] ErnieImageTransformer2DModel
86 changes: 86 additions & 0 deletions docs/source/en/api/pipelines/ernie_image.md
@@ -0,0 +1,86 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Ernie-Image

<div class="flex flex-wrap space-x-1">
<img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/>
</div>

[ERNIE-Image](https://huggingface.co/baidu/ERNIE-Image) is a powerful and efficient image generation model with 8B parameters. Two models are currently available:

|Model|Hugging Face|
|---|---|
|ERNIE-Image|https://huggingface.co/baidu/ERNIE-Image|
|ERNIE-Image-Turbo|https://huggingface.co/baidu/ERNIE-Image-Turbo|

## ERNIE-Image

ERNIE-Image uses a relatively compact architecture that emphasizes parameter efficiency. Built on an 8B DiT backbone, it performs comparably to larger (20B+) models in some scenarios. It offers stable instruction understanding and execution, and handles text generation in several languages (e.g., English, Chinese, and Japanese).

## ERNIE-Image-Turbo

ERNIE-Image-Turbo is a distilled variant of ERNIE-Image. It requires only 8 NFEs (number of function evaluations) and is a more efficient alternative whose performance is comparable to the full model in certain cases.
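
To make the NFE figure concrete, here is a rough cost comparison. It is a sketch under stated assumptions: `nfe` is a hypothetical helper, the step counts (50 and 8) are the values used in the usage examples on this page, and `evals_per_step=1` assumes one transformer forward pass per denoising step. Whether guidance adds a second pass per step for these checkpoints is not stated here, so treat the ratio as an estimate.

```python
def nfe(num_inference_steps: int, evals_per_step: int = 1) -> int:
    """Number of function evaluations for a sampling run.

    evals_per_step is an assumption: 1 for a single forward pass per step,
    2 if guidance is implemented as separate conditional/unconditional passes.
    """
    if num_inference_steps <= 0 or evals_per_step <= 0:
        raise ValueError("step and evaluation counts must be positive")
    return num_inference_steps * evals_per_step


# Step counts taken from the usage examples on this page.
base_nfe = nfe(50)
turbo_nfe = nfe(8)
speedup = base_nfe / turbo_nfe  # ratio of transformer forward passes
print(f"base: {base_nfe} NFEs, turbo: {turbo_nfe} NFEs, ratio: {speedup:.2f}x")
```

The ratio counts only transformer evaluations; text encoding, prompt enhancement, and VAE decoding add a fixed cost to both variants.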

## ErnieImagePipeline

Use [`ErnieImagePipeline`] to generate images from text prompts. By default, the pipeline runs a Prompt Enhancer (PE) that rewrites the user's raw prompt to improve output quality, although this may reduce instruction-following accuracy.

A pretrained 3B-parameter PE model is provided; however, using a larger language model (e.g., Gemini or ChatGPT) for prompt enhancement may yield better results. The system prompt template is available at [chat_template.jinja](https://huggingface.co/baidu/ERNIE-Image/blob/main/pe/chat_template.jinja).

If you prefer not to use PE, set `use_pe=False`.
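
One way to combine an external enhancer with `use_pe=False` is sketched below. `build_call_kwargs` and `toy_enhancer` are hypothetical names introduced for illustration; the enhancer stands in for a call to whatever language model you choose, and only `prompt` and `use_pe` are pipeline arguments taken from this page.

```python
from typing import Callable, Optional


def build_call_kwargs(
    prompt: str,
    enhancer: Optional[Callable[[str], str]] = None,
    **overrides,
) -> dict:
    """Build keyword arguments for a pipeline call.

    If an external enhancer is given, apply it and disable the built-in PE
    so the prompt is not rewritten twice. Otherwise leave the built-in PE on.
    """
    if enhancer is not None:
        prompt = enhancer(prompt)
        use_pe = False
    else:
        use_pe = True
    kwargs = {"prompt": prompt, "use_pe": use_pe}
    kwargs.update(overrides)  # e.g. height, width, num_inference_steps
    return kwargs


def toy_enhancer(prompt: str) -> str:
    # Placeholder for an external LLM call (assumption, not a real API).
    return f"{prompt}, detailed, natural lighting"


kwargs = build_call_kwargs("a black-and-white dog", enhancer=toy_enhancer, height=1024, width=1024)
print(kwargs)
```

The resulting dictionary can then be passed to a loaded pipeline as `pipe(**kwargs)`.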

```python
import torch
from diffusers import ErnieImagePipeline

pipe = ErnieImagePipeline.from_pretrained("baidu/ERNIE-Image", torch_dtype=torch.bfloat16)
pipe.to("cuda")
# If GPU memory is limited, use CPU offload instead of pipe.to("cuda"):
# pipe.enable_model_cpu_offload()

# Seeded generator for reproducible output (the seed value is arbitrary).
generator = torch.Generator(device="cuda").manual_seed(42)

# Chinese prompt: "a black-and-white Chinese rural dog"
prompt = "一只黑白相间的中华田园犬"
images = pipe(
    prompt=prompt,
    height=1024,
    width=1024,
    num_inference_steps=50,
    guidance_scale=5.0,
    generator=generator,
    use_pe=True,
).images
images[0].save("ernie-image-output.png")
```

```python
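import torch
from diffusers import ErnieImagePipeline

pipe = ErnieImagePipeline.from_pretrained("baidu/ERNIE-Image-Turbo", torch_dtype=torch.bfloat16)
pipe.to("cuda")
# If GPU memory is limited, use CPU offload instead of pipe.to("cuda"):
# pipe.enable_model_cpu_offload()

# Seeded generator for reproducible output (the seed value is arbitrary).
generator = torch.Generator(device="cuda").manual_seed(42)

# Chinese prompt: "a black-and-white Chinese rural dog"
prompt = "一只黑白相间的中华田园犬"
images = pipe(
    prompt=prompt,
    height=1024,
    width=1024,
    num_inference_steps=8,  # the distilled Turbo model needs only 8 steps
    guidance_scale=5.0,
    generator=generator,
    use_pe=True,
).images
images[0].save("ernie-image-turbo-output.png")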
```
4 changes: 4 additions & 0 deletions src/diffusers/__init__.py
@@ -302,6 +302,7 @@
"ZImageControlNetModel",
"ZImageTransformer2DModel",
"attention_backend",
"ErnieImageTransformer2DModel",
]
)
_import_structure["modular_pipelines"].extend(
@@ -744,6 +745,7 @@
"ZImageInpaintPipeline",
"ZImageOmniPipeline",
"ZImagePipeline",
"ErnieImagePipeline",
]
)

@@ -1101,6 +1103,7 @@
ZImageControlNetModel,
ZImageTransformer2DModel,
attention_backend,
ErnieImageTransformer2DModel,
)
from .modular_pipelines import (
AutoPipelineBlocks,
@@ -1517,6 +1520,7 @@
ZImageInpaintPipeline,
ZImageOmniPipeline,
ZImagePipeline,
ErnieImagePipeline,
)

try:
2 changes: 2 additions & 0 deletions src/diffusers/models/__init__.py
@@ -101,6 +101,7 @@
_import_structure["transformers.transformer_cogview4"] = ["CogView4Transformer2DModel"]
_import_structure["transformers.transformer_cosmos"] = ["CosmosTransformer3DModel"]
_import_structure["transformers.transformer_easyanimate"] = ["EasyAnimateTransformer3DModel"]
_import_structure["transformers.transformer_ernie_image"] = ["ErnieImageTransformer2DModel"]
_import_structure["transformers.transformer_flux"] = ["FluxTransformer2DModel"]
_import_structure["transformers.transformer_flux2"] = ["Flux2Transformer2DModel"]
_import_structure["transformers.transformer_glm_image"] = ["GlmImageTransformer2DModel"]
@@ -219,6 +220,7 @@
DiTTransformer2DModel,
DualTransformer2DModel,
EasyAnimateTransformer3DModel,
ErnieImageTransformer2DModel,
Flux2Transformer2DModel,
FluxTransformer2DModel,
GlmImageTransformer2DModel,
1 change: 1 addition & 0 deletions src/diffusers/models/transformers/__init__.py
@@ -53,3 +53,4 @@
from .transformer_wan_animate import WanAnimateTransformer3DModel
from .transformer_wan_vace import WanVACETransformer3DModel
from .transformer_z_image import ZImageTransformer2DModel
from .transformer_ernie_image import ErnieImageTransformer2DModel
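
The `__init__.py` hunks above register the new class in `_import_structure`, a mapping from submodule names to the public names they export, so that a submodule is only imported when one of its names is first accessed. The sketch below illustrates that idea with the standard library only. `LazyNamespace` is a hypothetical, simplified stand-in and not diffusers' own `_LazyModule` helper, and the `math`/`json` entries are placeholders for real submodules.

```python
import importlib
import types


class LazyNamespace(types.ModuleType):
    """Resolve exported names on first access from a {module: [names]} mapping."""

    def __init__(self, name: str, import_structure: dict):
        super().__init__(name)
        # Invert the mapping: exported name -> module that defines it.
        self._name_to_module = {
            exported: module_name
            for module_name, names in import_structure.items()
            for exported in names
        }

    def __getattr__(self, attr: str):
        # Only called when normal attribute lookup fails.
        try:
            module_name = self._name_to_module[attr]
        except KeyError:
            raise AttributeError(f"{self.__name__!r} has no attribute {attr!r}") from None
        value = getattr(importlib.import_module(module_name), attr)
        setattr(self, attr, value)  # cache so later lookups skip __getattr__
        return value


# Placeholder structure; in the diff the keys are submodules such as
# "transformers.transformer_ernie_image".
demo = LazyNamespace("demo", {"math": ["sqrt"], "json": ["dumps"]})
print(demo.sqrt(9.0))
```

A name that is missing from the mapping cannot be resolved this way, which is why the diff adds the class to each package level that should export it.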