
Add ernie image#13432

Open
HsiaWinter wants to merge 9 commits into huggingface:main from HsiaWinter:add-ernie-image

Conversation

@HsiaWinter

What does this PR do?

We have introduced a new text-to-image model called ERNIE-Image, which will soon be open-sourced to the community. This PR includes the model architecture definition, the pipeline, as well as the related documentation and test files.

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@github-actions bot added the documentation (Improvements or additions to documentation), models, tests, utils, pipelines, and size/L (PR with diff > 200 LOC) labels Apr 8, 2026
Collaborator

@yiyixuxu yiyixuxu left a comment


Thanks for the PR!
I left some feedback.

def __init__(self, hidden_size: int, num_heads: int, ffn_hidden_size: int, eps: float = 1e-6, qk_layernorm: bool = True):
    super().__init__()
    self.adaLN_sa_ln = RMSNorm(hidden_size, eps=eps)
    self.self_attention = Attention(

return x.reshape(B, D, Hp * Wp).transpose(1, 2).contiguous()


class TimestepEmbedding(nn.Module):


return ErnieImageTransformer2DModelOutput(sample=output) if return_dict else (output,)

def _pad_text(self, text_hiddens: List[torch.Tensor], device: torch.device, dtype: torch.dtype):

Ohh, we are padding the text embeddings here. Is it possible to move this outside of the model, into the pipeline? E.g. you can pass image_ids, text_ids, and text_seq_lens instead.
I think it would also affect torch.compile if we pad text embeddings inside the transformer.
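For illustration, moving the padding into the pipeline could look roughly like the sketch below. `pad_text_embeddings` and its return convention are assumptions for this example, not the PR's actual code: the pipeline pads the variable-length prompt embeddings once, then hands the transformer fixed-shape tensors plus the true sequence lengths, so the model's forward stays shape-stable for torch.compile.

```python
import torch
from typing import List, Tuple


def pad_text_embeddings(
    text_hiddens: List[torch.Tensor], device: torch.device, dtype: torch.dtype
) -> Tuple[torch.Tensor, torch.Tensor]:
    # Pad a list of per-prompt embeddings [(L_i, D), ...] into one
    # (B, L_max, D) tensor, and return the true lengths so the transformer
    # can build its attention mask without re-padding internally.
    batch_size = len(text_hiddens)
    max_len = max(h.shape[0] for h in text_hiddens)
    dim = text_hiddens[0].shape[1]

    padded = torch.zeros(batch_size, max_len, dim, device=device, dtype=dtype)
    text_seq_lens = torch.zeros(batch_size, dtype=torch.long, device=device)
    for i, h in enumerate(text_hiddens):
        padded[i, : h.shape[0]] = h.to(device=device, dtype=dtype)
        text_seq_lens[i] = h.shape[0]
    return padded, text_seq_lens
```

The pipeline would then pass `padded` and `text_seq_lens` into the transformer, which only ever sees already-batched tensors.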

return out.float()


class EmbedND3(nn.Module):

Suggested change
class EmbedND3(nn.Module):
class ErnieImageEmbedND3(nn.Module):

Can we follow our naming conventions and add the ErnieImage prefix everywhere?

self.vae_scale_factor = 16 # VAE downsample factor

@classmethod
def from_pretrained(cls, pretrained_model_name_or_path: str, **kwargs):

Why did you write a custom from_pretrained method here? Is there any reason you could not use the from_pretrained inherited from DiffusionPipeline?

Comment on lines +186 to +189
if hasattr(self.pe, "_hf_hook") and hasattr(self.pe._hf_hook, "execution_device"):
    pe_device = self.pe._hf_hook.execution_device
else:
    pe_device = device

Suggested change
if hasattr(self.pe, "_hf_hook") and hasattr(self.pe._hf_hook, "execution_device"):
    pe_device = self.pe._hf_hook.execution_device
else:
    pe_device = device
pe_device = device or self._execution_device

this is basically self._execution_device, no? https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pipeline_utils.py#L1136
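For illustration, the fallback the suggestion describes amounts to something like this. `resolve_execution_device` is a hypothetical stand-in for the `_execution_device` property inherited from `DiffusionPipeline`, which already handles the accelerate offload-hook lookup shown in the hand-rolled `hasattr` check above:

```python
def resolve_execution_device(device, pipeline_execution_device):
    # Prefer an explicitly passed device; otherwise fall back to the device
    # the pipeline resolves from its modules and offload hooks. This is the
    # behavior of `device or self._execution_device` in the suggestion.
    return device if device is not None else pipeline_execution_device
```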

text_hiddens = self.encode_prompt(prompt, device, num_images_per_prompt)

# CFG with negative prompt
do_cfg = guidance_scale > 1.0

can we add a do_classifier_free_guidance property instead?
https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/flux2/pipeline_flux2_klein.py#L590

Suggested change
do_cfg = guidance_scale > 1.0
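As a minimal sketch of the suggested pattern (the class name and `__call__` signature here are placeholders, not the PR's actual pipeline): store `guidance_scale` on the instance inside `__call__`, then derive CFG from a read-only property instead of a local `do_cfg` flag, as other diffusers pipelines such as Flux2 do.

```python
class ErnieImagePipelineSketch:
    """Hypothetical simplified pipeline illustrating the
    do_classifier_free_guidance property pattern."""

    def __call__(self, prompt, guidance_scale=5.0):
        # Stash the scale so properties can read it during this call.
        self._guidance_scale = guidance_scale
        if self.do_classifier_free_guidance:
            # ... encode the negative prompt and duplicate inputs here ...
            pass
        return prompt

    @property
    def guidance_scale(self):
        return self._guidance_scale

    @property
    def do_classifier_free_guidance(self):
        # CFG is active whenever the guidance scale exceeds 1.0.
        return self._guidance_scale > 1.0
```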


# CFG with negative prompt
do_cfg = guidance_scale > 1.0
if do_cfg:

Suggested change
if do_cfg:
if self.do_classifier_free_guidance:

@yiyixuxu yiyixuxu requested a review from dg845 April 8, 2026 09:02