
[Bug] QwenImagePipeline silently disables CFG when passing negative_prompt_embeds if mask is None (which encode_prompt returns by default) #13377

@Sunhill666

Description


Describe the bug

In QwenImagePipeline, when users manually pre-compute prompt embeddings to optimize memory usage (e.g., placing the encoder and transformer on different GPUs), Classifier-Free Guidance (CFG) is silently disabled if negative_prompt_embeds_mask is set to None.

However, `encode_prompt` itself explicitly converts an all-ones mask to `None` as an optimization. This creates a contradiction: the pipeline's own encoder returns a valid state (`None`) that the `__call__` method then rejects, so CFG is silently skipped with only a warning.

Reproduction

If you manually extract embeddings and pass them to the pipeline:

```python
# 1. Manually encode prompts
pos_embeds, pos_mask = pipeline.encode_prompt("A photo of a cat")
neg_embeds, neg_mask = pipeline.encode_prompt("bad quality")
# Note: neg_mask is often `None` here because `encode_prompt` collapses an
# all-ones `prompt_embeds_mask` to `None`

# 2. Pass them to the pipeline
image = pipeline(
    prompt_embeds=pos_embeds,
    prompt_embeds_mask=pos_mask,
    negative_prompt_embeds=neg_embeds,
    negative_prompt_embeds_mask=neg_mask,  # This passes None
    true_cfg_scale=4.0,
).images[0]
```

Output Warning:

true_cfg_scale is passed as 4.0, but classifier-free guidance is not enabled since no negative_prompt is provided.

Root Cause Analysis:
In pipeline_qwenimage.py:

1. The `has_neg_prompt` check in `__call__` requires the mask to NOT be `None`:

```python
has_neg_prompt = negative_prompt is not None or (
    negative_prompt_embeds is not None and negative_prompt_embeds_mask is not None
)
```

2. But `encode_prompt` intentionally sets the mask to `None` when every position is unmasked (all ones):

```python
if prompt_embeds_mask is not None:
    # ... (reshape logic)
    if prompt_embeds_mask.all():
        prompt_embeds_mask = None  # <--- Here!
```

Because `neg_mask` becomes `None`, `has_neg_prompt` evaluates to `False`, and consequently `do_true_cfg` is set to `False`.
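The interaction can be reproduced in isolation (a minimal sketch: variable names mirror `pipeline_qwenimage.py`, but the embedding tensor is replaced by a stand-in object):

```python
true_cfg_scale = 4.0
negative_prompt = None              # user supplied embeddings, not a raw string
negative_prompt_embeds = object()   # stand-in for the real embedding tensor
negative_prompt_embeds_mask = None  # what encode_prompt returns for an all-ones mask

# Current check in __call__: demands BOTH embeds and mask to be non-None
has_neg_prompt = negative_prompt is not None or (
    negative_prompt_embeds is not None and negative_prompt_embeds_mask is not None
)
do_true_cfg = true_cfg_scale > 1 and has_neg_prompt
print(do_true_cfg)  # False: CFG is skipped despite valid embeddings

# Relaxed check (see Expected behavior): embeddings alone are enough
has_neg_prompt_relaxed = negative_prompt is not None or negative_prompt_embeds is not None
print(true_cfg_scale > 1 and has_neg_prompt_relaxed)  # True
```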

Expected behavior:
The presence of negative_prompt_embeds alone should be sufficient to trigger has_neg_prompt = True. The negative_prompt_embeds_mask being None should simply mean "no masking is required" (all valid), which is consistent with the behavior of encode_prompt.

The check in __call__ should probably be relaxed to:

```python
has_neg_prompt = negative_prompt is not None or negative_prompt_embeds is not None
```

Temporary Workaround:
Users currently have to manually reconstruct an all-ones mask before calling the pipeline:

```python
if neg_mask is None:
    neg_mask = torch.ones(neg_embeds.shape[:2], dtype=torch.long, device=device)
```
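Wrapped as a small helper (a sketch; `ensure_prompt_mask` is a hypothetical name, not part of the diffusers API), the same workaround can be applied to both positive and negative masks:

```python
import torch

def ensure_prompt_mask(embeds, mask, device="cpu"):
    """Rebuild the all-ones mask that encode_prompt collapsed to None.

    A None mask means "attend to every token", which an explicit mask of
    ones over (batch, seq_len) expresses equivalently, satisfying the
    pipeline's has_neg_prompt check.
    """
    if mask is None:
        mask = torch.ones(embeds.shape[:2], dtype=torch.long, device=device)
    return mask

# Demo with a dummy (batch=1, seq_len=4, dim=8) embedding tensor
neg_embeds = torch.randn(1, 4, 8)
neg_mask = ensure_prompt_mask(neg_embeds, None)
print(neg_mask.shape)  # torch.Size([1, 4])
```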


System Info

  • 🤗 Diffusers version: 0.37.1
  • Platform: Linux-6.6.87.2-microsoft-standard-WSL2-x86_64-with-glibc2.39
  • Running on Google Colab?: No
  • Python version: 3.10.20
  • PyTorch version (GPU?): 2.11.0+cu130 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Huggingface_hub version: 1.8.0
  • Transformers version: 5.4.0
  • Accelerate version: 1.13.0
  • PEFT version: not installed
  • Bitsandbytes version: not installed
  • Safetensors version: 0.7.0
  • xFormers version: not installed
  • Accelerator: NVIDIA GeForce RTX 3060, 12288 MiB
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: Yes

Who can help?

@yiyixuxu @DN6
