[Bug] Ideogram 4: output collapses to a horizontal center band at 1024×1024 (correct at 1536²/2048²) #1648

@Meeseeks-Mr

Description

Git commit

I am on commit 9b0fceb, which corresponds to sd.cpp version master-691-563137a-1-g9b0fceb.

Operating System & Version

| Item | Value |
|---|---|
| sd.cpp version | master-691-563137a-1-g9b0fceb (commit 9b0fceb) |
| Backend | CUDA (-DSD_CUDA=ON -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=89) |
| GPU | NVIDIA GeForce RTX 4080 Laptop (12 GB), compute capability 8.9 |
| CUDA toolkit | 12.8 |
| OS / compiler | Windows 11, MSVC 19.29 (VS 2019 Build Tools), Ninja |

GGML backends

CUDA

Command-line arguments used

### Steps to reproduce


```sh
./sd-cli \
  --diffusion-model ideogram4-Q4_K.gguf \
  --uncond-diffusion-model ideogram4_uncond-iQ4_NL.gguf \
  --llm Qwen3VL-8B-Instruct-Q4_K_M.gguf \
  --vae flux2_ae.safetensors \
  -p '{"high_level_description":"A studio product photograph of a single fluffy orange cat next to a small wooden sign that reads ideogram4, soft daylight, clean light-gray background", "style_description":{"lighting":"soft diffused studio light"}}' \
  -W 1024 -H 1024 --steps 20 --cfg-scale 7.0 --diffusion-fa -v -o out_1024.png

# Same command with -W 1536 -H 1536  -> correct image
# Same command with -W 2048 -H 2048  -> correct image
```


### What you expected to happen

A full-frame image at 1024×1024, like the ones produced at 1536² and 2048².

### What actually happened

Actual at 1024×1024: content only in a horizontal strip at the vertical center; the rest is a uniform fill.


### Logs / error messages / stack trace

Quantitative evidence — luminance standard deviation per cell on an 8×8 grid
(0 = perfectly flat). 1024² shows signal **only** in the two center rows:

| Grid row | 1024×1024 (BROKEN) | 1536×1536 (CORRECT) |
|---|---|---|
| 1 | 1 1 1 1 1 1 1 1 | 5 5 23 48 47 24 5 5 |
| 2 | 0 0 0 0 0 0 0 0 | 6 6 4 4 4 4 5 5 |
| 3 | 0 0 0 0 0 0 0 0 | 5 5 4 69 74 4 4 5 |
| 4 | 33 50 49 51 45 53 54 34 | 4 4 4 72 73 4 4 3 |
| 5 | 28 54 46 48 46 47 46 30 | 3 2 26 55 54 34 3 4 |
| 6 | 0 0 0 0 0 0 0 0 | 3 3 22 40 39 30 6 4 |
| 7 | 0 0 0 0 0 0 0 0 | 4 12 39 61 58 49 10 4 |
| 8 | 1 0 1 1 1 0 1 0 | 4 4 34 63 46 53 9 5 |
| overall stddev | ≈ 23 (banded) | ≈ 43 (distributed) |
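
For anyone who wants to reproduce this measurement, here is a minimal sketch of a per-cell luminance standard deviation on an 8×8 grid. It is not the exact script used for the numbers above. The function names `cell_stddev_grid` and `active_rows`, the use of population standard deviation, and the threshold of 10 are all assumptions made for illustration. The input is a plain 2D list of luminance values, so decoding the PNG into grayscale is left to whatever image library you prefer.

```python
from statistics import pstdev


def cell_stddev_grid(luma, grid=8):
    """Return a grid x grid list of per-cell luminance standard deviations.

    luma is a list of rows, each a list of luminance values (e.g. 0-255).
    Population stddev is an assumption; the report does not say which was used.
    """
    height = len(luma)
    width = len(luma[0])
    result = []
    for gy in range(grid):
        y0, y1 = gy * height // grid, (gy + 1) * height // grid
        row = []
        for gx in range(grid):
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            cell = [luma[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(pstdev(cell))
        result.append(row)
    return result


def active_rows(stddev_grid, threshold=10.0):
    """Indices (0-based) of grid rows whose mean cell stddev exceeds threshold.

    The threshold is a hypothetical cut-off, not a value from the report.
    """
    return [
        i for i, row in enumerate(stddev_grid)
        if sum(row) / len(row) > threshold
    ]


# Synthetic 16x16 "image": flat gray everywhere except a textured center band.
luma = [[128] * 16 for _ in range(16)]
for y in range(6, 10):
    luma[y] = [(x % 2) * 255 for x in range(16)]

grid = cell_stddev_grid(luma)
print(active_rows(grid))  # prints [3, 4]
```

On the synthetic input only the two center grid rows carry signal, which is the same pattern as the 1024×1024 column above.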

Additional context / environment details

What I ruled out (the band is invariant to all of these)

The center-band collapse at 1024² is identical across every variable below; only
changing the resolution fixes it:

  • Quantization: Q6_K+Q2_K and Q4_K+iQ4_NL — identical.
  • CFG scale: 1.0 (gray), 3.5, 7.0 — band present at all (1.0 is fully gray).
  • --guidance: present (3.5) vs. omitted — identical.
  • Flash attention: --diffusion-fa on vs. off — identical (off is lower contrast).
  • --flow-shift: auto vs. forced 3.0 vs. forced 6.0 — band persists.
  • Prompt: sparse one-liner vs. rich structured JSON — band persists (rich prompt
    changes the band's content but not the collapse).
  • Offload: --offload-to-cpu on vs. off — identical.

Resolution sweep (same command, varying -W/-H)

| Resolution | Result | Overall stddev |
|---|---|---|
| 1024² | broken (center band) | ~12–23 |
| 1280² | partial (some spread, still weak) | ~33 |
| 1536² | correct | ~44 |
| 2048² | correct | ~43 |
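
The sweep can be scripted. The sketch below only builds the argument lists and prints them; it does not launch anything. `sweep_commands` is a hypothetical helper, the `out_<size>.png` naming simply follows the `out_1024.png` used in the reproduction command, and the `base` list is abbreviated, so substitute the full set of model and prompt arguments from the steps to reproduce.

```python
import shlex


def sweep_commands(base_args, sizes):
    """Build one sd-cli argv per square resolution.

    base_args holds the shared arguments (models, prompt, sampler flags)
    without -W, -H or -o, which are appended per resolution.
    """
    commands = []
    for size in sizes:
        commands.append([
            "./sd-cli", *base_args,
            "-W", str(size), "-H", str(size),
            "-o", f"out_{size}.png",
        ])
    return commands


# Abbreviated base arguments for illustration only.
base = ["--diffusion-model", "ideogram4-Q4_K.gguf",
        "--steps", "20", "--cfg-scale", "7.0"]

for cmd in sweep_commands(base, [1024, 1280, 1536, 2048]):
    # shlex.join quotes arguments so the line can be pasted into a POSIX shell.
    print(shlex.join(cmd))
```

Each returned list can be passed to `subprocess.run` unchanged if you want to execute the sweep rather than print it.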
