Git commit
I am on commit `9b0fceb`, which corresponds to version `master-691-563137a-1-g9b0fceb` of the sd.cpp project.
Operating System & Version
| Component | Value |
|---|---|
| sd.cpp version | `master-691-563137a-1-g9b0fceb` (commit `9b0fceb`) |
| Backend | CUDA (`-DSD_CUDA=ON -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=89`) |
| GPU | NVIDIA GeForce RTX 4080 Laptop (12 GB), compute capability 8.9 |
| CUDA toolkit | 12.8 |
| OS / compiler | Windows 11, MSVC 19.29 (VS 2019 Build Tools), Ninja |
GGML backends
CUDA
Command-line arguments used
### Steps to reproduce
```sh
./sd-cli \
--diffusion-model ideogram4-Q4_K.gguf \
--uncond-diffusion-model ideogram4_uncond-iQ4_NL.gguf \
--llm Qwen3VL-8B-Instruct-Q4_K_M.gguf \
--vae flux2_ae.safetensors \
-p '{"high_level_description":"A studio product photograph of a single fluffy orange cat next to a small wooden sign that reads ideogram4, soft daylight, clean light-gray background", "style_description":{"lighting":"soft diffused studio light"}}' \
-W 1024 -H 1024 --steps 20 --cfg-scale 7.0 --diffusion-fa -v -o out_1024.png
# Same command with -W 1536 -H 1536 -> correct image
# Same command with -W 2048 -H 2048 -> correct image
```
### What you expected to happen
A full-frame image at 1024×1024, like the ones produced at 1536×1536 and 2048×2048.
### What actually happened
At 1024×1024, content appears only in a horizontal strip at the vertical center; the rest of the image is a uniform fill.
### Logs / error messages / stack trace
Quantitative evidence: luminance standard deviation per cell on an 8×8 grid
(0 = perfectly flat). At 1024², signal appears **only** in the two center rows:

    1024×1024 (BROKEN)            1536×1536 (CORRECT)
      1  1  1  1  1  1  1  1        5  5 23 48 47 24  5  5
      0  0  0  0  0  0  0  0        6  6  4  4  4  4  5  5
      0  0  0  0  0  0  0  0        5  5  4 69 74  4  4  5
     33 50 49 51 45 53 54 34        4  4  4 72 73  4  4  3
     28 54 46 48 46 47 46 30        3  2 26 55 54 34  3  4
      0  0  0  0  0  0  0  0        3  3 22 40 39 30  6  4
      0  0  0  0  0  0  0  0        4 12 39 61 58 49 10  4
      1  0  1  1  1  0  1  0        4  4 34 63 46 53  9  5
    overall stddev ≈ 23 (banded)  overall stddev ≈ 43 (distributed)
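For anyone who wants to reproduce the per-cell measurement, here is a minimal sketch of one way to compute it. It is not the exact script used for the numbers above: the Rec. 601 luma weights, the population standard deviation, the helper names (`luminance`, `grid_stddev`, `active_rows`) and the threshold of 10 are all assumptions made for illustration. It uses only the standard library and expects the image as a list of rows of luminance values.

```python
from statistics import pstdev


def luminance(r, g, b):
    """Rec. 601 luma from 8-bit RGB (assumed weighting, not confirmed by the report)."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def grid_stddev(lum, rows=8, cols=8):
    """Return a rows x cols grid of population stddevs over equal cells of `lum`.

    `lum` is a list of equal-length rows of luminance values.
    Trailing pixels that do not fill a whole cell are ignored.
    """
    height, width = len(lum), len(lum[0])
    cell_h, cell_w = height // rows, width // cols
    if cell_h == 0 or cell_w == 0:
        raise ValueError("image is smaller than the grid")
    grid = []
    for gy in range(rows):
        out_row = []
        for gx in range(cols):
            cell = [
                lum[y][x]
                for y in range(gy * cell_h, (gy + 1) * cell_h)
                for x in range(gx * cell_w, (gx + 1) * cell_w)
            ]
            out_row.append(pstdev(cell))
        grid.append(out_row)
    return grid


def active_rows(grid, threshold=10.0):
    """Indices of grid rows whose mean cell stddev exceeds `threshold` (hypothetical cutoff)."""
    return [i for i, row in enumerate(grid) if sum(row) / len(row) > threshold]


# The 1024x1024 grid exactly as reported above.
REPORTED_1024 = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [33, 50, 49, 51, 45, 53, 54, 34],
    [28, 54, 46, 48, 46, 47, 46, 30],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 1, 1, 1, 0, 1, 0],
]
print(active_rows(REPORTED_1024))  # prints [3, 4]

# Synthetic 16x16 image: flat gray with a textured band in pixel rows 6..9.
banded = [[128.0] * 16 for _ in range(16)]
for y in range(6, 10):
    banded[y] = [0.0 if x % 2 == 0 else 255.0 for x in range(16)]
print(active_rows(grid_stddev(banded)))  # prints [3, 4]
```

To run this on a real output, the PNG has to be decoded into RGB rows first (for example with an image library of your choice) and each pixel passed through `luminance`. Only rows 3 and 4 (zero-based) exceed the cutoff for the reported 1024² grid, which is the center band described above.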
Additional context / environment details
What I ruled out (the band is invariant to all of these)
The center-band collapse at 1024² is identical across every variable below; only
changing the resolution fixes it:

- Quantization: `Q6_K`+`Q2_K` and `Q4_K`+`iQ4_NL` give identical results.
- CFG scale: `1.0`, `3.5`, `7.0`; the band is present at all three (the `1.0` output is fully gray).
- `--guidance`: present (`3.5`) vs. omitted, identical.
- Flash attention: `--diffusion-fa` on vs. off, identical (off is lower contrast).
- `--flow-shift`: auto vs. forced `3.0` vs. forced `6.0`, band persists.
- Prompt: sparse one-liner vs. rich structured JSON, band persists (the rich prompt
  changes the band's content but not the collapse).
- Offload: `--offload-to-cpu` on vs. off, identical.
Resolution sweep (same command, varying `-W`/`-H`)

| Resolution | Result | Overall stddev |
|---|---|---|
| 1024² | broken (center band) | ~12–23 |
| 1280² | partial (some spread, still weak) | ~33 |
| 1536² | correct | ~44 |
| 2048² | correct | ~43 |
Notes
- … and the run exits 0; the output is simply spatially collapsed along the height axis at low resolution.
- … this build (`9b0fceb`) post-dates those. Happy to test patches or provide the full verbose `-v` logs and sample PNGs.