[Bug] Vulkan GGML Backend Performance Regression on commit 9b0fceb #1647

### Git commit

A major performance regression occurs in the Vulkan GGML backend between commit 19bdfe2 and commit 9b0fceb.
On an AMD RX 580 running Windows 10 22H2 through sd.cpp‑webui, iteration time increases from ~9.5 s/it to 22–23 s/it, more than doubling generation time on identical workloads.

### Operating System & Version

Windows 10 22H2

### GGML backends

Vulkan

### Command-line arguments used

sd-cli.exe -M img_gen -p "anime, depth of field, 1girl, closed eyes, white shirt, open shirt, cross necklace, nurse cap, pov, 1boy, holding another's arm, <lora:anima-turbo-lora-v0.1:1>" --sampling-method er_sde --steps 8 -W 1152 -H 896 -b 1 --cfg-scale 1 -s 20514 --clip-skip -1 --embd-dir D:\Programs\sd.cpp-webui\models/embeddings/ --lora-model-dir D:\Programs\sd.cpp-webui\models/loras/ -t 0 --rng std_default --sampler-rng std_default --lora-apply-mode auto -o D:\Programs\sd.cpp-webui\outputs\txt2img\158.png --diffusion-model D:\Programs\sd.cpp-webui\models\unet\anima_baseV10.safetensors --vae D:\Programs\sd.cpp-webui\models\vae\qwenimagevae_v7.safetensors --llm D:\Programs\sd.cpp-webui\models\text_encoders\qwen_3_06b_base.safetensors --type f16 --scheduler sgm_uniform --vae-tile-overlap 0.5 --vae-tile-size 64x64 --cache-mode spectrum --vae-tiling --diffusion-conv-direct --vae-conv-direct --mmap --color

### Steps to reproduce

1. Launch sd.cpp‑webui using build 19bdfe2.

2. Run any image generation task using the Vulkan backend.

3. Update to build 9b0fceb.

4. Run the exact same command and settings.

5. Compare iteration times.
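
The comparison in the steps above can also be scripted outside the web UI. The following is a minimal sketch, not part of sd.cpp‑webui: `time_command` and `compare_builds` are hypothetical helper names, and the function measures wall-clock time for a whole process run (model loading included), so it only approximates the per-iteration figures reported here.

```python
import statistics
import subprocess
import time


def time_command(cmd, runs=3):
    """Run `cmd` `runs` times and return the median wall-clock time in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the process exits with a non-zero status,
        # so a failed generation is never counted as a valid timing.
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compare_builds(old_exe, new_exe, shared_args, runs=3):
    """Time two executables with identical arguments.

    Returns (old_seconds, new_seconds, new_seconds / old_seconds).
    """
    old = time_command([old_exe, *shared_args], runs)
    new = time_command([new_exe, *shared_args], runs)
    return old, new, new / old
```

To use it, pass the paths of the two `sd-cli.exe` builds (wherever they were unpacked) as `old_exe` and `new_exe`, and the argument list from the command line above as `shared_args`. A ratio well above 1 would reproduce the regression independently of the frontend.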

### What you expected to happen

Performance should match or exceed previous Vulkan backend throughput.

### What actually happened

BEFORE regression:  
sd-master‑19bdfe2‑bin‑win‑vulkan‑x64

Performance: ~9.5 seconds per iteration

AFTER regression:  
sd-master‑9b0fceb‑bin‑win‑vulkan‑x64

Performance: 22–23 seconds per iteration

This represents a slowdown of roughly 2.3× to 2.4×.
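
The slowdown factor follows directly from the two reported timings. A quick check, assuming the approximate figures above (9.5 s/it before, 22 to 23 s/it after); `slowdown` is just an illustrative helper:

```python
def slowdown(before_s, after_s):
    """Return how many times slower `after_s` is compared with `before_s`."""
    return after_s / before_s


before = 9.5                 # s/it on 19bdfe2 (reported, approximate)
after_range = (22.0, 23.0)   # s/it on 9b0fceb (reported range)

low, high = (slowdown(before, a) for a in after_range)
print(f"slowdown: {low:.2f}x to {high:.2f}x")  # slowdown: 2.32x to 2.42x
```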

### Logs / error messages / stack trace

_No response_

### Additional context / environment details

While testing, it is possible to patch or swap the .exe and .dll files of sd.cpp‑webui during runtime.
As long as the change is made after a generation finishes or is cancelled, the process does not need to be restarted.

Because the models remain fully cached in VRAM and RAM, this allows:

- Rapid A/B testing between builds
- Switching between 19bdfe2 and 9b0fceb without reloading models
- Eliminating model‑loading time as a variable
- Ensuring the regression is isolated to runtime performance, not initialization

This confirms the slowdown is not caused by model reloads, cache state, or frontend overhead; it is directly tied to the Vulkan backend behavior between the two commits.

VRAM usage is stable and even slightly lower on the new commit; the regression is purely in iteration speed, not memory usage.
