Git commit
A major performance regression occurs in the Vulkan GGML backend when moving from commit 19bdfe2 → commit 9b0fceb.
On an AMD RX 580 running Windows 10 22H2 through sd.cpp‑webui, iteration time increases from ~9.5 s/it to 22–23 s/it, more than doubling generation time on identical workloads.
Operating System & Version
Windows 10 22H2
GGML backends
Vulkan
Command-line arguments used
sd-cli.exe -M img_gen -p "anime, depth of field, 1girl, closed eyes, white shirt, open shirt, cross necklace, nurse cap, pov, 1boy, holding another's arm, lora:anima-turbo-lora-v0.1:1" --sampling-method er_sde --steps 8 -W 1152 -H 896 -b 1 --cfg-scale 1 -s 20514 --clip-skip -1 --embd-dir D:\Programs\sd.cpp-webui\models/embeddings/ --lora-model-dir D:\Programs\sd.cpp-webui\models/loras/ -t 0 --rng std_default --sampler-rng std_default --lora-apply-mode auto -o D:\Programs\sd.cpp-webui\outputs\txt2img\158.png --diffusion-model D:\Programs\sd.cpp-webui\models\unet\anima_baseV10.safetensors --vae D:\Programs\sd.cpp-webui\models\vae\qwenimagevae_v7.safetensors --llm D:\Programs\sd.cpp-webui\models\text_encoders\qwen_3_06b_base.safetensors --type f16 --scheduler sgm_uniform --vae-tile-overlap 0.5 --vae-tile-size 64x64 --cache-mode spectrum --vae-tiling --diffusion-conv-direct --vae-conv-direct --mmap --color
Steps to reproduce
- Launch sd.cpp‑webui using build 19bdfe2.
- Run any image generation task using the Vulkan backend.
- Update to build 9b0fceb.
- Run the exact same command and settings.
- Compare iteration times.
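For anyone who prefers to reproduce this from a script rather than the web UI, the comparison could be sketched roughly as follows. This is only an illustration: the helper names `time_command` and `compare_builds` and the paths in the usage comment are hypothetical, and it measures whole-run wall-clock time (including model loading), not the s/it figure that the tool prints.

```python
import subprocess
import sys
import time


def time_command(cmd: list[str]) -> float:
    """Run cmd to completion and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(
        cmd,
        check=True,  # raise if the process exits with a non-zero status
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return time.perf_counter() - start


def compare_builds(builds: dict[str, list[str]], runs: int = 3) -> dict[str, float]:
    """Return the best (minimum) wall-clock time per build over `runs` runs."""
    return {
        name: min(time_command(cmd) for _ in range(runs))
        for name, cmd in builds.items()
    }


# Hypothetical usage: the directory names are placeholders, and `args` stands
# for the full argument list from "Command-line arguments used".
#
# args = ["-M", "img_gen", "--steps", "8"]
# results = compare_builds({
#     "19bdfe2": [r"C:\builds\19bdfe2\sd-cli.exe", *args],
#     "9b0fceb": [r"C:\builds\9b0fceb\sd-cli.exe", *args],
# })
# print(results)
```

Taking the minimum over several runs reduces noise from disk caching and background load. Since model loading is included in each measurement, only the difference between the two builds is meaningful here.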
What you expected to happen
Performance should match or exceed previous Vulkan backend throughput.
What actually happened
BEFORE regression:
sd-master-19bdfe2-bin-win-vulkan-x64
Performance: ~9.5 seconds per iteration
AFTER regression:
sd-master-9b0fceb-bin-win-vulkan-x64
Performance: 22–23 seconds per iteration
This represents a roughly 2.3–2.4× slowdown (22–23 s/it versus ~9.5 s/it).
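The slowdown factor can be checked with a few lines of arithmetic. The sketch below uses only the timings reported above; the function name `slowdown` is illustrative, and the total-time estimate assumes every one of the 8 steps costs the same, which may not hold with caching enabled.

```python
def slowdown(before_s_per_it: float, after_s_per_it: float) -> float:
    """Ratio of the new per-iteration time to the old one."""
    return after_s_per_it / before_s_per_it


BEFORE = 9.5                # s/it on build 19bdfe2 (approximate)
AFTER_RANGE = (22.0, 23.0)  # s/it on build 9b0fceb
STEPS = 8                   # from --steps 8 in the command line

low, high = (slowdown(BEFORE, after) for after in AFTER_RANGE)
print(f"slowdown: {low:.2f}x to {high:.2f}x")

# Rough sampling time, assuming a uniform cost per step.
before_total = BEFORE * STEPS
after_totals = tuple(after * STEPS for after in AFTER_RANGE)
print(f"sampling time: {before_total:.0f} s before, "
      f"{after_totals[0]:.0f} to {after_totals[1]:.0f} s after")
```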
Logs / error messages / stack trace
No response
Additional context / environment details
While testing, the .exe and .dll files of sd.cpp‑webui can be patched or swapped at runtime.
As long as the swap happens after a generation finishes or is cancelled, the process does not need to be restarted.
Because the models remain fully cached in VRAM and RAM, this allows:
- Rapid A/B testing between builds
- Switching between 19bdfe2 and 9b0fceb without reloading models
- Eliminating model‑loading time as a variable
- Isolating the regression to runtime performance rather than initialization
This indicates the slowdown is not caused by model reloads, cache state, or frontend overhead; it is tied to Vulkan backend behavior between the two commits.
VRAM usage is stable and even slightly lower on the new commit; the regression is purely in iteration speed, not memory usage.