[Bug] Vulkan GGML Backend Performance Regression on commit 9b0fceb #1647

@sz1kormar

Description

Git commit

A major performance regression occurs in the Vulkan GGML backend when moving from commit 19bdfe2 to commit 9b0fceb.
On an AMD RX 580 running Windows 10 22H2 through sd.cpp-webui, iteration time increases from ~9.5 s/it to 22–23 s/it, more than doubling generation time on identical workloads.

Operating System & Version

Windows 10 22H2

GGML backends

Vulkan

Command-line arguments used

sd-cli.exe -M img_gen -p "anime, depth of field, 1girl, closed eyes, white shirt, open shirt, cross necklace, nurse cap, pov, 1boy, holding another's arm, lora:anima-turbo-lora-v0.1:1" --sampling-method er_sde --steps 8 -W 1152 -H 896 -b 1 --cfg-scale 1 -s 20514 --clip-skip -1 --embd-dir D:\Programs\sd.cpp-webui\models/embeddings/ --lora-model-dir D:\Programs\sd.cpp-webui\models/loras/ -t 0 --rng std_default --sampler-rng std_default --lora-apply-mode auto -o D:\Programs\sd.cpp-webui\outputs\txt2img\158.png --diffusion-model D:\Programs\sd.cpp-webui\models\unet\anima_baseV10.safetensors --vae D:\Programs\sd.cpp-webui\models\vae\qwenimagevae_v7.safetensors --llm D:\Programs\sd.cpp-webui\models\text_encoders\qwen_3_06b_base.safetensors --type f16 --scheduler sgm_uniform --vae-tile-overlap 0.5 --vae-tile-size 64x64 --cache-mode spectrum --vae-tiling --diffusion-conv-direct --vae-conv-direct --mmap --color

Steps to reproduce

  1. Launch sd.cpp‑webui using build 19bdfe2.

  2. Run any image generation task using the Vulkan backend.

  3. Update to build 9b0fceb.

  4. Run the exact same command and settings.

  5. Compare iteration times.
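
The comparison in the steps above could be scripted roughly as follows. This is a minimal sketch, not part of sd.cpp-webui: the `time_command` and `compare_builds` helpers are hypothetical names, and the script measures whole-process wall-clock time (model loading included) rather than the per-iteration time that sd-cli reports, so it only approximates the s/it figures.

```python
import subprocess
import time


def time_command(argv):
    """Run one command to completion and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)  # raises CalledProcessError on a non-zero exit code
    return time.perf_counter() - start


def compare_builds(exe_before, exe_after, shared_args):
    """Time two executables with identical arguments.

    Returns (before_seconds, after_seconds, after/before ratio).
    """
    before_s = time_command([exe_before, *shared_args])
    after_s = time_command([exe_after, *shared_args])
    return before_s, after_s, after_s / before_s
```

Here `exe_before` and `exe_after` would point at the sd-cli.exe from the 19bdfe2 and 9b0fceb builds, and `shared_args` would be the argument list from the command above, passed unchanged to both.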

What you expected to happen

Performance should match or exceed previous Vulkan backend throughput.

What actually happened

BEFORE regression:
sd-master-19bdfe2-bin-win-vulkan-x64

Performance: ~9.5 seconds per iteration

AFTER regression:
sd-master-9b0fceb-bin-win-vulkan-x64

Performance: 22–23 seconds per iteration

This represents a slowdown of roughly 2.3× to 2.4×.
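
As a quick check of that figure, the ratio can be computed directly from the reported timings. This is a small sketch; the function name is only illustrative.

```python
def slowdown(before_s_per_it, after_s_per_it):
    """Return how many times slower the new build is per iteration."""
    return after_s_per_it / before_s_per_it


# Reported timings: ~9.5 s/it before, 22-23 s/it after.
low = slowdown(9.5, 22.0)
high = slowdown(9.5, 23.0)
print(f"{low:.2f}x to {high:.2f}x")  # prints "2.32x to 2.42x"
```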

Logs / error messages / stack trace

No response

Additional context / environment details

While testing, it is possible to patch or swap the .exe and .dll files of sd.cpp‑webui during runtime.
As long as the change is made after a generation finishes or is cancelled, the process does not need to be restarted.

Because the models remain fully cached in VRAM and RAM, this allows:

  - Rapid A/B testing between builds

  - Switching between 19bdfe2 and 9b0fceb without reloading models

  - Eliminating model-loading time as a variable

  - Ensuring the regression is isolated to runtime performance, not initialization

This indicates the slowdown is not caused by model reloads, cache state, or frontend overhead; it is tied to changes in Vulkan backend behavior between the two commits.

VRAM usage is stable and even slightly lower on the new commit; the regression is purely in iteration speed, not memory usage.

Metadata

Labels: bug
