fix(cuda): sanitize invalid Blackwell smpbo values by peter941221 · Pull Request #23766 · ggml-org/llama.cpp

peter941221 · 2026-05-27T07:27:04Z

Fixes #23385.

Failure mechanism:

early Blackwell launch drivers can report broken cudaDeviceProp.sharedMemPerBlockOptin values such as 0 or 0x100000001
ggml_cuda_init() stored that value directly in info.devices[id].smpbo
multiple CUDA paths consume smpbo as a hard shared-memory ceiling, not just softmax
mmq.cuh uses it to prune valid MMQ tile shapes, which can leave mmq_x_best = 0 and abort on Blackwell
mmid.cu and softmax.cu also use the same cached limit when setting dynamic shared memory attributes
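
To make the MMQ failure concrete, here is a minimal host-side sketch of how a shared-memory ceiling of 0 can prune every candidate tile. The function names, the candidate range, and the footprint formula are hypothetical stand-ins, not the actual logic in mmq.cuh, which depends on the quantization type and tile layout.

```cpp
#include <cstddef>

// Hypothetical per-tile shared-memory estimate (illustration only; the real
// computation in mmq.cuh is different and type-dependent).
static std::size_t example_tile_nbytes_shared(int mmq_x) {
    return 1024 + static_cast<std::size_t>(mmq_x) * 256;
}

// Simplified selection loop: keep the widest tile whose footprint fits under
// the cached limit. Returns 0 when no candidate fits at all.
static int example_pick_mmq_x(std::size_t smpbo) {
    int mmq_x_best = 0;
    for (int mmq_x = 8; mmq_x <= 128; mmq_x += 8) {
        if (example_tile_nbytes_shared(mmq_x) > smpbo) {
            continue; // candidate pruned: does not fit under the ceiling
        }
        mmq_x_best = mmq_x;
    }
    return mmq_x_best;
}
```

With a sane limit such as 48 KiB, the sketch selects a non-zero tile width. With smpbo == 0, every candidate is pruned and the result stays 0, which is the state the description says leads to the abort.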

Semantic change:

sanitize sharedMemPerBlockOptin once during CUDA device-info initialization
if the reported value is 0 or implausibly large (> 1 MiB), log a warning and fall back to sharedMemPerBlock
keep the workaround centralized so every smpbo consumer gets the same sane value
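
The sanitization rule above can be sketched as a small pure function. This is an illustration under stated assumptions, not the code from the patch: the helper name, its signature, and the warning text are hypothetical, and only the two conditions (0 or greater than 1 MiB) and the fallback to sharedMemPerBlock come from the description. It assumes a 64-bit size_t.

```cpp
#include <cstddef>
#include <cstdio>

// Upper bound taken from the PR description: values above 1 MiB are treated
// as implausible for a per-block opt-in shared-memory limit.
constexpr std::size_t EXAMPLE_SMPBO_MAX_PLAUSIBLE = 1024 * 1024;

// Hypothetical helper: returns the reported opt-in limit when it looks sane,
// otherwise logs a warning and falls back to the default per-block limit.
static std::size_t example_sanitize_smpbo(std::size_t reported_optin,
                                          std::size_t shared_per_block,
                                          int device_id) {
    if (reported_optin == 0 || reported_optin > EXAMPLE_SMPBO_MAX_PLAUSIBLE) {
        std::fprintf(stderr,
                     "warning: device %d reported sharedMemPerBlockOptin=%zu, "
                     "falling back to sharedMemPerBlock=%zu\n",
                     device_id, reported_optin, shared_per_block);
        return shared_per_block;
    }
    return reported_optin;
}
```

Applying the check once, at the point where the device property is cached, means every later reader of smpbo sees the sanitized value without needing its own guard.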

Why here instead of another per-kernel workaround:

PR #22338 ("fix Blackwell sharedMemPerBlockOptin overflow causing SOFT_MAX error") already demonstrated the same Blackwell driver bug on a softmax path
issue #23385 ("Eval bug: Fatal MMQ crashes on Blackwell (RTX 5090/5080) due to unhandled sharedMemPerBlockOptin driver bug") shows the remaining MMQ failure comes from the same cached device property
fixing the cached property is the narrowest place that covers MMQ, MMID, and SOFT_MAX together

Validation:

source-path verification:
- ggml/src/ggml-cuda/ggml-cuda.cu initializes info.devices[id].smpbo
- ggml/src/ggml-cuda/mmq.cuh gates MMQ tile selection on smpbo
- ggml/src/ggml-cuda/mmid.cu and ggml/src/ggml-cuda/softmax.cu also consume smpbo
local Windows configure passed with -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=120
targeted build did not finish within the local command timeout window, so runtime reproduction is still pending on the patched tree

ggml-gh-bot · 2026-05-27T07:32:33Z

Hi @peter941221, thanks for your contribution!

Per our contribution guidelines, the automated PR checker found the following issue(s) that need your attention:

Multiple open PRs from a new contributor: We limit new contributors (those without a previously merged PR) to 1 open PR at a time. You currently have 3 open PRs.

Please note that maintainers reserve the right to make final decisions on PRs. If you believe there is a mistake, please comment below.

fix(cuda): sanitize smpbo on Blackwell

c7982b6

github-actions bot added the labels "Nvidia GPU" (Issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) on May 27, 2026

peter941221 wants to merge 1 commit into ggml-org:master from peter941221:fix/blackwell-smpbo-sanity-pr
