fix(cuda): sanitize invalid Blackwell smpbo values #23766

Draft
peter941221 wants to merge 1 commit into ggml-org:master from peter941221:fix/blackwell-smpbo-sanity-pr

@peter941221

Fixes #23385.

Failure mechanism:

  • early Blackwell launch drivers can report broken cudaDeviceProp.sharedMemPerBlockOptin values such as 0 or 0x100000001
  • ggml_cuda_init() stored that value directly in info.devices[id].smpbo
  • multiple CUDA paths consume smpbo as a hard shared-memory ceiling, not just softmax
  • mmq.cuh uses it to prune valid MMQ tile shapes, which can leave mmq_x_best = 0 and abort on Blackwell
  • mmid.cu and softmax.cu also use the same cached limit when setting dynamic shared memory attributes
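
To illustrate the MMQ part of the failure, here is a minimal host-side sketch of how a shared-memory ceiling of 0 can leave a tile search with no winner. Everything in it is an assumption made for illustration: `tile_shmem_bytes`, `pick_mmq_x`, the 512-byte-per-column cost and the step of 8 are hypothetical and are not the actual logic in mmq.cuh, which depends on the quantization type and tile layout.

```cpp
#include <cstddef>

// Hypothetical per-tile shared-memory cost (assumption, not the real formula).
static size_t tile_shmem_bytes(int mmq_x) {
    return static_cast<size_t>(mmq_x) * 512;
}

// Simplified tile search: keep the largest candidate width whose
// shared-memory requirement fits under the cached ceiling `smpbo`.
// Returns 0 when every candidate is pruned.
static int pick_mmq_x(size_t smpbo, int mmq_x_max) {
    int mmq_x_best = 0;
    for (int mmq_x = 8; mmq_x <= mmq_x_max; mmq_x += 8) {
        if (tile_shmem_bytes(mmq_x) > smpbo) {
            continue; // candidate rejected by the shared-memory ceiling
        }
        mmq_x_best = mmq_x;
    }
    return mmq_x_best;
}
```

With a ceiling of 0, every candidate is rejected and the function returns 0, which corresponds to the `mmq_x_best = 0` abort described above. With a plausible ceiling such as 48 KiB, the same search returns a valid width.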

Semantic change:

  • sanitize sharedMemPerBlockOptin once during CUDA device-info initialization
  • if the reported value is 0 or implausibly large (> 1 MiB), log a warning and fall back to sharedMemPerBlock
  • keep the workaround centralized so every smpbo consumer gets the same sane value
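
The rule above can be sketched as a small host-side helper. This is an illustration under stated assumptions, not the code in the patch: the name `sanitize_smpbo`, its signature, the constant name and the warning text are all hypothetical. Only the conditions (0 or greater than 1 MiB) and the fallback to sharedMemPerBlock come from the description.

```cpp
#include <cstddef>
#include <cstdio>

// Threshold from the PR description: values above 1 MiB are treated as implausible.
static constexpr size_t SMPBO_SANE_MAX = 1024 * 1024;

// Hypothetical helper (name and signature are illustrative).
// optin     : value reported as cudaDeviceProp.sharedMemPerBlockOptin
// per_block : value reported as cudaDeviceProp.sharedMemPerBlock
static size_t sanitize_smpbo(size_t optin, size_t per_block, int device_id) {
    if (optin == 0 || optin > SMPBO_SANE_MAX) {
        // Warn once at init time, then use the conservative default limit.
        std::fprintf(stderr,
                     "device %d: implausible sharedMemPerBlockOptin (%zu bytes), "
                     "falling back to sharedMemPerBlock (%zu bytes)\n",
                     device_id, optin, per_block);
        return per_block;
    }
    return optin;
}
```

In ggml_cuda_init(), the result of such a helper would be stored in info.devices[id].smpbo in place of the raw driver value, so mmq.cuh, mmid.cu and softmax.cu all read the same sanitized limit.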

Why here instead of another per-kernel workaround:

Validation:

  • source-path verification:
    • ggml/src/ggml-cuda/ggml-cuda.cu initializes info.devices[id].smpbo
    • ggml/src/ggml-cuda/mmq.cuh gates MMQ tile selection on smpbo
    • ggml/src/ggml-cuda/mmid.cu and ggml/src/ggml-cuda/softmax.cu also consume smpbo
  • local Windows configure passed with -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=120
  • targeted build did not finish within the local command timeout window, so runtime reproduction is still pending on the patched tree

github-actions bot added the "Nvidia GPU" (Issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) labels on May 27, 2026
@ggml-gh-bot

ggml-gh-bot Bot commented May 27, 2026

Hi @peter941221, thanks for your contribution!

Per our contribution guidelines, the automated PR checker found the following issue(s) that need your attention:

  • Multiple open PRs from a new contributor: We limit new contributors (those without a previously merged PR) to 1 open PR at a time. You currently have 3 open PRs.

Please note that maintainers reserve the right to make final decisions on PRs. If you believe there is a mistake, please comment below.

Labels

  • ggml: changes relating to the ggml tensor library for machine learning
  • Nvidia GPU: Issues specific to Nvidia GPUs

Development

Successfully merging this pull request may close these issues.

Eval bug: Fatal MMQ crashes on Blackwell (RTX 5090/5080) due to unhandled sharedMemPerBlockOptin driver bug
