continuum: explicit Metal device selection — avoid hang in AMD Polaris lazy init (card 346b356f)#1
Open
joelteply wants to merge 1 commit into
`ggml_metal_device_init` calls `MTLCreateSystemDefaultDevice()`, which on
dual-GPU Intel Macs returns the discrete GPU. On MacBookPro15,1 (Radeon Pro 560X,
Polaris, macOS 15.7.7), the first `newCommandQueue` call enters
`amdMtlBronzeLazyInit` → `amdMtlAllocateBuffer` → `IOAccelResourceCreate`
→ `IOConnectCallMethod` → `mach_msg2_trap` and never returns. Because
GGML's Metal backend registers from the constructor of a static registry
object (`ggml_backend_registry::ggml_backend_registry`), which the trace
below shows running under `common_params_parser_init`, this hangs at
process startup: every llama.cpp consumer on this hardware hangs before
arg parsing completes.
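To illustrate why a blocking call in the registry constructor stalls the
caller, here is a minimal C++ sketch. It assumes the registry is a
function-local static constructed on first use, which is what the trace
suggests (the constructor appears under `main`). `Registry`,
`get_registry`, `parse_args` and `events` are hypothetical names for
illustration, not ggml symbols.

```cpp
#include <string>
#include <vector>

// Records the order in which things run, for illustration only.
std::vector<std::string> & events() {
    static std::vector<std::string> log;
    return log;
}

// Hypothetical stand-in for a backend registry; not the ggml type.
struct Registry {
    Registry() {
        // In the hang described above, the equivalent of this step blocks
        // forever, so the first caller of get_registry() never resumes.
        events().push_back("backend init");
    }
};

Registry & get_registry() {
    static Registry reg; // constructed once, on the first call
    return reg;
}

void parse_args() {
    get_registry();                    // first use runs the constructor
    events().push_back("parse args");  // only reached if init returns
}
```

In this sketch "parse args" is recorded only after "backend init"
completes, which matches the observed behavior: if device init never
returns, nothing after it runs.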
Captured via `sample(1)` of the live hung process:
main → common_params_parser_init
→ ggml_backend_registry::ggml_backend_registry()
→ ggml_backend_metal_reg (ggml-metal.cpp:924)
→ ggml_metal_device_init (ggml-metal-device.m:641)
→ -[BronzeMtlDevice newCommandQueue]
→ -[BronzeMtlDevice amdMtlBronzeLazyInit]
→ IOAccelResourceCreate
→ mach_msg2_trap ← HANG
Fix: enumerate via `MTLCopyAllDevices()` and select with priority:
1. Apple Silicon (always best)
2. Integrated low-power (Intel UHD/Iris on Intel Macs; the
lowest-common-denominator floor, since hundreds of millions of Intel
Macs ship with one)
3. External / eGPU
4. Discrete (last; may hang on Polaris-era AMD)
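The priority order above can be sketched as a small ranking function.
This is an illustration under stated assumptions, not the code in the
patch: `DeviceInfo`, `device_rank` and `pick_device` are hypothetical,
and the `apple_silicon` flag is a placeholder for whatever check the
patch actually uses. `low_power` and `removable` stand for the
`lowPower` and `removable` properties of `MTLDevice`.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical plain-data mirror of the per-device properties; not a ggml type.
struct DeviceInfo {
    std::string name;
    bool apple_silicon; // placeholder: how the patch detects this is not shown
    bool low_power;     // MTLDevice lowPower (integrated GPU)
    bool removable;     // MTLDevice removable (external / eGPU)
};

// Lower rank means higher priority, following the list above.
int device_rank(const DeviceInfo & d) {
    if (d.apple_silicon) return 0;
    if (d.low_power)     return 1;
    if (d.removable)     return 2;
    return 3; // discrete: last resort
}

// Returns the index of the preferred device, or -1 if the list is empty.
// On equal rank the earliest enumerated device wins.
int pick_device(const std::vector<DeviceInfo> & devs) {
    int best = -1;
    for (std::size_t i = 0; i < devs.size(); ++i) {
        if (best < 0 || device_rank(devs[i]) < device_rank(devs[best])) {
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

With a discrete AMD device and an integrated Intel device in the list,
this sketch selects the integrated one regardless of enumeration order.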
Operator can override via `GGML_METAL_DEVICE_NAME=<substring>` env var.
Logs all enumerated devices with `lowPower`/`removable`/`location`
properties to give operators visibility into what was selected and why.
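A rough sketch of how a substring override could be resolved, assuming
a case-sensitive match where the first matching device wins. Whether
the patch matches case-sensitively, and how it handles no match, is not
stated here. `find_by_substring` and `override_index` are hypothetical
helper names.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Returns the index of the first device name containing `needle`,
// or -1 when `needle` is null/empty or nothing matches.
// Assumption: case-sensitive substring match.
int find_by_substring(const std::vector<std::string> & names, const char * needle) {
    if (needle == nullptr || *needle == '\0') {
        return -1;
    }
    for (std::size_t i = 0; i < names.size(); ++i) {
        if (names[i].find(needle) != std::string::npos) {
            return static_cast<int>(i);
        }
    }
    return -1;
}

// Reads GGML_METAL_DEVICE_NAME and resolves it against the enumerated names.
// A result of -1 means "no override; fall back to the priority order".
int override_index(const std::vector<std::string> & names) {
    return find_by_substring(names, std::getenv("GGML_METAL_DEVICE_NAME"));
}
```

Under these assumptions, `GGML_METAL_DEVICE_NAME=UHD` would select a
device named "Intel(R) UHD Graphics 630", and an unset or non-matching
value leaves the default priority order in effect.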
Tested on MacBookPro15,1 / macOS 15.7.7:
- Before: `--help` hung forever (sample captured the AMD mach_msg2_trap)
- After: `--help` returns immediately; `llama-cli` loads Qwen 2.5 0.5B
on Intel UHD 630, runs inference at 22.5 tok/s
Known follow-up: Metal q4_k matmul kernel produces incorrect output on
Intel UHD 630 (output: "antity@@@@@@@" instead of "Hello! How can..."),
likely the matrix-vector fallback that runs when `has_simdgroup_mm=false`.
Carded separately; not blocking the hang fix.
Tested: macOS 15.7.7, Xcode 26.3 (macOS 26.2 SDK), MacBookPro15,1
(Coffee Lake + Intel UHD 630 + AMD Radeon Pro 560X).
Card: 346b356f.