continuum: explicit Metal device selection — avoid hang in AMD Polaris lazy init (card 346b356f) by joelteply · Pull Request #1 · CambrianTech/llama.cpp

joelteply · 2026-06-07T00:41:08Z

continuum: explicit Metal device selection — avoid hang in AMD Polaris lazy init

ggml_metal_device_init calls MTLCreateSystemDefaultDevice() which on
Intel Macs returns the discrete GPU. On MacBookPro15,1 (Radeon Pro 560X,
Polaris, macOS 15.7.7), the first newCommandQueue call enters
amdMtlBronzeLazyInit → amdMtlAllocateBuffer → IOAccelResourceCreate
→ IOConnectCallMethod → mach_msg2_trap and never returns. Because
GGML's Metal backend registers via a global static initializer
(ggml_backend_registry::ggml_backend_registry), this hangs at process
startup — every llama.cpp consumer on this hardware deadlocks before
arg parsing.

Captured via sample(1) of the live hung process:

main → common_params_parser_init
→ ggml_backend_registry::ggml_backend_registry()
→ ggml_backend_metal_reg (ggml-metal.cpp:924)
→ ggml_metal_device_init (ggml-metal-device.m:641)
→ -[BronzeMtlDevice newCommandQueue]
→ -[BronzeMtlDevice amdMtlBronzeLazyInit]
→ IOAccelResourceCreate
→ mach_msg2_trap ← HANG

Fix: enumerate via MTLCopyAllDevices() and select with priority:

Apple Silicon (always best)
Integrated low-power (Intel UHD/Iris on Intel Mac — substrate LCD floor;
hundreds of millions of Intel Macs ship with one)
External / eGPU
Discrete (last; may hang on Polaris-era AMD)

Operator can override via GGML_METAL_DEVICE_NAME=<substring> env var.
Logs all enumerated devices with lowPower/removable/location
properties to give operators visibility into what was selected and why.

Tested on MacBookPro15,1 / macOS 15.7.7:

Before: --help hung forever (sample captured the AMD mach_msg2_trap)
After: --help returns immediately; llama-cli loads Qwen 2.5 0.5B
on Intel UHD 630, runs inference at 22.5 tok/s

Known follow-up: Metal q4_k matmul kernel produces incorrect output on
Intel UHD 630 (output: "antity@@@@@@@" instead of "Hello! How can..."),
likely the matrix-vector fallback that runs when has_simdgroup_mm=false.
Carded separately; not blocking the hang fix.

Tested: macOS 15.7.7, Xcode 26.3 (macOS 26.2 SDK), MacBookPro15,1
(Coffee Lake + Intel UHD 630 + AMD Radeon Pro 560X).

Card: 346b356f.

…s lazy init `ggml_metal_device_init` calls `MTLCreateSystemDefaultDevice()` which on Intel Macs returns the discrete GPU. On MacBookPro15,1 (Radeon Pro 560X, Polaris, macOS 15.7.7), the first `newCommandQueue` call enters `amdMtlBronzeLazyInit` → `amdMtlAllocateBuffer` → `IOAccelResourceCreate` → `IOConnectCallMethod` → `mach_msg2_trap` and never returns. Because GGML's Metal backend registers via a global static initializer (`ggml_backend_registry::ggml_backend_registry`), this hangs at process startup — every llama.cpp consumer on this hardware deadlocks before arg parsing. Captured via `sample(1)` of the live hung process: main → common_params_parser_init → ggml_backend_registry::ggml_backend_registry() → ggml_backend_metal_reg (ggml-metal.cpp:924) → ggml_metal_device_init (ggml-metal-device.m:641) → -[BronzeMtlDevice newCommandQueue] → -[BronzeMtlDevice amdMtlBronzeLazyInit] → IOAccelResourceCreate → mach_msg2_trap ← HANG Fix: enumerate via `MTLCopyAllDevices()` and select with priority: 1. Apple Silicon (always best) 2. Integrated low-power (Intel UHD/Iris on Intel Mac — substrate LCD floor; hundreds of millions of Intel Macs ship with one) 3. External / eGPU 4. Discrete (last; may hang on Polaris-era AMD) Operator can override via `GGML_METAL_DEVICE_NAME=<substring>` env var. Logs all enumerated devices with `lowPower`/`removable`/`location` properties to give operators visibility into what was selected and why. Tested on MacBookPro15,1 / macOS 15.7.7: - Before: `--help` hung forever (sample captured the AMD mach_msg2_trap) - After: `--help` returns immediately; `llama-cli` loads Qwen 2.5 0.5B on Intel UHD 630, runs inference at 22.5 tok/s Known follow-up: Metal q4_k matmul kernel produces incorrect output on Intel UHD 630 (output: "antity@@@@@@@" instead of "Hello! How can..."), likely the matrix-vector fallback that runs when `has_simdgroup_mm=false`. Carded separately; not blocking the hang fix. Tested: macOS 15.7.7, Xcode 26.3 (macOS 26.2 SDK), MacBookPro15,1 (Coffee Lake + Intel UHD 630 + AMD Radeon Pro 560X). Card: 346b356f.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

continuum: explicit Metal device selection — avoid hang in AMD Polaris lazy init (card 346b356f)#1

continuum: explicit Metal device selection — avoid hang in AMD Polaris lazy init (card 346b356f)#1
joelteply wants to merge 1 commit into
continuum/qwen35-metal-controlsfrom
346b356f/metal-amd-intel-mac

joelteply commented Jun 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

joelteply commented Jun 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant