Description
System
| Component | Version |
|---|---|
| OS | Windows 11 ARM64 Build 26200 |
| CPU | Qualcomm Snapdragon X Elite |
| VS | 2022 Community 17.14, ClangCL 19.1.5 |
| CMake | 4.2.3 |
| Python | 3.13 ARM64 |
Summary
Native Windows ARM64 build is possible on Snapdragon X Elite with three fixes. After applying them, llama-cli.exe builds and runs at 28.59 tok/s (i2_s kernel, 8 threads, 2.41B model). MATMUL_INT8 = 1 confirms i8mm is active.
Blocker 1: Wrong -march flags (i8mm falsely believed unsupported)
The existing documentation and community issues mark i8mm support on Snapdragon X Elite as "unknown." It is not unknown: the CPU fully supports it. WSL2 `/proc/cpuinfo` confirms:
```
Features: ... i8mm bf16 dotprod asimddp ...
```
The error when building with -march=armv8.2-a+fp16:
```
error: always_inline function 'vmmlaq_s32' requires target feature 'i8mm',
but would be inlined into function 'ggml_vec_dot_q4_0_q8_0' that is compiled
without support for 'i8mm'
```
ClangCL detects the ARM64 target and conditionally enables the i8mm code path, but the `-march` flag does not authorise the instructions. Fix: use `-march=armv8.6-a+fp16`; the Snapdragon X Elite is ARMv8.6-A capable.
Blocker 2: C++ exceptions disabled by default in ClangCL
```
error: cannot use 'throw' with exceptions disabled
```
ClangCL defaults to /EHs-. llama.cpp uses throw throughout. Fix: add /EHsc to CMAKE_CXX_FLAGS.
Blocker 3: Missing <chrono> include in common/common.cpp and common/log.cpp
```
error: no type named 'system_clock' in namespace 'std::chrono'
error: 'clock' is not a class, namespace, or enumeration
```
On Linux/macOS, `<chrono>` is pulled in transitively through `<thread>`. On Windows with ClangCL it is not. Both common.cpp and log.cpp use `std::chrono` without explicitly including `<chrono>`. Fix: add `#include <chrono>` to both files.
Working Build Command
From a VS 2022 Developer PowerShell with ClangCL on PATH:
```powershell
# Kernel generation (required first)
python utils/codegen_tl1.py --model bitnet_b1_58-3B --BM 160,320,320 --BK 64,128,64 --bm 32,64,32

# Configure
cmake -B build `
  -T ClangCL `
  -DBITNET_ARM_TL1=OFF `
  -DCMAKE_BUILD_TYPE=Release `
  -DCMAKE_C_FLAGS="-march=armv8.6-a+fp16" `
  -DCMAKE_CXX_FLAGS="-march=armv8.6-a+fp16 /EHsc"

# Build
cmake --build build --config Release --target llama-cli
```
Confirmed Performance — Snapdragon X Elite, i2_s kernel
| Metric | Value |
|---|---|
| Model | BitNet-b1.58-2B-4T (i2_s) |
| Load time | 844 ms |
| Prompt eval | 210.27 tok/s |
| Generation | 28.59 tok/s |
| Threads | 8 |
| MATMUL_INT8 | active |
Suggested Repo Changes
- CMakeLists.txt — detect ClangCL on ARM64 Windows and add `/EHsc` and `-march=armv8.6-a+fp16` automatically
- common/common.cpp and common/log.cpp — add `#include <chrono>` explicitly (the transitive include is not portable)
- Documentation — note that Snapdragon X Elite supports i8mm, bf16, and dotprod; `-march=armv8.6-a` is the correct target
- CI — consider adding a Windows ARM64 build check (GitHub Actions now supports ARM64 runners)
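For the CMakeLists.txt item, one possible shape (a sketch only; variable checks and placement would need to be adapted to how the project's CMake is actually organized):

```cmake
# Sketch: apply the ClangCL-on-Windows-ARM64 fixes automatically.
if (MSVC AND CMAKE_CXX_COMPILER_ID MATCHES "Clang"
         AND CMAKE_SYSTEM_PROCESSOR MATCHES "ARM64|aarch64")
    # ClangCL accepts GCC-style -march alongside MSVC-style flags
    add_compile_options(-march=armv8.6-a+fp16)
    # re-enable C++ exceptions (ClangCL defaults to /EHs-)
    add_compile_options($<$<COMPILE_LANGUAGE:CXX>:/EHsc>)
endif()
```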
Additional Note
gguf-py install fails on Windows ARM64 due to a CMake 4.x incompatibility in the sentencepiece submodule and a hardcoded -A x64 arch flag. This blocks setup_env.py but is not needed for inference if GGUF models are downloaded directly from HuggingFace.