kleidiai : dynamic chunk-based scheduling for hybrid execution by chaxu01 · Pull Request #23819 · ggml-org/llama.cpp

chaxu01 · 2026-05-28T13:30:14Z

Overview

This PR explores replacing the static weighting model with dynamic chunk-based scheduling, building on the recently introduced repack matmul chunking mechanism (PR #16833). The goal is adaptive, runtime-driven work distribution between SME and NEON kernels, with no hardcoded split ratios.
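For readers unfamiliar with the idea, here is a minimal sketch of dynamic chunk-based scheduling in general, not the code in this PR. It assumes a single shared atomic chunk counter that every worker pulls from, so a faster worker (for example one running an SME kernel) simply claims more chunks than a slower one (for example NEON). The names `chunk_kernel` and `run_dynamic_chunks` are hypothetical and do not exist in ggml or KleidiAI.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical kernel signature: process output rows [begin, end).
using chunk_kernel = std::function<void(std::size_t begin, std::size_t end)>;

// Illustrative only: one thread per worker, all pulling chunk indices from a
// shared counter. Returns how many chunks each worker ended up claiming.
std::vector<std::size_t> run_dynamic_chunks(std::size_t n_rows,
                                            std::size_t chunk_size,
                                            const std::vector<chunk_kernel> & workers) {
    assert(chunk_size > 0 && !workers.empty());
    const std::size_t n_chunks = (n_rows + chunk_size - 1) / chunk_size;

    std::atomic<std::size_t> next_chunk{0};               // the "global queue"
    std::vector<std::size_t> claimed(workers.size(), 0);  // one slot per worker

    std::vector<std::thread> threads;
    for (std::size_t w = 0; w < workers.size(); ++w) {
        threads.emplace_back([&, w] {
            for (;;) {
                // Claim the next unprocessed chunk; no fixed ratio is involved.
                const std::size_t c = next_chunk.fetch_add(1, std::memory_order_relaxed);
                if (c >= n_chunks) {
                    break;
                }
                const std::size_t begin = c * chunk_size;
                const std::size_t end   = std::min(begin + chunk_size, n_rows);
                workers[w](begin, end);
                ++claimed[w];  // each thread writes only its own slot
            }
        });
    }
    for (auto & t : threads) {
        t.join();
    }
    return claimed;
}
```

Calling `run_dynamic_chunks(n_rows, 64, {sme_kernel, neon_kernel})` with two hypothetical callables would process every row exactly once; the per-worker counts it returns show how the split adapted at runtime. The chunk size trades scheduling overhead against load balance, and a suitable value would have to be measured on the target device.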

Additional information

Benchmarks from Samsung S26 Exynos — Llama-3.2-1B-Instruct-Q4_0 (pp512)

Threads	Global Queue (t/s)	Static Quadratic (t/s)	Δ (%)
1	292.08 ± 2.02	288.02 ± 3.31	+1.4%
2	303.90 ± 0.63	163.87 ± 2.68	+85.4%
4	430.69 ± 11.35	255.75 ± 16.43	+68.4%
6	450.48 ± 23.96	297.61 ± 9.80	+51.4%
8	499.28 ± 22.37	367.45 ± 9.09	+35.9%
10	489.62 ± 25.13	389.00 ± 8.12	+25.9%

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: YES, Codex was used to assist with code review

Commit 888cab2: kleidiai : dynamic chunk-based scheduling for hybrid execution

chaxu01 requested a review from ggerganov as a code owner May 28, 2026 13:30

github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on May 28, 2026

chaxu01 mentioned this pull request on May 28, 2026 in "kleidiai : update to v1.24.0 and use release archive" (#22549, merged)

chaxu01 wants to merge 1 commit into ggml-org:master from chaxu01:feature/global-queue
