kleidiai : dynamic chunk-based scheduling for hybrid execution #23819

Open
chaxu01 wants to merge 1 commit into ggml-org:master from chaxu01:feature/global-queue



chaxu01 (Collaborator) commented May 28, 2026

Overview

This investigation explores replacing the static weighting model with dynamic chunk-based scheduling, building on the repack matmul chunking mechanism recently introduced in PR #16833. The goal is to distribute work between SME and NEON kernels adaptively at runtime instead of relying on hardcoded ratios.
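The scheduling idea described above can be illustrated with a small single-threaded simulation. This is a sketch under stated assumptions, not the implementation in this PR: the matmul is assumed to be split into equal-cost chunks, each worker is assumed to have a fixed per-chunk cost (worker 0 stands in for an SME thread, workers 1-3 for NEON threads, and the 1.0 / 4.0 costs are made-up numbers, not measurements), and `dynamic_schedule` / `static_schedule` are hypothetical helpers. In a real multi-threaded backend the shared counter would typically be an atomic integer advanced with a fetch-add.

```python
import heapq


def dynamic_schedule(n_chunks, chunk_cost):
    """Simulate workers pulling equal-size chunks from one shared queue.

    chunk_cost[w] is the assumed time worker w needs for one chunk.
    Returns (makespan, number of chunks each worker processed).
    """
    next_chunk = 0  # stands in for a shared atomic chunk counter
    done = [0] * len(chunk_cost)
    # Min-heap of (time the worker becomes free, worker id).
    free_at = [(0.0, w) for w in range(len(chunk_cost))]
    heapq.heapify(free_at)
    makespan = 0.0
    while next_chunk < n_chunks:
        t, w = heapq.heappop(free_at)  # earliest-free worker claims next chunk
        next_chunk += 1                # models fetch_add(1) on the counter
        done[w] += 1
        t += chunk_cost[w]
        makespan = max(makespan, t)
        heapq.heappush(free_at, (t, w))
    return makespan, done


def static_schedule(n_chunks, chunk_cost, weights):
    """Pre-assign chunks in proportion to fixed, hardcoded weights."""
    total = sum(weights)
    done = [n_chunks * wt // total for wt in weights]
    done[0] += n_chunks - sum(done)  # rounding remainder goes to worker 0
    makespan = max(c * k for c, k in zip(chunk_cost, done))
    return makespan, done


# Hypothetical setup: worker 0 runs the faster kernel (1.0 time units per
# chunk), workers 1-3 run a kernel that is 4x slower per chunk.
cost = [1.0, 4.0, 4.0, 4.0]
print("dynamic:         ", dynamic_schedule(64, cost))
print("static [2,1,1,1]:", static_schedule(64, cost, [2, 1, 1, 1]))
print("static [4,1,1,1]:", static_schedule(64, cost, [4, 1, 1, 1]))
```

With these illustrative costs the shared queue finishes at t = 37 with a 37/9/9/9 split without being given any ratio. A static split matches that only when its weights happen to equal the true speed ratio ([4, 1, 1, 1] here); with [2, 1, 1, 1] the slower workers run until t = 48 while worker 0 sits idle after t = 28.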

Additional information

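Benchmarks on Samsung S26 (Exynos) with Llama-3.2-1B-Instruct-Q4_0, prompt processing of 512 tokens (pp512). Global Queue is the dynamic scheduling from this branch, Static Quadratic is the static weighting baseline, and Δ is the change of Global Queue relative to Static Quadratic.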

Threads   Global Queue (t/s)   Static Quadratic (t/s)   Δ (%)
1         292.08 ± 2.02        288.02 ± 3.31            +1.4%
2         303.90 ± 0.63        163.87 ± 2.68            +85.4%
4         430.69 ± 11.35       255.75 ± 16.43           +68.4%
6         450.48 ± 23.96       297.61 ± 9.80            +51.4%
8         499.28 ± 22.37       367.45 ± 9.09            +35.9%
10        489.62 ± 25.13       389.00 ± 8.12            +25.9%
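The Δ column is the relative throughput gain of Global Queue over Static Quadratic, Δ = (Global Queue / Static Quadratic - 1) × 100. A quick check against the mean values in the table; the `RESULTS` dict and the `speedup_pct` helper are illustrative names, not part of the PR:

```python
# Mean throughput (t/s) from the benchmark table:
# threads -> (Global Queue, Static Quadratic).
RESULTS = {
    1: (292.08, 288.02),
    2: (303.90, 163.87),
    4: (430.69, 255.75),
    6: (450.48, 297.61),
    8: (499.28, 367.45),
    10: (489.62, 389.00),
}


def speedup_pct(new, old):
    """Relative change of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


for threads, (global_queue, static) in RESULTS.items():
    print(f"{threads:>2} threads: {speedup_pct(global_queue, static):+.1f}%")
```

The recomputed values agree with the Δ column to within 0.1 percentage points; the 2-thread row comes out at about +85.45% from the rounded means.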

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: YES, Codex was used as a code review assistant

chaxu01 requested a review from ggerganov as a code owner on May 28, 2026 13:30
The github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on May 28, 2026