Add CPU kernel skills by jiqing-feng · Pull Request #614 · huggingface/kernels

jiqing-feng · 2026-06-03T08:15:38Z

Summary

Adds a cpu-kernels skill for kernel-builder that guides writing, optimizing, and
benchmarking C++ CPU kernels (AVX2/AVX512) for the Hugging Face kernels ecosystem.

What's included

- Two-phase workflow: Phase 1 for correctness (generic → AVX2 → AVX512), Phase 2 for performance exploration with trial tracking and backtracking.
- Scripts: op analysis, static validation, benchmarking (`torch.utils.benchmark`), `perf stat` profiling, and trial management.
- Reference docs: runtime CPU dispatch, `build.toml` multi-target compilation, SIMD patterns, quantized GEMM / brgemm, threading, memory, and correctness constraints.
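The runtime CPU dispatch pattern and the generic → AVX2 → AVX512 progression listed above can be illustrated with a minimal C++ sketch. It assumes GCC or Clang on x86-64, using function-level `__attribute__((target(...)))` together with `__builtin_cpu_supports`. The element-wise `add_*` kernels and `select_add` are hypothetical names for illustration only, not code taken from the skill. Each SIMD variant keeps a scalar tail for lengths that are not a multiple of the vector width, and the generic path serves as both the fallback and the correctness reference.

```cpp
#include <cstddef>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define HAVE_X86 1
#else
#define HAVE_X86 0
#endif

// Generic scalar fallback (hypothetical example op): out[i] = a[i] + b[i].
// Also used as the reference when checking the SIMD variants.
static void add_generic(const float* a, const float* b, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

#if HAVE_X86
// AVX2 variant: 8 floats per 256-bit register, scalar tail for the remainder.
__attribute__((target("avx2")))
static void add_avx2(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
    }
    for (; i < n; ++i) out[i] = a[i] + b[i];
}

// AVX512F variant: 16 floats per 512-bit register, scalar tail for the remainder.
__attribute__((target("avx512f")))
static void add_avx512(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m512 va = _mm512_loadu_ps(a + i);
        __m512 vb = _mm512_loadu_ps(b + i);
        _mm512_storeu_ps(out + i, _mm512_add_ps(va, vb));
    }
    for (; i < n; ++i) out[i] = a[i] + b[i];
}
#endif

using AddFn = void (*)(const float*, const float*, float*, std::size_t);

// Runtime dispatch: return the widest variant the running CPU supports,
// falling back to the generic path on older or non-x86 CPUs.
AddFn select_add() {
#if HAVE_X86
    __builtin_cpu_init();  // harmless here; required only before constructors run
    if (__builtin_cpu_supports("avx512f")) return add_avx512;
    if (__builtin_cpu_supports("avx2")) return add_avx2;
#endif
    return add_generic;
}
```

A caller would typically resolve the pointer once (for example `static const AddFn add = select_add();`) and reuse it, so the CPUID check is not repeated on every call. Comparing each variant's output against `add_generic` before any tuning is in the spirit of the Phase 1 correctness step described above.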

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

github-actions · 2026-06-03T08:15:57Z

Hi @jiqing-feng, thanks for your interest in contributing!

This project requires that pull request authors are vouched, and you are not in the list of vouched users.

This PR will be closed automatically. See https://github.com/huggingface/kernels/blob/main/CONTRIBUTING.md for more details.

jiqing-feng · 2026-06-05T06:46:48Z

Hi @sywangyi @YangKai0616, please take a quick look. Thanks!

sayakpaul · 2026-06-08T12:49:51Z

Is this PR ready to be reviewed?

jiqing-feng · 2026-06-09T00:57:12Z

> Is this PR ready to be reviewed?

Yes, please.

sayakpaul · 2026-06-09T03:31:47Z

 - `cuda-kernels` (default)
 - `rocm-kernels`
 - `xpu-kernels`
+- `cpu-kernels`


It would be nice to add a note on where CPU kernels are actually helpful.

Done. Please review the new changes and rerun the CI. Thanks!

HuggingFaceDocBuilderDev · 2026-06-09T03:38:29Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

sayakpaul · 2026-06-09T05:33:24Z

+> [!TIP]
+> **When are CPU kernels actually helpful?** Two main cases:
+> - **Better performance on Intel Xeon** — custom AVX2/AVX512 kernels (and AMX via brgemm for quantized GEMM) outperform generic PyTorch ops for element-wise and quantized workloads, especially in CPU-only or latency-sensitive serving.
+> - **Enabling functionality that otherwise can't run** — some kernels are a hard requirement, e.g. `megablocks` MoE on CPU, where without the kernel you simply cannot run MXFP4.


Nice! Can you provide some example kernels that you have built for CPU?

sayakpaul · 2026-06-09T06:09:33Z

Failing tests are unrelated. Thanks for your contributions.

jiqing-feng added 3 commits June 2, 2026 00:53

add cpu kernel skills

64fa4de

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

update

467a0f9

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

update

2078b8c

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

jiqing-feng marked this pull request as draft June 3, 2026 08:15

github-actions closed this Jun 3, 2026

update

80c064b

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

danieldk reopened this Jun 4, 2026

jiqing-feng added 3 commits June 4, 2026 14:44

Merge branch 'main' into cpu_skill

27fe4e5

update

7696fb2

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

update

aa6a03d

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

jiqing-feng marked this pull request as ready for review June 5, 2026 06:46

Merge branch 'main' into cpu_skill

9ab3cb7

sywangyi reviewed Jun 8, 2026

Comment thread kernel-builder/skills/cpu-kernels/SKILL.md

jiqing-feng added 2 commits June 7, 2026 22:31

update

0b407ad

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

Merge branch 'main' into cpu_skill

ba4f61f

Merge branch 'main' into cpu_skill

1fa4994

sayakpaul previously approved these changes Jun 9, 2026

update

2a84e74

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

jiqing-feng dismissed sayakpaul’s stale review via 2a84e74 June 9, 2026 05:01

fix style

7cb6da9

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

sayakpaul reviewed Jun 9, 2026

add examples

1d4c374

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

sayakpaul approved these changes Jun 9, 2026

sayakpaul merged commit e009a36 into huggingface:main Jun 9, 2026
38 of 40 checks passed

sayakpaul merged 14 commits into huggingface:main from jiqing-feng:cpu_skill

5 participants

Conversation

jiqing-feng commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What's included

Uh oh!

github-actions Bot commented Jun 3, 2026

Uh oh!

jiqing-feng commented Jun 5, 2026

Uh oh!

Uh oh!

sayakpaul commented Jun 8, 2026

Uh oh!

jiqing-feng commented Jun 9, 2026

Uh oh!

sayakpaul Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

jiqing-feng Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

HuggingFaceDocBuilderDev commented Jun 9, 2026

Uh oh!

sayakpaul Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

jiqing-feng Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

sayakpaul commented Jun 9, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

jiqing-feng commented Jun 3, 2026 •

edited

Loading