Add CPU kernel skills #614

Merged: sayakpaul merged 14 commits into huggingface:main from jiqing-feng:cpu_skill on Jun 9, 2026

jiqing-feng (Contributor) commented Jun 3, 2026

Summary

Adds a cpu-kernels skill for kernel-builder that guides writing, optimizing, and
benchmarking C++ CPU kernels (AVX2/AVX512) for the Hugging Face kernels ecosystem.

What's included

  • Two-phase workflow: Phase 1 correctness (generic → AVX2 → AVX512), Phase 2
    performance exploration with trial tracking and backtracking.
  • Scripts: op analysis, static validation, benchmarking (torch.utils.benchmark),
    perf stat profiling, and trial management.
  • Reference docs: runtime CPU dispatch, build.toml multi-target compilation,
    SIMD patterns, quantized GEMM / brgemm, threading, memory, and correctness constraints.
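The generic → AVX2 → AVX512 progression and the runtime CPU dispatch listed above imply a fallback order when a kernel is loaded. The following is only a minimal sketch of that selection logic, written in Python for brevity rather than the C++ the skill targets. The names `TIERS`, `read_cpu_flags`, and `pick_tier` are hypothetical and do not come from the PR; `avx2` and `avx512f` are the flag names Linux reports in `/proc/cpuinfo`, and a real AVX512 kernel may need additional subsets beyond `avx512f`.

```python
from pathlib import Path

# Tiers ordered from most to least specialized, each with the CPU flag it
# requires. "generic" requires no flag, so it is always a valid fallback.
# (Hypothetical table for illustration; not taken from the PR.)
TIERS = [("avx512", "avx512f"), ("avx2", "avx2"), ("generic", None)]


def read_cpu_flags(cpuinfo_path="/proc/cpuinfo"):
    """Return the CPU feature flags Linux lists, or an empty set if unavailable."""
    try:
        text = Path(cpuinfo_path).read_text()
    except OSError:
        return set()  # non-Linux or unreadable file: caller falls back to generic
    for line in text.splitlines():
        if line.startswith("flags") and ":" in line:
            return set(line.split(":", 1)[1].split())
    return set()


def pick_tier(flags):
    """Pick the most specialized tier whose required flag is present."""
    for name, required in TIERS:
        if required is None or required in flags:
            return name
    return "generic"


if __name__ == "__main__":
    print(pick_tier(read_cpu_flags()))
```

Keeping the tier table ordered from most to least specialized means the generic path is always reachable, which matches the Phase 1 idea of establishing a correct generic kernel before adding SIMD variants.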

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
jiqing-feng marked this pull request as draft on June 3, 2026 08:15
github-actions (Bot) commented Jun 3, 2026

Hi @jiqing-feng, thanks for your interest in contributing!

This project requires that pull request authors are vouched, and you are not in the list of vouched users.

This PR will be closed automatically. See https://github.com/huggingface/kernels/blob/main/CONTRIBUTING.md for more details.

github-actions (Bot) closed this on Jun 3, 2026
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
danieldk reopened this on Jun 4, 2026
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
jiqing-feng marked this pull request as ready for review on June 5, 2026 06:46
jiqing-feng (Contributor, Author) commented

Hi @sywangyi @YangKai0616. Please take a quick look. Thanks!

Comment thread kernel-builder/skills/cpu-kernels/SKILL.md
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
sayakpaul (Member) commented

Is this PR ready to be reviewed?

jiqing-feng (Contributor, Author) commented

> Is this PR ready to be reviewed?

Yes, please.

sayakpaul previously approved these changes on Jun 9, 2026
Comment thread docs/source/cli-skills.md
- `cuda-kernels` (default)
- `rocm-kernels`
- `xpu-kernels`
- `cpu-kernels`

Review comment (Member):

It would be nice to add a note on where CPU kernels are actually helpful.

Reply (Contributor, Author):

Done. Please review the new changes and rerun the CI. Thanks!

HuggingFaceDocBuilderDev commented

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Comment thread docs/source/cli-skills.md
Comment on lines +14 to +17
> [!TIP]
> **When are CPU kernels actually helpful?** Two main cases:
> - **Better performance on Intel Xeon** — custom AVX2/AVX512 kernels (and AMX via brgemm for quantized GEMM) outperform generic PyTorch ops for element-wise and quantized workloads, especially in CPU-only or latency-sensitive serving.
> - **Enabling functionality that otherwise can't run** — some kernels are a hard requirement, e.g. `megablocks` MoE on CPU, where without the kernel you simply cannot run MXFP4.

Review comment (Member):

Nice! Can you provide some example kernels that you have built for CPU?

Reply (Contributor, Author):

Done.

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
sayakpaul (Member) commented

Failing tests are unrelated. Thanks for your contributions.

sayakpaul merged commit e009a36 into huggingface:main on Jun 9, 2026 (38 of 40 checks passed)