Sync zDNN branch lineage #23799

Open
jrepp wants to merge 28 commits into ggml-org:master from jrepp:zdnn-continued

jrepp commented May 28, 2026

Summary

  • advance the zDNN continuation branch with the latest validated backend work
  • include the Python 3.14 tool benchmark pin and merged zDNN backend updates

Testing

  • not run in this superproject session

jrepp and others added 28 commits December 23, 2025 15:47
- MUL_MAT requires PARMBLKFORMAT_1 (z16+) for zdnn_matmul_transpose_op
- GELU requires NNPA function code 53 (z16+)
- Add runtime check and clear error message for z15 users
- Operations will fall back to CPU on z15 instead of crashing

This allows the zDNN backend to work on z15 with reduced functionality,
while fully utilizing z16+ capabilities when available.
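
The capability gate described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: `nnpa_caps`, `op_kind`, `can_offload`, and `pick_device` are hypothetical names, and the two flags are assumed to be filled in once at backend init by querying the zDNN library. The mapping of ops to requirements is taken from the commit message above.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical capability snapshot, assumed to be populated once at init.
struct nnpa_caps {
    bool parmblk_fmt1; // PARMBLKFORMAT_1 is available
    bool gelu_fn53;    // NNPA function code 53 is available
};

enum class op_kind { ADD, MUL_MAT, GELU };
enum class device  { NNPA, CPU };

// True when the op may be offloaded with the given capabilities.
inline bool can_offload(const nnpa_caps & caps, op_kind op) {
    switch (op) {
        case op_kind::MUL_MAT: return caps.parmblk_fmt1; // needs the transpose matmul op
        case op_kind::GELU:    return caps.gelu_fn53;
        case op_kind::ADD:     return true;              // no extra requirement in this sketch
    }
    assert(false && "unhandled op_kind");
    return false;
}

// Chooses a device and reports the fallback instead of crashing.
inline device pick_device(const nnpa_caps & caps, op_kind op) {
    if (can_offload(caps, op)) {
        return device::NNPA;
    }
    std::fprintf(stderr, "zdnn sketch: op %d unsupported on this hardware level, using CPU\n",
                 static_cast<int>(op));
    return device::CPU;
}
```

With both flags false, `pick_device` returns `device::CPU` for `MUL_MAT` and `GELU` but still returns `device::NNPA` for `ADD`, which matches the "reduced functionality" behaviour the message describes.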

Integer types (I8, I32) cannot be transformed by zDNN and should not
be marked as native types. Only F32, F16, and BF16 can be transformed
and used with NNPA operations.

Also removes Q8_0 from native types - it's a quantized type that should
be dequantized to F32 rather than treated as native INT8.

This fixes crashes when running models that use integer indices for
operations like embedding lookups (GET_ROWS).
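
The type rule in this commit message could be expressed as a small classifier. The sketch below is an assumption-laden illustration: `tensor_type`, `type_handling`, `classify`, and `transform_input_type` are hypothetical names, and the enum lists only the types named in the message rather than the full ggml type set.

```cpp
#include <cassert>

// Only the types mentioned in the commit message; not the full ggml enum.
enum class tensor_type   { F32, F16, BF16, I8, I32, Q8_0 };
enum class type_handling { NATIVE, DEQUANTIZE_TO_F32, NOT_TRANSFORMABLE };

// Classifies a tensor type according to the rule stated above.
inline type_handling classify(tensor_type t) {
    switch (t) {
        case tensor_type::F32:
        case tensor_type::F16:
        case tensor_type::BF16:
            return type_handling::NATIVE;            // can be transformed directly
        case tensor_type::Q8_0:
            return type_handling::DEQUANTIZE_TO_F32; // quantized, not native INT8
        case tensor_type::I8:
        case tensor_type::I32:
            return type_handling::NOT_TRANSFORMABLE; // e.g. GET_ROWS index tensors
    }
    return type_handling::NOT_TRANSFORMABLE;
}

// Type that would be handed to the transform step. Callers are expected to
// filter out integer tensors before reaching this point.
inline tensor_type transform_input_type(tensor_type t) {
    const type_handling h = classify(t);
    assert(h != type_handling::NOT_TRANSFORMABLE && "integer tensors must not be transformed");
    return h == type_handling::DEQUANTIZE_TO_F32 ? tensor_type::F32 : t;
}
```

Keeping the index types in their own category makes the GET_ROWS case explicit: an I32 index tensor is never offered to the transform path in the first place.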

The IBM zDNN library requires output tensors to be reset via
zdnn_reset_ztensor() before writing if they're already transformed.
Without this reset, operations like zdnn_rmsnorm return ZDNN_INVALID_STATE.

This fixes the error:
"ZDNN_INVALID_STATE: Attempted to transform data into a tensor that
is already transformed"

Added reset calls before all operations that write to destination tensors:
- elementwise.cpp: add, mul, sub, div, softmax, rms_norm
- unary.cpp: gelu, relu, tanh, sigmoid, exp, neg, sqrt, log, silu, leaky_relu
- mmf.cpp: matmul_transpose_op
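
The reset-before-write pattern can be illustrated without the real library. The sketch below uses a mock tensor with an `is_transformed` flag to reproduce the failure mode and the fix; `mock_ztensor`, `mock_write`, `mock_reset`, and `write_output` are hypothetical stand-ins, and in the backend the reset step would be the `zdnn_reset_ztensor()` call named above.

```cpp
#include <cassert>

enum class status { OK, INVALID_STATE };

// Mock stand-in for a ztensor: tracks only the transformed flag and a value.
struct mock_ztensor {
    bool  is_transformed = false;
    float value          = 0.0f;
};

// Mimics the rule quoted above: writing into an already transformed
// tensor is rejected with an invalid-state status.
inline status mock_write(mock_ztensor & t, float v) {
    if (t.is_transformed) {
        return status::INVALID_STATE;
    }
    t.value          = v;
    t.is_transformed = true;
    return status::OK;
}

// Stand-in for the reset call: clears the transformed flag.
inline void mock_reset(mock_ztensor & t) {
    t.is_transformed = false;
}

// The pattern from the commit: always reset the destination before writing.
inline status write_output(mock_ztensor & dst, float v) {
    mock_reset(dst);
    assert(!dst.is_transformed);
    return mock_write(dst, v);
}
```

Calling `mock_write` twice on the same tensor fails on the second call, while `write_output` succeeds on a reused destination, which is the situation that arises when graph output buffers are recycled across evaluations.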

Add init_raw_ztensor helper to create lightweight ztensor wrappers
for raw data operations (get_rows, rope). These wrappers provide
type safety while avoiding memory allocation overhead.

Updated functions:
- ggml_zdnn_get_rows: Uses ztensor wrappers for embedding lookup
- ggml_zdnn_rope: Uses ztensor wrappers for rotary position embedding
@jrepp jrepp requested a review from a team as a code owner May 28, 2026 05:22

ggml-gh-bot Bot commented May 28, 2026

Hi @jrepp, thanks for your contribution!

Per our contribution guidelines, the automated PR checker found the following issue(s) that need your attention:

  • Large PR: Large changes require prior discussion (e.g. an issue or RFC) and maintainers may not be able to review this PR as-is. Consider splitting it into smaller, focused PRs.

Please note that maintainers reserve the right to make final decisions on PRs. If you believe there is a mistake, please comment below.

@taronaeo
Member

Thanks for contributing to the IBM zDNN backend. There are a lot of changes here, so this will take time to review.

Can you also include the following details in your PR description?

  1. llama-bench results master vs this PR
  2. test-backend-ops results only with the zDNN backend
  3. Declare whether AI was used as per the contributing guidelines.

cc: @AlekseiNikiforovIBM @Andreas-Krebbel

github-actions Bot added the script, python, ggml, and IBM zDNN labels May 28, 2026