Sync zDNN branch lineage by jrepp · Pull Request #23799 · ggml-org/llama.cpp

jrepp · 2026-05-28T05:22:47Z

Summary

- advance the zDNN continuation branch with the latest validated backend work
- include the Python 3.14 tool benchmark pin and merged zDNN backend updates

Testing

not run in this superproject session

- MUL_MAT requires PARMBLKFORMAT_1 (z16+) for zdnn_matmul_transpose_op
- GELU requires NNPA function code 53 (z16+)
- Add runtime check and clear error message for z15 users
- Operations will fall back to CPU on z15 instead of crashing

This allows the zDNN backend to work on z15 with reduced functionality, while fully utilizing z16+ capabilities when available.
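The gating described above could look roughly like the sketch below. It is not the code from this PR: `Op`, `HwCaps`, `Backend`, `zdnn_can_run` and `dispatch` are hypothetical stand-ins, and the capability flags are plain booleans rather than real zDNN queries. It also assumes that ops other than MUL_MAT and GELU stay available on z15.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-ins for illustration; not the ggml or zDNN types.
enum class Op { ADD, MUL_MAT, GELU };
enum class Backend { ZDNN, CPU };

// Capability flags that a runtime check would fill in once at startup.
struct HwCaps {
    bool parmblk_format_1; // needed by MUL_MAT (z16+ per the commit message)
    bool nnpa_gelu;        // NNPA function code 53, needed by GELU (z16+)
};

// Returns true when the accelerator path can run the op on this machine.
// Assumption: every op other than MUL_MAT and GELU is available on z15.
bool zdnn_can_run(Op op, const HwCaps & caps) {
    switch (op) {
        case Op::MUL_MAT: return caps.parmblk_format_1;
        case Op::GELU:    return caps.nnpa_gelu;
        default:          return true;
    }
}

// Picks the backend for one op, reporting the fallback instead of crashing.
Backend dispatch(Op op, const HwCaps & caps) {
    if (!zdnn_can_run(op, caps)) {
        std::fprintf(stderr, "zDNN: op needs z16+ features, falling back to CPU\n");
        return Backend::CPU;
    }
    return Backend::ZDNN;
}
```

With `HwCaps{false, false}` standing in for z15, `dispatch(Op::MUL_MAT, caps)` yields `Backend::CPU` while `dispatch(Op::ADD, caps)` still yields `Backend::ZDNN`.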

Integer types (I8, I32) cannot be transformed by zDNN and should not be marked as native types. Only F32, F16, and BF16 can be transformed and used with NNPA operations. Also removes Q8_0 from native types - it's a quantized type that should be dequantized to F32 rather than treated as native INT8. This fixes crashes when running models that use integer indices for operations like embedding lookups (GET_ROWS).
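A minimal sketch of the type classification this commit describes, using a hypothetical `TensorType` enum rather than the real ggml type enum. `is_zdnn_native` and `needs_dequantize` are illustrative names, not functions from the PR.

```cpp
#include <cassert>

// Hypothetical type tag for illustration; the real backend uses ggml's enum.
enum class TensorType { F32, F16, BF16, I8, I32, Q8_0 };

// Only the float types listed in the commit message count as native.
constexpr bool is_zdnn_native(TensorType t) {
    return t == TensorType::F32 || t == TensorType::F16 || t == TensorType::BF16;
}

// Q8_0 is quantized: it is dequantized to F32 rather than treated as INT8.
constexpr bool needs_dequantize(TensorType t) {
    return t == TensorType::Q8_0;
}

// Integer index tensors (e.g. GET_ROWS indices) must never be transformed.
static_assert(!is_zdnn_native(TensorType::I32), "I32 is not a native type");
static_assert(!is_zdnn_native(TensorType::Q8_0), "Q8_0 is not a native type");
```

Keeping the predicate `constexpr` lets the integer and quantized cases be pinned down at compile time, which is one way to guard against the types being re-added by accident.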

The IBM zDNN library requires output tensors to be reset via zdnn_reset_ztensor() before writing if they're already transformed. Without this reset, operations like zdnn_rmsnorm return ZDNN_INVALID_STATE.

This fixes the error:
"ZDNN_INVALID_STATE: Attempted to transform data into a tensor that is already transformed"

Added reset calls before all operations that write to destination tensors:
- elementwise.cpp: add, mul, sub, div, softmax, rms_norm
- unary.cpp: gelu, relu, tanh, sigmoid, exp, neg, sqrt, log, silu, leaky_relu
- mmf.cpp: matmul_transpose_op
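The reset-before-write pattern can be illustrated with a small mock. Nothing here calls the real library: `MockZTensor`, `mock_write` and `mock_reset` are hypothetical and only imitate the behaviour described above, where writing into an already transformed tensor is rejected until it has been reset.

```cpp
#include <cassert>

// Mock status codes; the real library reports ZDNN_INVALID_STATE here.
enum class Status { OK, INVALID_STATE };

// Mock destination tensor that remembers whether it holds transformed data.
struct MockZTensor {
    bool  is_transformed = false;
    float value          = 0.0f;
};

// Imitates an op writing its result: refuses an already transformed tensor.
Status mock_write(MockZTensor & dst, float v) {
    if (dst.is_transformed) {
        return Status::INVALID_STATE;
    }
    dst.value          = v;
    dst.is_transformed = true;
    return Status::OK;
}

// Imitates the reset call: the tensor becomes writable again.
void mock_reset(MockZTensor & t) {
    t.is_transformed = false;
}

// The pattern from the commit: reset the destination, then run the op.
Status write_with_reset(MockZTensor & dst, float v) {
    if (dst.is_transformed) {
        mock_reset(dst);
    }
    return mock_write(dst, v);
}
```

Reusing one destination across two graph evaluations is the case that matters: the second plain `mock_write` fails, while `write_with_reset` succeeds.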

Add init_raw_ztensor helper to create lightweight ztensor wrappers for raw data operations (get_rows, rope). These wrappers provide type safety while avoiding memory allocation overhead.

Updated functions:
- ggml_zdnn_get_rows: Uses ztensor wrappers for embedding lookup
- ggml_zdnn_rope: Uses ztensor wrappers for rotary position embedding
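The idea of a lightweight, non-owning wrapper can be sketched as follows. This is not the `init_raw_ztensor` helper itself, whose signature is not shown in this PR description; `RawView`, `make_raw_view` and `get_rows` are hypothetical names, and the lookup runs on plain `float` data as an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Non-owning, typed view over an existing buffer: no allocation, no copy.
struct RawView {
    const float * data;
    std::size_t   rows;
    std::size_t   cols;
};

// Wraps a buffer that must outlive the view; only the shape is checked.
RawView make_raw_view(const std::vector<float> & buf, std::size_t rows, std::size_t cols) {
    assert(buf.size() == rows * cols);
    return RawView{buf.data(), rows, cols};
}

// Embedding lookup: copies the rows selected by integer indices, in order.
std::vector<float> get_rows(const RawView & table, const std::vector<std::int32_t> & idx) {
    std::vector<float> out;
    out.reserve(idx.size() * table.cols);
    for (std::int32_t i : idx) {
        assert(i >= 0 && static_cast<std::size_t>(i) < table.rows);
        const float * row = table.data + static_cast<std::size_t>(i) * table.cols;
        out.insert(out.end(), row, row + table.cols);
    }
    return out;
}
```

For a 3x2 table holding 1 through 6 in row-major order, indices `{2, 0}` select the rows `5, 6` and `1, 2`. The indices stay as plain integers throughout, in line with the native-type fix described earlier.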


ggml-gh-bot · 2026-05-28T05:27:33Z

Hi @jrepp, thanks for your contribution!

Per our contribution guidelines, the automated PR checker found the following issue(s) that need your attention:

Large PR: Large changes require prior discussion (e.g. an issue or RFC) and maintainers may not be able to review this PR as-is. Consider splitting it into smaller, focused PRs.

Please note that maintainers reserve the right to make final decisions on PRs. If you believe there is a mistake, please comment below.

taronaeo · 2026-05-28T06:38:04Z

Thanks for contributing to the IBM zDNN backend. There are a lot of changes here, and they will take time to review.

Can you also include the following details in your PR description?

- llama-bench results master vs this PR
- test-backend-ops results only with the zDNN backend
- Declare whether AI was used as per the contributing guidelines.

cc: @AlekseiNikiforovIBM @Andreas-Krebbel

jrepp and others added 28 commits December 23, 2025 15:47

ggml-zdnn: add ADD, MUL, SOFT_MAX ops

8920143

ggml-zdnn: add GELU, RELU, TANH, SIGMOID ops

26d7a67

ggml-zdnn: add SUB, DIV, SQRT, LOG, SILU, NEG ops

b0c9069

ggml-zdnn: add RMS_NORM, GET_ROWS ops

5db3d83

ggml-zdnn: add ROPE, CONT, CPY, view tensors, quantized weights

e647a16

Fix cmake to find zdnn.h in ZDNN_ROOT/zdnn directory

aaf49b4

Add diagnostic error message to ZDNN_CHECK macro

7caedf5

Add debug logging to ggml_zdnn_rope

cbd5562

Remove debug logging from ggml_zdnn_rope

1a04abc

ggml-zdnn: add ADD, MUL, SOFT_MAX ops

5213979

ggml-zdnn: add GELU, RELU, TANH, SIGMOID ops

4949d40

ggml-zdnn: add SUB, DIV, SQRT, LOG, SILU, NEG ops

3b270ca

ggml-zdnn: add RMS_NORM, GET_ROWS ops

b006fb2

ggml-zdnn: add ROPE, CONT, CPY, view tensors, quantized weights

7249783

Fix cmake to find zdnn.h in ZDNN_ROOT/zdnn directory

cb3117a

Add diagnostic error message to ZDNN_CHECK macro

72321f7

Add debug logging to ggml_zdnn_rope

7cce0b4

Remove debug logging from ggml_zdnn_rope

62c8707

chore(scripts): pin tool benchmark to python 3.14

0913d74

Merge branch 'z/submodule-pin-python314-tool-benchmark' into zdnn-con…

94b0855

…tinued

jrepp requested a review from a team as a code owner May 28, 2026 05:22

github-actions Bot added the script, python, ggml, and IBM zDNN labels May 28, 2026

jrepp wants to merge 28 commits into ggml-org:master from jrepp:zdnn-continued
