Conversation

dhdaines commented Jan 4, 2026

Applies on top of #2108, which has the necessary changes to MTMD.

This adds a chat format to support https://huggingface.co/ggml-org/granite-docling-258M-GGUF and derivatives. It should work with SmolVLM and SmolDocling as well.

To use these models effectively, special tokens must be enabled in the chat completion output, so this also adds a `special` flag to all of the chat completion functions, matching what `--special` does in llama-cli (it is enabled by default in llama-mtmd-cli).
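
A minimal usage sketch, assuming the llama-cpp-python high-level API; the `special` keyword name follows the description above, and the model filename is a hypothetical local copy of the GGUF repo:

```python
from llama_cpp import Llama

# Hypothetical local filename downloaded from ggml-org/granite-docling-258M-GGUF.
llm = Llama(model_path="granite-docling-258M-Q8_0.gguf")

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Convert this page to DocTags."}],
    special=True,  # assumed keyword: emit special tokens, like --special in llama-cli
)
print(out["choices"][0]["message"]["content"])
```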

dhdaines marked this pull request as ready for review January 4, 2026 18:05
dhdaines (Author) commented Jan 4, 2026

Ready for review!

dhdaines force-pushed the granite-docling branch 3 times, most recently from 3a04e21 to 8790ce6 on January 6, 2026 19:55
Ralf Waldukat and others added 4 commits January 13, 2026 09:47
- Update vendor/llama.cpp submodule to commit be47fb92 (2026-01-01)
- Bump version 0.3.16 -> 0.4.0

Critical fixes:
- Remove phantom flash_attn field from llama_context_params (caused segfaults)
- Add 3 missing params to llama_params_fit (margin, n_ctx_min, log_level)
- Migrate flash_attn bool -> flash_attn_type enum (BREAKING CHANGE)
- Add flash_attn_type to TYPE_CHECKING block
- Fix test: use flash_attn_type instead of removed flash_attn field
- Fix (critical): kv_cache_seq_rm must preserve seq_id=-1 semantics (all sequences)
  * The wrapper was incorrectly converting -1 to 0, breaking context rewind
  * This caused 'discontinuity' errors on multi-turn conversations
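
A hedged sketch of the corrected behavior; the binding symbols (`llama_get_memory`, `llama_memory_seq_rm`) are assumptions based on the memory API mentioned below, and the point is simply that -1 must be passed through rather than clamped to 0:

```python
import llama_cpp  # low-level bindings; exact symbol names are assumed

def kv_cache_seq_rm(ctx, seq_id: int, p0: int, p1: int) -> bool:
    # seq_id == -1 means "all sequences"; p0 == -1 / p1 == -1 mean an
    # open-ended position range. Clamping seq_id to 0 silently turned
    # "remove from every sequence" into "remove from sequence 0", which
    # broke context rewind and caused 'discontinuity' errors on
    # multi-turn conversations.
    return llama_cpp.llama_memory_seq_rm(
        llama_cpp.llama_get_memory(ctx), seq_id, p0, p1
    )
```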

API changes:
- flash_attn: bool field REMOVED from structs
- flash_attn_type: int enum ADDED (AUTO=-1, DISABLED=0, ENABLED=1)
- High-level API maintains backward compatibility via wrapper
- Server default changed: flash_attn=False -> flash_attn=None (AUTO mode)
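
For illustration, a small sketch of how the old bool flag maps onto the new enum. The values come from the list above; the constant names are assumptions mirroring llama.cpp's LLAMA_FLASH_ATTN_TYPE_* enum:

```python
from enum import IntEnum
from typing import Optional

class LlamaFlashAttnType(IntEnum):
    AUTO = -1      # let the backend decide (new server default: flash_attn=None)
    DISABLED = 0   # old flash_attn=False
    ENABLED = 1    # old flash_attn=True

def flash_attn_to_enum(flash_attn: Optional[bool]) -> LlamaFlashAttnType:
    """Conceptual backward-compat mapping: bool (or None) -> flash_attn_type."""
    if flash_attn is None:
        return LlamaFlashAttnType.AUTO
    return LlamaFlashAttnType.ENABLED if flash_attn else LlamaFlashAttnType.DISABLED
```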

New features:
- 20+ new functions (memory API, state management, samplers, vocab queries)
- 5 new enums (flash_attn_type, params_fit_status, model_meta_key, etc.)
- 6 new struct fields across llama_model_params, llama_context_params, mtmd_context_params

Deprecated removals:
- 11 llama_kv_self_* functions (replaced by llama_memory_*)
- llama_sampler_init_softmax
- verbosity field from mtmd_context_params