server: support image+text input for embeddings (Qwen3-VL-Embedding) by ngxson · Pull Request #18665 · ggml-org/llama.cpp

ngxson · 2026-01-07T12:56:16Z

Target support: https://huggingface.co/Qwen/Qwen3-VL-Embedding-2B

Important

The original Qwen3-VL-Embedding model is missing 1_Pooling, so it cannot be correctly converted to GGUF. I don't think it's actually ready to be used unless the Qwen team fixes it (I already reached out to them, but got no response).
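For anyone who wants to check a local checkout before attempting a conversion, here is a minimal sketch. It assumes the usual sentence-transformers layout (a `modules.json` that lists a Pooling module, with a `config.json` in that module's directory, typically `1_Pooling/`). The helper name `find_pooling_config` is hypothetical and is not part of llama.cpp.

```python
import json
from pathlib import Path


def find_pooling_config(model_dir):
    """Return the path to the pooling config.json, or None if it is absent.

    Assumes the common sentence-transformers layout; this is not llama.cpp code.
    """
    root = Path(model_dir)
    modules_file = root / "modules.json"
    if modules_file.is_file():
        modules = json.loads(modules_file.read_text(encoding="utf-8"))
        for module in modules:
            # Pooling modules are usually typed "sentence_transformers.models.Pooling".
            if module.get("type", "").endswith("Pooling"):
                candidate = root / module.get("path", "") / "config.json"
                if candidate.is_file():
                    return candidate
    # Fall back to the conventional directory name mentioned in this PR.
    fallback = root / "1_Pooling" / "config.json"
    return fallback if fallback.is_file() else None
```

A `None` result would match the situation described above: the pooling configuration is not there, so the pooling type cannot be read from the checkpoint.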

This PR aims to support mixed text+image input (and maybe audio, for models that support it) using an OAI-compatible, content-like schema:

{
    "input": [
        {
            "type": "text",
            "text": "mixed text and image input"
        },
        {
            "type": "image",
            "image_url": {
                "url": "https://huggingface.co/ggml-org/tinygemma3-GGUF/resolve/main/test/11_truck.png"
            }
        }
    ]
}
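As an illustration of the schema above, here is a small client-side sketch that builds the same payload and, optionally, posts it to a running server. The endpoint path `/v1/embeddings`, the default `http://localhost:8080` base URL, and the helper names are assumptions for this example; the PR text only specifies the shape of `input`.

```python
import json
import urllib.request


def build_mixed_input(text, image_url):
    """Build a request body with one text part and one image part."""
    return {
        "input": [
            {"type": "text", "text": text},
            {"type": "image", "image_url": {"url": image_url}},
        ]
    }


def post_embeddings(payload, base_url="http://localhost:8080"):
    """POST the payload as JSON; the endpoint path is an assumption."""
    request = urllib.request.Request(
        base_url + "/v1/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


payload = build_mixed_input(
    "mixed text and image input",
    "https://huggingface.co/ggml-org/tinygemma3-GGUF/resolve/main/test/11_truck.png",
)
print(json.dumps(payload, indent=4))
```

Calling `post_embeddings(payload)` only makes sense against a build that includes this PR's changes and a model that accepts image input.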

ggerganov · 2026-01-07T13:03:00Z

When you convert the model, try to add --sentence-transformers-dense-modules:

llama.cpp/convert_hf_to_gguf.py

Lines 10974 to 10981 in 294b2b4

    parser.add_argument(
        "--sentence-transformers-dense-modules", action="store_true",
        help=("Whether to include sentence-transformers dense modules."
              "It can be used for sentence-transformers models, like google/embeddinggemma-300m"
              "Default these modules are not included.")
    )
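A minimal sketch of how that flag could be passed when invoking the converter from Python. The model directory and output file names are placeholders, and the helper `build_convert_command` is hypothetical. Only the `--sentence-transformers-dense-modules` flag comes from the snippet above; `--outfile` is the converter's existing output option.

```python
import sys


def build_convert_command(model_dir, outfile, script="convert_hf_to_gguf.py"):
    """Assemble the argument list for a conversion run with dense modules enabled."""
    return [
        sys.executable,
        script,
        model_dir,
        "--outfile", outfile,
        "--sentence-transformers-dense-modules",
    ]


# Placeholder paths; adjust to the local checkout.
command = build_convert_command("./Qwen3-VL-Embedding-2B", "qwen3-vl-embedding-2b.gguf")
print(" ".join(command))
# To actually run it: subprocess.run(command, check=True)
```

Whether the flag is enough for this model is untested here; the missing 1_Pooling issue noted at the top of the thread may still apply.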

CISC · 2026-01-07T13:08:55Z

IIRC they initially forgot to add 1_Pooling on other embedding models too. Since this one is not public yet, maybe ask about it?

ngxson · 2026-01-07T13:10:31Z

Oh sorry, I didn't notice that it's private 😅 Temporarily closing this to keep it under the radar.

Tokimorphling · 2026-03-28T07:02:46Z

I've fixed the Qwen3-VL-Embedding issues in llama.cpp and verified the fix with regression tests. Check out the code here: https://github.com/Tokimorphling/qwen3-vl-embedding

The implementation of the Qwen3-VL series in llama.cpp seems to be problematic.

server: support image+text input for embeddings (Qwen3-VL-Embedding)

56a0d87

ngxson changed the title ~~server: support image+text input for embeddings (Qwen3-VL-Embedding)~~ server: support image+text input for embeddings Jan 7, 2026

ngxson closed this Jan 7, 2026

github-actions Bot added examples server labels Jan 7, 2026

ngxson reopened this Jan 8, 2026

ngxson changed the title ~~server: support image+text input for embeddings~~ server: support image+text input for embeddings (Qwen3-VL-Embedding) Jan 8, 2026

alpaim mentioned this pull request Jan 9, 2026

Qwen3 VL Embedding integration for LLama.cpp backend alpaim/vecDir#41

Draft

ngxson closed this Apr 22, 2026

ngxson wants to merge 1 commit into ggml-org:master from ngxson:xsn/qwen3_vl_embd

4 participants