Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
54 changes: 54 additions & 0 deletions gallery/index.yaml
Original file line number Diff line number Diff line change
@@ -1,4 +1,58 @@
---
- name: "gemma-4-12b-it-qat"
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
urls:
- https://huggingface.co/unsloth/gemma-4-12B-it-qat-GGUF
description: |
Hugging Face |
GitHub |
Launch Blog |
Documentation

License: Apache 2.0 | Authors: Google DeepMind

> [!Note]
> This model card is for the new versions of the Gemma 4 family optimized with Quantization-Aware Training (QAT), which allows preserving similar quality to bfloat16 while dramatically reducing the memory requirements to load the model.
> Four versions of the QAT checkpoints are available:
> * **Unquantized QAT checkpoints** (Q4_0): Half-precision weights extracted from the QAT pipeline, ideal for custom downstream compilation and research. Available for Gemma 4 E2B, E4B, 12B, 26B A4B, and 31B, and their drafter models.
> * **GGUF** (Q4_0): Ready-to-deploy formats for broad ecosystem compatibility. Available for Gemma 4 E2B, E4B, 12B, 26B A4B, and 31B.
> * **Mobile-optimized** (wNa8o8): A custom schema engineered explicitly for mobile hardware efficiency. It features targeted 2-bit decoding layers, optimized KV caches, and static activations to maximize VRAM savings. Available for Gemma 4 E2B and E4B.
> * **Compressed Tensors** (w4a16): QAT checkpoints serialized in the compressed-tensors format for native, optimized inference with vLLM. Available for Gemma 4 E2B, E4B, 12B

...
license: "apache-2.0"
tags:
- llm
- gguf
- gemma
icon: https://ai.google.dev/gemma/images/gemma4_banner.png
overrides:
backend: llama-cpp
function:
automatic_tool_parsing_fallback: true
grammar:
disable: true
known_usecases:
- chat
mmproj: llama-cpp/mmproj/gemma-4-12B-it-qat-GGUF/mmproj-F32.gguf
options:
- use_jinja:true
parameters:
min_p: 0
model: llama-cpp/models/gemma-4-12B-it-qat-GGUF/mtp-gemma-4-12B-it.gguf
repeat_penalty: 1
temperature: 1
top_k: 64
top_p: 0.95
template:
use_tokenizer_template: true
files:
- filename: llama-cpp/models/gemma-4-12B-it-qat-GGUF/mtp-gemma-4-12B-it.gguf
sha256: 3c9460bc9f6143f232c9447bc772bf6f6db974b57129298381cf0cb702bd586b
uri: https://huggingface.co/unsloth/gemma-4-12B-it-qat-GGUF/resolve/main/mtp-gemma-4-12B-it.gguf
- filename: llama-cpp/mmproj/gemma-4-12B-it-qat-GGUF/mmproj-F32.gguf
sha256: 52a28ae82a96fa4654a4d871fa03a69de0fba0fdb429ae592f9bda3ede71d7be
uri: https://huggingface.co/unsloth/gemma-4-12B-it-qat-GGUF/resolve/main/mmproj-F32.gguf
- name: "gemma-4-26b-a4b-it-qat"
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
urls:
Expand Down
Loading