chimera: a single-executable / library which combines llama.cpp, whisper.cpp, and stable-diffusion.cpp #1543
shakfu
started this conversation in
Show and tell
Just realized this is similar to a proposal I submitted a second ago...
Thanks @martinbu69. I got to chimera after working on cyllama, a Cython wrapper of the .cpp libraries, where I discovered that all three used ggml versions that were quite close. As a possible packaging-size optimization, I had the other two libraries drop their own ggml copies and link against llama.cpp's ggml instead, and it worked. Once it became clear that this was not a one-off, chimera was the next logical step.
chimera is a single statically-linked C++ executable that bundles llama.cpp, whisper.cpp, stable-diffusion.cpp, SQLite, and sqlite-vec into a busybox-style multitool. The same binary handles text generation, interactive chat with persistent history, speech-to-text, text-to-image, a personal RAG / vector store, and an OpenAI-compatible HTTP server exposing all three inference capabilities at once — all sharing a single ggml backend set and one SQLite database.
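To make the "busybox-style multitool" idea concrete, here is a minimal sketch of subcommand dispatch in C++. It is not chimera's actual code: the `Handler` type, the `run_*` stubs, and the `dispatch` function are hypothetical names chosen for illustration, and only the subcommand names `gen`, `chat`, and `info` come from the post itself.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical handler type: receives the arguments that follow the
// subcommand and returns a process exit code.
using Handler = std::function<int(const std::vector<std::string>&)>;

// Illustrative stubs only; a real tool would load a model and run it here.
int run_gen(const std::vector<std::string>&)  { return 0; }
int run_chat(const std::vector<std::string>&) { return 0; }
int run_info(const std::vector<std::string>&) { return 0; }

// Look up args[1] in a table of subcommands and forward the remaining
// arguments. Returns 2 when the subcommand is missing or unknown.
int dispatch(const std::vector<std::string>& args) {
    static const std::map<std::string, Handler> table = {
        {"gen", run_gen},
        {"chat", run_chat},
        {"info", run_info},
    };
    if (args.size() < 2) return 2;
    const auto it = table.find(args[1]);
    if (it == table.end()) return 2;
    const std::vector<std::string> rest(args.begin() + 2, args.end());
    return it->second(rest);
}
```

A `main` would typically copy `argv` into a `std::vector<std::string>` and return `dispatch(args)`. Keeping every modality behind one table like this is what lets a single binary share one set of linked backends.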
If you want the same capabilities from Python instead of a native binary, see cyllama — chimera's sibling project, which exposes llama.cpp / whisper.cpp / stable-diffusion.cpp as Cython bindings with a high-level Python API.
The same build also produces libchimera.a, a redistributable static library that hosts the engines and the OpenAI-compatible HTTP server. Other C++ projects can link it directly to embed text generation, embeddings, transcription, image generation, RAG, and the HTTP server inside their own process, without forking chimera or shelling out to the chimera binary.

Who it's for
chimera targets CLI-first users who run more than one ggml-backed modality (text + audio + image) and want them sharing one process, one ggml backend set, one SQLite database, and one OpenAI-compatible HTTP surface — rather than running, configuring, and gluing together three separate servers. It is most useful when:
- ... gen, chat, embed expose most llama.cpp sampler / RoPE / YaRN / multi-GPU / cache / adapter flags directly), not a curated subset.
- ... chimera info rather than runtime probing.
- ... libchimera.a (and optionally #include "chimera.hpp" for the persistent-handle OOP layer) gives you the same model lifecycle, sampler wiring, and HTTP-server code paths the chimera binary uses.
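As an illustration of what "OpenAI-compatible" usually means for a client, the sketch below builds the JSON body of a chat completion request in plain C++. It assumes the server accepts the standard OpenAI chat completions shape (a `model` string plus a `messages` array), which the post does not spell out; `json_escape` and `chat_request_body` are hypothetical helper names, not part of chimera.

```cpp
#include <string>

// Escape the characters JSON requires escaping inside a string literal.
std::string json_escape(const std::string& s) {
    static const char* hex = "0123456789abcdef";
    std::string out;
    for (unsigned char c : s) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:
                if (c < 0x20) {
                    // Remaining control characters become \u00XX.
                    out += "\\u00";
                    out += hex[c >> 4];
                    out += hex[c & 0x0F];
                } else {
                    out += static_cast<char>(c);
                }
        }
    }
    return out;
}

// Build a single-turn request body in the OpenAI chat completions shape.
// Assumption: the target server accepts this shape; the model name is
// whatever identifier that server expects.
std::string chat_request_body(const std::string& model,
                              const std::string& user_text) {
    return "{\"model\":\"" + json_escape(model) +
           "\",\"messages\":[{\"role\":\"user\",\"content\":\"" +
           json_escape(user_text) + "\"}]}";
}
```

The resulting string can be sent as the body of a POST request by any HTTP client. A real project would more likely use a JSON library than hand-rolled escaping; the manual version is used here only to keep the example dependency-free.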