This guide explains how to build and configure whisper.cpp for offline voice message transcription — no API keys or cloud services required.
When `VOICE_PROVIDER=local`, the bot transcribes Telegram voice messages entirely on your machine using:
| Component | Purpose |
|---|---|
| ffmpeg | Converts Telegram OGG/Opus audio to 16 kHz mono WAV |
| whisper.cpp | Runs OpenAI's Whisper model locally via optimised C/C++ |
| GGML model | Quantised model weights (downloaded once) |
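Putting the components together, the end-to-end flow can be sketched as a small shell function. This is an illustrative sketch, not the bot's actual code; the file names and the `whisper-cpp` binary name are assumptions taken from this guide's defaults.

```bash
#!/bin/sh
# Sketch of the local transcription pipeline (illustrative only).
transcribe_voice() {
  in="$1"              # e.g. a Telegram voice note: voice_message.ogg
  wav="${in%.*}.wav"   # same name, .wav extension

  # Step 1: ffmpeg converts OGG/Opus to 16 kHz mono WAV
  ffmpeg -y -i "$in" -ar 16000 -ac 1 "$wav" &&
    # Step 2: whisper.cpp transcribes the WAV with the default base model
    whisper-cpp -m "$HOME/.cache/whisper-cpp/ggml-base.bin" \
      -f "$wav" --no-timestamps
}
```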
## Prerequisites

- A C/C++ toolchain (`gcc`/`clang`, `cmake`, `make`)
- `ffmpeg` installed and on PATH
- ~400 MB disk space for the `base` model (~1.5 GB for `medium`)
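You can sanity-check these prerequisites before building; the tool list below mirrors the bullets above:

```bash
#!/bin/sh
# Report any prerequisite tool that is missing from PATH.
for tool in gcc cmake make ffmpeg; do
  command -v "$tool" >/dev/null 2>&1 || echo "missing: $tool"
done
```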
## Step 1: Install ffmpeg

Debian/Ubuntu:

```bash
sudo apt update && sudo apt install -y ffmpeg
```

macOS (Homebrew):

```bash
brew install ffmpeg
```

Alpine:

```bash
apk add ffmpeg
```

Verify:

```bash
ffmpeg -version
```

## Step 2: Build whisper.cpp

```bash
# Clone the repository
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp

# Build with CMake (recommended)
cmake -B build
cmake --build build --config Release

# The binary is at build/bin/whisper-cli (or build/bin/main on older versions)
ls build/bin/whisper-cli
```

Tip: For GPU acceleration add `-DWHISPER_CUBLAS=ON` (NVIDIA) or `-DWHISPER_METAL=ON` (Apple Silicon) to the cmake configure step.
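Because the CLI binary was renamed from `main` to `whisper-cli` in newer whisper.cpp releases, a script that must work across versions can probe for both. A hypothetical helper:

```bash
#!/bin/sh
# Return the path of whichever whisper.cpp CLI binary exists in a build dir.
# Newer builds produce whisper-cli; older ones produce main.
find_whisper_bin() {
  for bin in "$1/whisper-cli" "$1/main"; do
    if [ -x "$bin" ]; then
      printf '%s\n' "$bin"
      return 0
    fi
  done
  return 1
}
```

Usage: `find_whisper_bin build/bin`.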
Install the binary system-wide:

```bash
sudo cp build/bin/whisper-cli /usr/local/bin/whisper-cpp
```

Or add the build directory to your PATH:

```bash
export PATH="$PWD/build/bin:$PATH"
```

## Step 3: Download a model

Models are hosted on Hugging Face. Pick one based on your hardware:
| Model | Size | RAM (approx.) | Quality |
|---|---|---|---|
| `tiny` | ~75 MB | ~400 MB | Fast but lower accuracy |
| `base` | ~142 MB | ~500 MB | Good balance (default) |
| `small` | ~466 MB | ~1 GB | Better accuracy |
| `medium` | ~1.5 GB | ~2.5 GB | High accuracy |
| `large-v3` | ~3 GB | ~5 GB | Best accuracy, slow on CPU |
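If you're unsure which model to pick, the RAM column above gives a rough rule of thumb. A hypothetical helper that maps a RAM budget (in MB) to a model name using those thresholds:

```bash
#!/bin/sh
# Suggest a whisper.cpp model for an approximate RAM budget in MB,
# using the thresholds from the table above (rough heuristic only).
suggest_model() {
  if   [ "$1" -ge 5000 ]; then echo "large-v3"
  elif [ "$1" -ge 2500 ]; then echo "medium"
  elif [ "$1" -ge 1000 ]; then echo "small"
  elif [ "$1" -ge 500 ];  then echo "base"
  else                         echo "tiny"
  fi
}
```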
```bash
# Create the model cache directory
mkdir -p ~/.cache/whisper-cpp

# Download the base model (recommended starting point)
curl -L -o ~/.cache/whisper-cpp/ggml-base.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin

# Or download small for better accuracy
curl -L -o ~/.cache/whisper-cpp/ggml-small.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin
```

## Step 4: Configure the bot

Add the following to your `.env`:
```bash
# Enable voice transcription with local provider
ENABLE_VOICE_MESSAGES=true
VOICE_PROVIDER=local

# Path to the whisper.cpp binary (omit if already on PATH as "whisper-cpp")
WHISPER_CPP_BINARY_PATH=/usr/local/bin/whisper-cpp

# Model: a name like "base", "small", "medium" or a full file path
# Named models resolve to ~/.cache/whisper-cpp/ggml-{name}.bin
WHISPER_CPP_MODEL_PATH=base
```

If `whisper-cpp` is on your PATH and you downloaded the `base` model to the default location, you only need:

```bash
VOICE_PROVIDER=local
```

## Step 5: Test the setup

```bash
# Test ffmpeg conversion
ffmpeg -f lavfi -i "sine=frequency=440:duration=2" -ar 16000 -ac 1 /tmp/test.wav -y

# Test whisper.cpp
whisper-cpp -m ~/.cache/whisper-cpp/ggml-base.bin -f /tmp/test.wav --no-timestamps
```

You should see a transcription attempt (it will be empty or nonsensical for a sine wave, but the binary should run without errors).
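The model-name resolution rule from the configuration (`base` resolves to `~/.cache/whisper-cpp/ggml-base.bin`) can be expressed as a small shell function. This is a sketch of one plausible way to distinguish names from paths (values containing a slash are treated as literal paths), not the bot's actual implementation:

```bash
#!/bin/sh
# Resolve a WHISPER_CPP_MODEL_PATH value: anything containing "/" is used
# as a literal file path; a bare name resolves into the model cache dir.
resolve_model() {
  case "$1" in
    */*) printf '%s\n' "$1" ;;
    *)   printf '%s\n' "$HOME/.cache/whisper-cpp/ggml-$1.bin" ;;
  esac
}
```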
## Troubleshooting

### whisper-cpp binary not found

The bot could not locate the binary. Either:

- Install it system-wide:

  ```bash
  sudo cp build/bin/whisper-cli /usr/local/bin/whisper-cpp
  ```

- Or set the full path:

  ```bash
  WHISPER_CPP_BINARY_PATH=/path/to/whisper-cli
  ```

### Model file not found

The model file does not exist at the expected path. Download it:

```bash
mkdir -p ~/.cache/whisper-cpp
curl -L -o ~/.cache/whisper-cpp/ggml-base.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin
```

### ffmpeg not found

Install ffmpeg for your platform (see step 1 above).
### Poor transcription accuracy

- Try a larger model (`small` or `medium` instead of `base`)
- Ensure audio is not too short (< 1 second) or too noisy
- whisper.cpp uses `--language auto` by default; this works well for most languages
### Transcription is slow

- Use a smaller model (`tiny` or `base`)
- Enable GPU acceleration when building whisper.cpp (CUDA / Metal)
- Consider using the `mistral` or `openai` cloud providers for faster results on low-powered machines