Production-grade speech-to-text platform built on WhisperLive.
Aavaaz (आवाज़, "voice" in Hindi) is an open-source extension of WhisperLive that adds enterprise features to compete with Deepgram, ElevenLabs, and AssemblyAI.
| Category | Capabilities |
|---|---|
| Transcription | Real-time WebSocket streaming, REST API (OpenAI-compatible), batch inference, multichannel audio |
| Intelligence | Speaker diarization, sentiment analysis, topic detection, entity extraction, summarization |
| Post-processing | Smart formatting, PII redaction, profanity filtering, noise reduction, utterance/paragraph segmentation |
| Platform | Webhook delivery, transcript search & tagging, storage backends (local/S3), ACL/auth, GDPR compliance, Prometheus metrics |
| Deployment | Docker, Helm charts, Terraform (AWS), serverless (Lambda), Modal (GPU), GPU auto-detection, model caching, SSE streaming |
```bash
# Install uv: https://docs.astral.sh/uv/
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone WhisperLive alongside aavaaz (required: aavaaz uses a patched fork)
git clone https://github.com/collabora/WhisperLive.git ../WhisperLive

# Create a venv with Python 3.12 (required; 3.14 is not yet supported by the ML dependencies)
uv venv .venv --python python3.12
source .venv/bin/activate

# Install aavaaz with the ML stack (whisper-live, torch, etc.)
uv sync --extra whisper

# Start the server
aavaaz serve --model large-v3

# Transcribe a file
aavaaz transcribe audio.wav
```

**Fedora 43+ / Python 3.14 note:** The ML stack (PyTorch, faster-whisper) does not yet publish wheels for Python 3.14. Use `python3.12` explicitly when creating the virtualenv. On Fedora: `sudo dnf install python3.12`
Alternatively, install with plain `pip`:

```bash
# Create a virtualenv (Python 3.12 required)
python3.12 -m venv .venv && source .venv/bin/activate

# Clone WhisperLive (patched fork required by aavaaz)
git clone https://github.com/collabora/WhisperLive.git ../WhisperLive
pip install -e ../WhisperLive

# Install base + ML stack (large: ~20GB download for torch/onnx)
pip install -r requirements/whisper.txt
# Or install just the base dependencies (fast, no ML):
# pip install -r requirements/base.txt

# Start the server (requires the ML stack)
aavaaz serve --model large-v3

# Transcribe a file
aavaaz transcribe audio.wav

# OpenAI-compatible REST endpoint
curl -X POST http://localhost:8000/v1/audio/transcriptions \
  -F file=@audio.wav -F model=large-v3
```

To install from a local checkout in editable mode:

```bash
python3.12 -m venv .venv && source .venv/bin/activate

# Clone WhisperLive (patched fork required by aavaaz)
git clone https://github.com/collabora/WhisperLive.git ../WhisperLive
pip install -e ../WhisperLive

# Install base only
pip install -e .

# Or install with whisper-live + dev tools
pip install -e ".[whisper,dev]"
```

The full ML stack (torch, onnxruntime, torchaudio) requires about 20 GB of disk space. If you hit disk-quota errors, consider:
- Using `uv`, which is faster and handles large downloads better
- Installing on a machine with more disk space
- Using a serverless deployment (AWS Lambda or Modal) instead of a local install

Requirements files:

- `requirements/base.txt`: core dependencies only (fastapi, uvicorn, boto3)
- `requirements/whisper.txt`: full ML stack (torch, whisper-live, etc.)
- `requirements/dev.txt`: development tools (pytest, ruff, etc.)
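Because the full ML stack needs roughly 20 GB and Python 3.12, a quick pre-flight check can save a failed install. This is a minimal standard-library sketch; the function names are hypothetical, the 20 GB threshold is the rough figure quoted above, and it only checks the filesystem holding the given directory (pip's download cache, e.g. `~/.cache/pip`, may live on a different one).

```python
import shutil
import sys

# Rough figure from this README; adjust to your own measurements.
REQUIRED_BYTES = 20 * 1024**3


def python_version_ok(version: tuple = tuple(sys.version_info[:3])) -> bool:
    """True for Python 3.12, which this README requires for the ML stack."""
    return tuple(version[:2]) == (3, 12)


def has_room_for_ml_stack(path: str = ".", required_bytes: int = REQUIRED_BYTES) -> bool:
    """True if the filesystem containing `path` has at least `required_bytes` free."""
    return shutil.disk_usage(path).free >= required_bytes


if __name__ == "__main__":
    free_gb = shutil.disk_usage(".").free / 1024**3
    print(f"Python 3.12: {'yes' if python_version_ok() else 'no'}")
    verdict = "enough" if has_room_for_ml_stack() else "probably not enough"
    print(f"Free disk: {free_gb:.1f} GB ({verdict})")
```

If the check reports too little space, `requirements/base.txt` or one of the serverless deployments is the lighter option.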
Aavaaz uses WhisperLive as its transcription engine and extends it via the plugin system:
```
┌─────────────────────────────────────────┐
│              Aavaaz Server              │
│   ┌─────────────────────────────────┐   │
│   │  REST API / WebSocket / Web UI  │   │
│   └────────────────┬────────────────┘   │
│   ┌────────────────┴────────────────┐   │
│   │         Plugin Pipeline         │   │
│   │   diarization → formatting →    │   │
│   │   PII redaction → intelligence  │   │
│   └────────────────┬────────────────┘   │
│   ┌────────────────┴────────────────┐   │
│   │     WhisperLive Core Engine     │   │
│   │   faster-whisper / TensorRT /   │   │
│   │            OpenVINO             │   │
│   └─────────────────────────────────┘   │
└─────────────────────────────────────────┘
```
Enable per-word timing and confidence scores in transcription segments:
```python
from aavaaz import AavaazServer

server = AavaazServer()
server.serve(word_timestamps=True)
```

When enabled, each segment includes a `words` array:

```json
{
  "segments": [{
    "start": "0.000", "end": "2.500", "text": "Hello world",
    "words": [
      {"word": "Hello", "start": "0.000", "end": "0.800", "probability": 0.95},
      {"word": " world", "start": "0.900", "end": "2.500", "probability": 0.88}
    ]
  }]
}
```

Boost recognition of specific terms (product names, acronyms, domain jargon):
```python
from aavaaz import AavaazServer

server = AavaazServer()
server.serve(hotwords="Aavaaz,TensorRT,OpenVINO")
```

The `hotwords` parameter is a comma-separated string passed directly to faster-whisper's hotword boosting (its `hotwords` option). It is also available in the REST API via the `hotwords` form field.
Real-time speaker identification using pyannote.audio embeddings:
```bash
pip install pyannote.audio
```

```python
from aavaaz import AavaazServer

server = AavaazServer()
server.serve(enable_diarization=True, max_speakers=4)
```

When enabled, completed segments include a `speaker` field:

```json
{"start": "0.000", "end": "2.500", "text": "Hello", "speaker": "SPEAKER_00", "completed": true}
```

Protect both REST API and WebSocket connections with a shared API key:
```bash
aavaaz serve --model large-v3 --api-key "my-secret-key"
```

- REST API: requires an `Authorization: Bearer my-secret-key` header
- WebSocket: requires either an `Authorization: Bearer my-secret-key` header or a `?token=my-secret-key` query parameter
Unauthenticated connections receive HTTP 401 before any GPU resources are allocated.
Limit REST API requests per client IP (sliding 60-second window):
```bash
aavaaz serve --model large-v3 --rate-limit-rpm 60
```

Clients exceeding the limit receive HTTP 429.
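A sliding 60-second window means each client may make at most N requests in any trailing 60 seconds, rather than N per fixed calendar minute. A minimal sketch of that policy, keeping a deque of request timestamps per client IP; the class name and the injectable clock are assumptions for illustration, not aavaaz's code.

```python
import time
from collections import defaultdict, deque
from collections.abc import Callable


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client within any trailing `window` seconds."""

    def __init__(self, limit: int = 60, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self._hits: dict[str, deque] = defaultdict(deque)

    def allow(self, client_ip: str) -> bool:
        now = self.clock()
        hits = self._hits[client_ip]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # the caller would answer with HTTP 429
        hits.append(now)
        return True
```

Expired timestamps are discarded on each call, so the check is amortized O(1) per request. A production limiter would also evict idle IPs so the dictionary does not grow without bound.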
Automatically reconnect when the WebSocket connection drops unexpectedly:
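```python
from whisper_live.client import TranscriptionClient

client = TranscriptionClient(
    "localhost", 9090,
    max_retries=5,  # reconnect attempts before giving up
    retry_delay=3,  # wait between attempts (assumed to be seconds)
)
client("audio.wav")  # transcribe a file; client() with no argument streams from the microphone
```

Batch multiple client sessions into single GPU calls for higher throughput: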
```bash
aavaaz serve --model large-v3 --batch-inference --batch-max-size 8 --batch-window-ms 50
```

Monitor server health with a Prometheus `/metrics` endpoint:
```bash
aavaaz serve --model large-v3 --metrics-port 9091
```

The endpoint tracks active connections, transcription latency, segment counts, and error rates.
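For a quick check without running a Prometheus server, the endpoint can be scraped and parsed directly. This is a minimal sketch for simple `name{labels} value [timestamp]` lines of the Prometheus text format; it assumes the endpoint is served at `http://localhost:9091/metrics`, and any metric names you test it with are your own, since aavaaz's are not listed here. For full format support, `prometheus_client.parser.text_string_to_metric_families` is the robust option.

```python
import urllib.request


def parse_prometheus_text(text: str) -> dict[str, float]:
    """Parse simple Prometheus text-format lines into {series: value}.

    Handles `name value` and `name{labels} value`, skips comments and blank
    lines, and ignores optional timestamps. Not a full parser: a '}' inside a
    label value, for example, is not handled.
    """
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            # The series (name plus labels) ends at the closing brace.
            end = line.index("}") + 1
            series, rest = line[:end], line[end:].split()
        else:
            series, *rest = line.split()
        samples[series] = float(rest[0])
    return samples


def scrape(url: str = "http://localhost:9091/metrics") -> dict[str, float]:
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_prometheus_text(resp.read().decode("utf-8"))
```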
Stream transcription results via Server-Sent Events from the REST API:
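```bash
curl -N -X POST http://localhost:8000/v1/audio/transcriptions \
  -F file=@audio.wav -F stream=true
```

The response is a `text/event-stream` of segment events delivered as they are transcribed (`-N` disables curl's output buffering so events print as they arrive).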
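A minimal parser for that stream, following the basic Server-Sent Events rules: `data:` lines accumulate, a blank line ends an event, and lines starting with `:` are comments. That each event carries one JSON segment is an assumption, since the event schema is not documented here; payloads that are not JSON are skipped.

```python
import json
from collections.abc import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield {"event": ..., "data": ...} for each complete Server-Sent Event."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            # Blank line: dispatch the accumulated event, if any.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment / keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "data":
            data.append(value)
        elif field == "event":
            event = value
    # Per the SSE rules, an unterminated trailing event is discarded.


def iter_segments(lines: Iterable[str]) -> Iterator[dict]:
    """Decode each event's data as JSON, skipping non-JSON payloads."""
    for evt in iter_sse_events(lines):
        try:
            yield json.loads(evt["data"])
        except json.JSONDecodeError:
            continue  # e.g. an end-of-stream marker, if the server sends one
```

With `urllib`, iterating over the response object yields raw lines, so `iter_segments(line.decode("utf-8") for line in resp)` processes segments as they arrive.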
Extend the transcription pipeline with custom post-processors:
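```python
from aavaaz import AavaazServer
from aavaaz.plugins import PluginRegistry

def my_post_processor_fn(segment):  # assumed: takes a segment dict, returns it (possibly modified)
    return segment

registry = PluginRegistry()
registry.register("my_plugin", my_post_processor_fn, priority=50)
server = AavaazServer(plugin_registry=registry)
server.serve()
```

Plugins receive each transcription segment and can modify, enrich, or filter it before delivery to the client.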
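The ordering and filtering behaviour can be pictured with a small self-contained stand-in. `MiniPluginPipeline` is hypothetical and is not aavaaz's `PluginRegistry`; it assumes that lower `priority` values run first and that returning `None` drops a segment, neither of which is specified above.

```python
from collections.abc import Callable
from typing import Optional

Segment = dict
PostProcessor = Callable[[Segment], Optional[Segment]]


class MiniPluginPipeline:
    """Hypothetical priority-ordered post-processing pipeline (lower priority runs first)."""

    def __init__(self) -> None:
        self._plugins: list[tuple[int, str, PostProcessor]] = []

    def register(self, name: str, fn: PostProcessor, priority: int = 100) -> None:
        self._plugins.append((priority, name, fn))
        # Stable sort keeps registration order for equal priorities.
        self._plugins.sort(key=lambda p: p[0])

    def run(self, segment: Segment) -> Optional[Segment]:
        for _, _, fn in self._plugins:
            segment = fn(segment)
            if segment is None:
                return None  # filtered out; later plugins never see it
        return segment


def drop_empty(segment: Segment) -> Optional[Segment]:
    """Filter: drop segments with no text."""
    return segment if segment.get("text", "").strip() else None


def add_word_count(segment: Segment) -> Segment:
    """Enrich: add a word count without mutating the caller's dict."""
    return {**segment, "word_count": len(segment["text"].split())}


pipeline = MiniPluginPipeline()
pipeline.register("add_word_count", add_word_count, priority=50)
pipeline.register("drop_empty", drop_empty, priority=10)
```

Here `pipeline.run({"text": "Hello world"})` passes through `drop_empty` first and then gains a `word_count` of 2, while a whitespace-only segment is dropped before enrichment.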
```bash
aavaaz serve --model large-v3 --batch-inference
```

With Docker Compose:

```yaml
services:
  aavaaz:
    image: collabora/aavaaz:latest
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
    ports:
      - "9090:9090"
      - "8000:8000"
```

On Kubernetes with Helm:

```bash
helm install aavaaz deploy/helm/aavaaz \
  --set model=large-v3 \
  --set replicas=3 \
  --set gpu.enabled=true
```

On AWS with Terraform:

```bash
cd deploy/terraform
terraform init
terraform apply -var="model=large-v3" -var="api_key=my-secret"
```

This provisions a VPC, an ALB, ECS with GPU instances (g5.xlarge), ECR, and CloudWatch. See `deploy/terraform/README.md` for full options.
For batch file transcription without managing servers:
```bash
# Build and push the Lambda container image
docker build -f Dockerfile.lambda --build-arg WHISPER_MODEL=small.en -t aavaaz-lambda .

# Deploy infrastructure
cd deploy/terraform-lambda
terraform init
terraform apply

# Upload audio; the transcript appears automatically in the output bucket
aws s3 cp recording.wav s3://$(terraform output -raw audio_input_bucket)/

# Or use the REST API
curl -X POST $(terraform output -raw api_endpoint) \
  -H "Content-Type: application/json" \
  -d '{"audio_url": "s3://my-bucket/recording.wav"}'
```

See `docs/SERVERLESS.md` for full configuration, model selection, cost estimates, and limitations.
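The same REST request can be built from Python with only the standard library. A minimal sketch: `api_endpoint` is whatever `terraform output -raw api_endpoint` prints, the helper names are hypothetical, and the response is assumed to be JSON because its schema is not described here.

```python
import json
import urllib.request


def build_lambda_request(api_endpoint: str, audio_url: str) -> urllib.request.Request:
    """Build (but do not send) the JSON POST that the curl example sends."""
    body = json.dumps({"audio_url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        api_endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit(api_endpoint: str, audio_url: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the (assumed JSON) response."""
    req = build_lambda_request(api_endpoint, audio_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```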
Deploy on Modal for on-demand GPU transcription with no infrastructure to manage:

```bash
cd deploy/modal
pip install modal
modal setup
modal deploy app.py

# Transcribe
curl -X POST https://your-workspace--aavaaz-transcribe.modal.run/v1/audio/transcriptions \
  -F file=@recording.wav -F model=large-v3
```

The app scales to zero when idle, and GPU containers spin up in seconds. See `docs/MODAL.md` for full configuration.
To set up a development environment and run the tests:

```bash
git clone git@github.com:collabora/aavaaz.git
cd aavaaz
pip install -e ".[dev]"
pytest
```