
Aavaaz — production-grade speech-to-text platform built on WhisperLive


Aavaaz

Production-grade speech-to-text platform built on WhisperLive.

Aavaaz (आवाज़, "voice" in Hindi) is an open-source extension of WhisperLive that adds enterprise features intended to compete with Deepgram, ElevenLabs, and AssemblyAI.

Features

Category         Capabilities
Transcription    Real-time WebSocket streaming, REST API (OpenAI-compatible), batch inference, multichannel audio
Intelligence     Speaker diarization, sentiment analysis, topic detection, entity extraction, summarization
Post-processing  Smart formatting, PII redaction, profanity filtering, noise reduction, utterance/paragraph segmentation
Platform         Webhook delivery, transcript search & tagging, storage backends (local/S3), ACL/auth, GDPR compliance, Prometheus metrics
Deployment       Docker, Helm charts, Terraform (AWS), serverless (Lambda), Modal (GPU), GPU auto-detection, model caching, SSE streaming

Quick Start

Option 1: Install from PyPI (Recommended)

# Create a virtualenv (Python 3.12 required)
python3.12 -m venv .venv && source .venv/bin/activate

# Install aavaaz with WhisperLive + ML stack
pip install "aavaaz[whisper]"

# Start the server
aavaaz serve --model large-v3

# Transcribe a file
aavaaz transcribe audio.wav

Option 2: Using uv (Fast & Reproducible)

# Install uv: https://docs.astral.sh/uv/
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create venv with Python 3.12 (required; 3.14 not yet supported by ML deps)
uv venv .venv --python python3.12
source .venv/bin/activate

# Install from PyPI
uv pip install "aavaaz[whisper]"

# Start the server
aavaaz serve --model large-v3

# Transcribe a file
aavaaz transcribe audio.wav

Fedora 43+ / Python 3.14 note: The ML stack (PyTorch, faster-whisper) does not yet publish wheels for Python 3.14. Use python3.12 explicitly when creating the virtualenv. On Fedora: sudo dnf install python3.12

Option 3: Local Development Install

git clone git@github.com:collabora/aavaaz.git
cd aavaaz
python3.12 -m venv .venv && source .venv/bin/activate

# Local editable install
pip install -e .

# With WhisperLive + dev tooling
pip install -e ".[whisper,dev]"

Option 4: Using pip with Requirements Files

# Create a virtualenv (Python 3.12 required)
python3.12 -m venv .venv && source .venv/bin/activate

# Install base + ML stack (large ~20GB download for torch/onnx)
pip install -r requirements/whisper.txt

# Or install just base (fast, no ML):
# pip install -r requirements/base.txt

# Start the server (requires ML stack)
aavaaz serve --model large-v3

# Transcribe a file
aavaaz transcribe audio.wav

# OpenAI-compatible REST endpoint
curl -X POST http://localhost:8000/v1/audio/transcriptions \
  -F file=@audio.wav -F model=large-v3
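The same upload can be prepared from Python. The following is a minimal sketch using only the standard library; it assumes the server started above is listening on localhost:8000 and accepts the same file and model form fields as the curl command. The helper names build_multipart and build_transcription_request are hypothetical and not part of Aavaaz.

```python
import io
import uuid
import urllib.request


def build_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    for name, value in fields.items():
        buf.write(f"--{boundary}\r\n".encode())
        buf.write(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        buf.write(f"{value}\r\n".encode())
    buf.write(f"--{boundary}\r\n".encode())
    buf.write(
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'.encode()
    )
    buf.write(b"Content-Type: application/octet-stream\r\n\r\n")
    buf.write(file_bytes)
    buf.write(f"\r\n--{boundary}--\r\n".encode())
    return buf.getvalue(), f"multipart/form-data; boundary={boundary}"


def build_transcription_request(
    audio_bytes,
    filename="audio.wav",
    model="large-v3",
    # Assumed URL: same host, port, and path as the curl example.
    url="http://localhost:8000/v1/audio/transcriptions",
):
    """Return a ready-to-send POST request; nothing is sent here."""
    body, content_type = build_multipart(
        {"model": model}, "file", filename, audio_bytes
    )
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
```

To send it, read the audio file in binary mode, pass the bytes to build_transcription_request, and hand the result to urllib.request.urlopen. The shape of the response body is not documented here, so inspect it before relying on specific keys.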

Note on Storage

The full ML stack (torch, onnxruntime, torchaudio) requires ~20GB disk space. If you hit disk quota errors, consider:

  • Using uv, which is faster and handles large downloads better
  • Installing on a machine with more space
  • Using serverless deployments (AWS Lambda / Modal) instead of local
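Before starting the install, it can help to check how much space is free. This is a small sketch using the standard library; the 20 GB threshold is simply the approximate figure quoted above, and the function names are hypothetical.

```python
import shutil

REQUIRED_GB = 20  # approximate size of the full ML stack quoted above


def free_gb(path="."):
    """Free space, in decimal gigabytes, on the filesystem holding `path`."""
    return shutil.disk_usage(path).free / 1e9


def has_room_for_ml_stack(path=".", required_gb=REQUIRED_GB):
    """True if the filesystem holding `path` has at least `required_gb` free."""
    return free_gb(path) >= required_gb
```

Run it against the directory where the virtualenv will live, since that is where pip or uv unpacks the packages.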

Requirements Files

  • requirements/base.txt — Core dependencies only (fastapi, uvicorn, boto3)
  • requirements/whisper.txt — Full ML stack (torch, whisper-live>=0.9.0, etc)
  • requirements/dev.txt — Development tools (pytest, ruff, etc)

Architecture

Aavaaz uses WhisperLive as its transcription engine and extends it via the plugin system:

┌─────────────────────────────────────────┐
│              Aavaaz Server               │
│  ┌─────────────────────────────────┐    │
│  │  REST API / WebSocket / Web UI  │    │
│  └──────────────┬──────────────────┘    │
│  ┌──────────────┴──────────────────┐    │
│  │        Plugin Pipeline          │    │
│  │  diarization → formatting →     │    │
│  │  PII redaction → intelligence   │    │
│  └──────────────┬──────────────────┘    │
│  ┌──────────────┴──────────────────┐    │
│  │    WhisperLive Core Engine      │    │
│  │  faster-whisper / TensorRT /    │    │
│  │  OpenVINO                       │    │
│  └─────────────────────────────────┘    │
└─────────────────────────────────────────┘

Advanced Features

Word-Level Timestamps

Enable per-word timing and confidence scores in transcription segments:

from aavaaz import AavaazServer

server = AavaazServer()
server.serve(word_timestamps=True)

When enabled, each segment includes a words array:

{
  "segments": [{
    "start": "0.000", "end": "2.500", "text": "Hello world",
    "words": [
      {"word": "Hello", "start": "0.000", "end": "0.800", "probability": 0.95},
      {"word": " world", "start": "0.900", "end": "2.500", "probability": 0.88}
    ]
  }]
}
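As an illustration of consuming this payload, the sketch below flags words whose probability falls under a threshold. It assumes only the structure shown above; note that start and end arrive as strings and need converting. The function name and the 0.9 threshold are arbitrary choices for the example.

```python
import json

# The sample response shown above, verbatim.
response = json.loads("""
{
  "segments": [{
    "start": "0.000", "end": "2.500", "text": "Hello world",
    "words": [
      {"word": "Hello", "start": "0.000", "end": "0.800", "probability": 0.95},
      {"word": " world", "start": "0.900", "end": "2.500", "probability": 0.88}
    ]
  }]
}
""")


def low_confidence_words(payload, threshold=0.9):
    """Return (word, start_seconds, end_seconds) for words below `threshold`."""
    flagged = []
    for segment in payload["segments"]:
        # Segments have no "words" key when word timestamps are disabled.
        for word in segment.get("words", []):
            if word["probability"] < threshold:
                flagged.append(
                    (word["word"].strip(), float(word["start"]), float(word["end"]))
                )
    return flagged
```

With the sample above, only "world" (probability 0.88) is flagged.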

Custom Vocabulary / Hotwords

Boost recognition of specific terms (product names, acronyms, domain jargon):

from aavaaz import AavaazServer

server = AavaazServer()
server.serve(hotwords="Aavaaz,TensorRT,OpenVINO")

The hotwords parameter is a comma-separated string passed directly to faster-whisper's keyword boosting. It is also available in the REST API via the hotwords form field.
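Because the value is a single comma-separated string, a term list kept elsewhere (a glossary file, a config) has to be joined first. A minimal sketch follows; build_hotwords is a hypothetical helper, and the case-insensitive de-duplication is a choice made for this example rather than something Aavaaz requires.

```python
def build_hotwords(terms):
    """Join terms into a comma-separated string, dropping blanks and repeats."""
    seen = set()
    cleaned = []
    for term in terms:
        term = term.strip()
        key = term.lower()
        if term and key not in seen:
            seen.add(key)
            cleaned.append(term)
    return ",".join(cleaned)
```

The result can be passed as the hotwords argument shown above. Terms that themselves contain a comma cannot be represented in this format and should be filtered out beforehand.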

Speaker Diarization

Real-time speaker identification using pyannote.audio embeddings:

pip install pyannote.audio

from aavaaz import AavaazServer

server = AavaazServer()
server.serve(enable_diarization=True, max_speakers=4)

When enabled, completed segments include a speaker field:

{"start": "0.000", "end": "2.500", "text": "Hello", "speaker": "SPEAKER_00", "completed": true}
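One common use of the speaker field is totalling talk time per speaker. The sketch below assumes segments shaped like the example above (string timestamps, a completed flag, an optional speaker key); the function name is hypothetical.

```python
from collections import defaultdict


def talk_time_by_speaker(segments):
    """Sum the duration in seconds of completed segments for each speaker."""
    totals = defaultdict(float)
    for seg in segments:
        # Skip partial segments and segments without a speaker label.
        if not seg.get("completed") or "speaker" not in seg:
            continue
        totals[seg["speaker"]] += float(seg["end"]) - float(seg["start"])
    return dict(totals)
```

Segments that are still in progress are ignored, since the speaker field is only described for completed segments.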

Authentication

Protect both REST API and WebSocket connections with a shared API key:

aavaaz serve --model large-v3 --api-key "my-secret-key"

  • REST API: Requires Authorization: Bearer my-secret-key header
  • WebSocket: Requires either Authorization: Bearer my-secret-key header or ?token=my-secret-key query parameter

Unauthenticated connections receive HTTP 401 before any GPU resources are allocated.
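On the client side, the two credential forms described above can be built as follows. This is a sketch with hypothetical helper names; the WebSocket address in the usage note is an assumption based on the port used elsewhere in this document.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def bearer_headers(api_key):
    """Header form, usable for both REST and WebSocket connections."""
    return {"Authorization": f"Bearer {api_key}"}


def websocket_url_with_token(url, api_key):
    """Query-parameter form, for WebSocket clients that cannot set headers."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("token", api_key))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, websocket_url_with_token("ws://localhost:9090/", "my-secret-key") yields ws://localhost:9090/?token=my-secret-key. The header form is generally preferable where available, because URLs tend to end up in logs.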

Rate Limiting

Limit REST API requests per client IP (sliding 60-second window):

aavaaz serve --model large-v3 --rate-limit-rpm 60

Clients exceeding the limit receive HTTP 429.
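To illustrate what a sliding 60-second window per client IP means, here is a generic sketch of the technique. It is not Aavaaz's implementation; the class name and the injectable clock are choices made for this example.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit_rpm` requests per client within the last window."""

    def __init__(self, limit_rpm, window_s=60.0, clock=time.monotonic):
        self.limit = limit_rpm
        self.window = window_s
        self.clock = clock  # injectable so the behaviour can be tested
        self.hits = defaultdict(deque)  # client IP -> request timestamps

    def allow(self, client_ip):
        now = self.clock()
        hits = self.hits[client_ip]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # the caller would answer with HTTP 429
        hits.append(now)
        return True
```

Unlike a fixed per-minute counter, the window moves with each request, so a client cannot double its rate by bursting around a minute boundary.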

Auto-Reconnect

Automatically reconnect when the WebSocket connection drops unexpectedly:

from whisper_live.client import TranscriptionClient

client = TranscriptionClient(
  "localhost", 9090,
  max_retries=5,
  retry_delay=3,
)
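The two parameters map onto a simple retry loop. The sketch below shows the general pattern only; it is not the whisper_live client's internal code, and connect_with_retries is a hypothetical name. It treats max_retries as the number of attempts allowed after the first failure, which is an assumption.

```python
import time


def connect_with_retries(connect, max_retries=5, retry_delay=3, sleep=time.sleep):
    """Call `connect` until it succeeds or `max_retries` retries are used up."""
    attempts = 0
    while True:
        try:
            return connect()
        except ConnectionError:
            if attempts >= max_retries:
                raise  # give up and surface the last error
            attempts += 1
            sleep(retry_delay)  # fixed delay in seconds between attempts
```

A fixed delay keeps the behaviour predictable; for many clients reconnecting at once, adding jitter or exponential backoff would spread the load.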

Batch Inference

Batch multiple client sessions into single GPU calls for higher throughput:

aavaaz serve --model large-v3 --batch-inference --batch-max-size 8 --batch-window-ms 50
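The two tuning flags describe a micro-batching trade-off: wait up to a short window for more work, but never beyond a maximum batch size. The following is a generic sketch of that idea, not Aavaaz's scheduler; collect_batch and its queue-based interface are assumptions made for illustration.

```python
import queue
import time


def collect_batch(pending, max_size=8, window_ms=50):
    """Block for the first item, then gather more until full or the window closes."""
    batch = [pending.get()]
    deadline = time.monotonic() + window_ms / 1000.0
    while len(batch) < max_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(pending.get(timeout=remaining))
        except queue.Empty:
            break  # window expired with no further work
    return batch
```

A larger window raises GPU utilisation at the cost of added latency per segment; a larger maximum size raises throughput until GPU memory becomes the limit.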

Prometheus Metrics

Monitor server health with a Prometheus /metrics endpoint:

aavaaz serve --model large-v3 --metrics-port 9091

Tracks active connections, transcription latency, segment counts, and error rates.
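For a quick check without a full Prometheus setup, the text exposition format can be parsed by hand. The sketch below is deliberately naive (it ignores optional timestamps and does not split labels), and the metric names in the sample are hypothetical, since the actual names are not listed here.

```python
# Hypothetical sample in Prometheus text exposition format.
SAMPLE = """\
# HELP aavaaz_active_connections Hypothetical gauge
# TYPE aavaaz_active_connections gauge
aavaaz_active_connections 3
aavaaz_errors_total{kind="decode"} 1
"""


def parse_metrics(text):
    """Map each sample line (name plus labels) to its float value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name, value = line.rsplit(None, 1)
        samples[name] = float(value)
    return samples
```

With the server started as above, the text would presumably come from http://localhost:9091/metrics; for anything beyond a smoke test, an official Prometheus client or scraper is the better tool.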

SSE Streaming

Stream transcription results via Server-Sent Events from the REST API:

curl -X POST http://localhost:8000/v1/audio/transcriptions \
  -F file=@audio.wav -F stream=true

Returns real-time segment events as text/event-stream.
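A client reading this stream has to reassemble events from individual lines. Here is a minimal sketch of a Server-Sent Events reader that handles data fields and comment lines only; iter_sse_data is a hypothetical name, and the payload format of each event (likely JSON) is an assumption to verify against real output.

```python
def iter_sse_data(lines):
    """Yield the data payload of each complete event in an SSE line stream."""
    data_lines = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_lines:
                yield "\n".join(data_lines)
                data_lines = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is stripped
        if field == "data":
            data_lines.append(value)
    # An event not followed by a blank line is incomplete and is dropped.
```

Any iterable of text lines works as input, for example the decoded lines of an HTTP response body. If the payloads turn out to be JSON, each yielded string can be passed to json.loads.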

Plugin System

Extend the transcription pipeline with custom post-processors:

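from aavaaz import AavaazServer
from aavaaz.plugins import PluginRegistry

# Placeholder post-processor. The callable signature is an assumption:
# one segment in, the (possibly modified) segment out.
def my_post_processor_fn(segment):
    return segment

registry = PluginRegistry()
registry.register("my_plugin", my_post_processor_fn, priority=50)

server = AavaazServer(plugin_registry=registry)
server.serve()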

Plugins receive each transcription segment and can modify, enrich, or filter it before delivery to the client.
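As a concrete example of a post-processor that modifies a segment, the sketch below masks email addresses in the text. It assumes a plugin is a callable that takes a segment dict with a text key and returns the updated dict; that signature is not confirmed by the documentation above, and redact_emails is a hypothetical name.

```python
import re

# Simple pattern for illustration; it does not cover every valid address.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact_emails(segment):
    """Return a copy of the segment with email addresses masked."""
    redacted = dict(segment)  # copy, so the input segment is left untouched
    redacted["text"] = EMAIL_RE.sub("[EMAIL]", segment.get("text", ""))
    return redacted
```

Under the same assumption, it would be registered just like the example above: registry.register("redact_emails", redact_emails, priority=50).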

Scaling Guide

Single GPU

aavaaz serve --model large-v3 --batch-inference

Multi-GPU (Docker Compose)

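services:
  aavaaz:
    image: collabora/aavaaz:latest
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia        # NVIDIA container runtime
              count: all            # reserve every GPU on the host
              capabilities: [gpu]
    ports:
      - "9090:9090"   # WebSocket streaming
      - "8000:8000"   # REST API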

Kubernetes (Helm)

helm install aavaaz deploy/helm/aavaaz \
  --set model=large-v3 \
  --set replicas=3 \
  --set gpu.enabled=true

AWS (Terraform)

cd deploy/terraform
terraform init
terraform apply -var="model=large-v3" -var="api_key=my-secret"

Provisions VPC, ALB, ECS with GPU instances (g5.xlarge), ECR, and CloudWatch. See deploy/terraform/README.md for full options.

AWS Lambda (Serverless)

Production role: Lambda is the batch transcription path.

For batch file transcription without managing servers:

# Build and push the Lambda container image
docker build -f Dockerfile.lambda --build-arg WHISPER_MODEL=small -t aavaaz-lambda .

# Deploy infrastructure
cd deploy/terraform-lambda
terraform init
terraform apply

# Upload audio — transcript appears automatically in the output bucket
aws s3 cp recording.wav s3://$(terraform output -raw audio_input_bucket)/

# Or use the REST API
curl -X POST $(terraform output -raw api_endpoint) \
  -H "Content-Type: application/json" \
  -d '{"audio_url": "s3://my-bucket/recording.wav"}'

See docs/SERVERLESS.md for full configuration, model selection, cost estimates, and limitations.
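The REST call in the block above can also be prepared from Python. This is a sketch using the standard library; build_batch_request is a hypothetical helper, and the endpoint value must come from the terraform output -raw api_endpoint command shown above.

```python
import json
import urllib.request


def build_batch_request(api_endpoint, audio_url):
    """Build the JSON POST used to request transcription of an S3 object."""
    payload = json.dumps({"audio_url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        api_endpoint,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned request to urllib.request.urlopen sends it. The response format is not described here, so check docs/SERVERLESS.md before parsing it.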

Modal (GPU Serverless)

Production role: Modal is used for live WebSocket transcription (deploy/modal/app_live.py). The deploy/modal/app.py endpoint is an optional GPU batch API.

Deploy on Modal for on-demand GPU transcription with zero infrastructure:

cd deploy/modal
pip install modal
modal setup
modal deploy app_live.py

# Optional: deploy the GPU batch API endpoint
# modal deploy app.py

# Transcribe: the live WebSocket URL is printed in the app_live.py deployment output.

Modal auto-scales to zero when idle, and GPU containers spin up in seconds. See docs/MODAL.md for full configuration.

Development

git clone git@github.com:collabora/aavaaz.git
cd aavaaz
pip install -e ".[dev]"
pytest

License

MPL-2.0
