Aavaaz — production-grade speech-to-text platform built on WhisperLive

Aavaaz (आवाज़, "voice" in Hindi) is an open-source extension of WhisperLive that adds enterprise features aimed at competing with Deepgram, ElevenLabs, and AssemblyAI.

Features

Category	Capabilities
Transcription	Real-time WebSocket streaming, REST API (OpenAI-compatible), batch inference, multichannel audio
Intelligence	Speaker diarization, sentiment analysis, topic detection, entity extraction, summarization
Post-processing	Smart formatting, PII redaction, profanity filtering, noise reduction, utterance/paragraph segmentation
Platform	Webhook delivery, transcript search & tagging, storage backends (local/S3), ACL/auth, GDPR compliance, Prometheus metrics
Deployment	Docker, Helm charts, Terraform (AWS), serverless (Lambda), Modal (GPU), GPU auto-detection, model caching, SSE streaming

Quick Start

Option 1: Install from PyPI (Recommended)

# Create a virtualenv (Python 3.12 required)
python3.12 -m venv .venv && source .venv/bin/activate

# Install aavaaz with WhisperLive + ML stack
pip install "aavaaz[whisper]"

# Start the server
aavaaz serve --model large-v3

# Transcribe a file
aavaaz transcribe audio.wav

Option 2: Using `uv` (Fast & Reproducible)

# Install uv: https://docs.astral.sh/uv/
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create venv with Python 3.12 (required; 3.14 not yet supported by ML deps)
uv venv .venv --python python3.12
source .venv/bin/activate

# Install from PyPI
uv pip install "aavaaz[whisper]"

# Start the server
aavaaz serve --model large-v3

# Transcribe a file
aavaaz transcribe audio.wav

Fedora 43+ / Python 3.14 note: The ML stack (PyTorch, faster-whisper) does not yet publish wheels for Python 3.14. Use python3.12 explicitly when creating the virtualenv. On Fedora: sudo dnf install python3.12

Option 3: Local Development Install

git clone git@github.com:collabora/aavaaz.git
cd aavaaz
python3.12 -m venv .venv && source .venv/bin/activate

# Local editable install
pip install -e .

# With WhisperLive + dev tooling
pip install -e ".[whisper,dev]"

Option 4: Using `pip` with Requirements Files

# Create a virtualenv (Python 3.12 required)
python3.12 -m venv .venv && source .venv/bin/activate

# Install base + ML stack (large ~20GB download for torch/onnx)
pip install -r requirements/whisper.txt

# Or install just base (fast, no ML):
# pip install -r requirements/base.txt

# Start the server (requires ML stack)
aavaaz serve --model large-v3

# Transcribe a file
aavaaz transcribe audio.wav

# OpenAI-compatible REST endpoint
curl -X POST http://localhost:8000/v1/audio/transcriptions \
  -F file=@audio.wav -F model=large-v3
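
The same request can be built from Python without third-party packages. This is a minimal sketch that mirrors the curl call above: the URL, the `file` and `model` form fields, and the default model come from that example, while the helper name `build_transcription_request` is hypothetical and the response format is not assumed.

```python
import urllib.request
import uuid


def build_transcription_request(audio_bytes, filename, model="large-v3",
                                base_url="http://localhost:8000"):
    """Build (but do not send) a multipart/form-data POST for the endpoint."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        # Plain form field: model
        f'--{boundary}\r\nContent-Disposition: form-data; name="model"\r\n\r\n'
        f'{model}\r\n'.encode(),
        # File field: file
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode(),
        audio_bytes,
        # Closing boundary
        f'\r\n--{boundary}--\r\n'.encode(),
    ])
    return urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


# Placeholder bytes stand in for the contents of audio.wav.
request = build_transcription_request(b"RIFF", "audio.wav")
```

With a server running, `urllib.request.urlopen(request)` would send it; the sketch stops short of that so it can be inspected offline.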

Note on Storage

The full ML stack (torch, onnxruntime, torchaudio) requires ~20GB of disk space. If you hit disk quota errors, consider:

Using uv, which is faster and handles large downloads better
Installing on a machine with more free disk space
Using a serverless deployment (AWS Lambda / Modal) instead of a local install

Requirements Files

requirements/base.txt — Core dependencies only (fastapi, uvicorn, boto3)
requirements/whisper.txt — Full ML stack (torch, whisper-live>=0.9.0, etc)
requirements/dev.txt — Development tools (pytest, ruff, etc)

Architecture

Aavaaz uses WhisperLive as its transcription engine and extends it via the plugin system:

┌─────────────────────────────────────────┐
│              Aavaaz Server               │
│  ┌─────────────────────────────────┐    │
│  │  REST API / WebSocket / Web UI  │    │
│  └──────────────┬──────────────────┘    │
│  ┌──────────────┴──────────────────┐    │
│  │        Plugin Pipeline          │    │
│  │  diarization → formatting →     │    │
│  │  PII redaction → intelligence   │    │
│  └──────────────┬──────────────────┘    │
│  ┌──────────────┴──────────────────┐    │
│  │    WhisperLive Core Engine      │    │
│  │  faster-whisper / TensorRT /    │    │
│  │  OpenVINO                       │    │
│  └─────────────────────────────────┘    │
└─────────────────────────────────────────┘

Advanced Features

Word-Level Timestamps

Enable per-word timing and confidence scores in transcription segments:

from aavaaz import AavaazServer

server = AavaazServer()
server.serve(word_timestamps=True)

When enabled, each segment includes a words array:

{
  "segments": [{
    "start": "0.000", "end": "2.500", "text": "Hello world",
    "words": [
      {"word": "Hello", "start": "0.000", "end": "0.800", "probability": 0.95},
      {"word": " world", "start": "0.900", "end": "2.500", "probability": 0.88}
    ]
  }]
}
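
Note that the timestamps in this payload are strings while `probability` is a number. A minimal sketch of consuming it, using the sample response above verbatim; the helper name `low_confidence_words` and the 0.9 threshold are illustrative choices, not part of Aavaaz:

```python
import json

# Sample response copied from the example above.
response_text = """
{
  "segments": [{
    "start": "0.000", "end": "2.500", "text": "Hello world",
    "words": [
      {"word": "Hello", "start": "0.000", "end": "0.800", "probability": 0.95},
      {"word": " world", "start": "0.900", "end": "2.500", "probability": 0.88}
    ]
  }]
}
"""


def low_confidence_words(response, threshold=0.9):
    """Return (word, start_s, end_s, probability) for words below threshold."""
    flagged = []
    for segment in response["segments"]:
        # "words" is only present when word timestamps are enabled.
        for w in segment.get("words", []):
            if w["probability"] < threshold:
                flagged.append((
                    w["word"].strip(),   # words may carry a leading space
                    float(w["start"]),   # timestamps arrive as strings
                    float(w["end"]),
                    w["probability"],
                ))
    return flagged


response = json.loads(response_text)
print(low_confidence_words(response))  # [('world', 0.9, 2.5, 0.88)]
```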

Custom Vocabulary / Hotwords

Boost recognition of specific terms (product names, acronyms, domain jargon):

from aavaaz import AavaazServer

server = AavaazServer()
server.serve(hotwords="Aavaaz,TensorRT,OpenVINO")

The hotwords parameter is a comma-separated string passed directly to faster-whisper's keyword boosting. Also available in the REST API via the hotwords form field.
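
Because the parameter is a single comma-separated string, it can help to build it from a list. The helper below is a hypothetical convenience (`build_hotwords` is not an Aavaaz function); it only produces the string format described above.

```python
def build_hotwords(terms):
    """Join terms into a comma-separated string, dropping blanks and exact duplicates."""
    seen = set()
    cleaned = []
    for term in terms:
        term = term.strip()
        if not term:
            continue
        if "," in term:
            # A comma inside a term would silently split it into two hotwords.
            raise ValueError(f"hotword must not contain a comma: {term!r}")
        if term not in seen:
            seen.add(term)
            cleaned.append(term)
    return ",".join(cleaned)


hotwords = build_hotwords(["Aavaaz", " TensorRT ", "", "Aavaaz", "OpenVINO"])
print(hotwords)  # Aavaaz,TensorRT,OpenVINO
```

The resulting string can then be passed as `hotwords=` to `server.serve(...)` or sent as the `hotwords` form field.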

Speaker Diarization

Real-time speaker identification using pyannote.audio embeddings:

pip install pyannote.audio

from aavaaz import AavaazServer

server = AavaazServer()
server.serve(enable_diarization=True, max_speakers=4)

When enabled, completed segments include a speaker field:

{"start": "0.000", "end": "2.500", "text": "Hello", "speaker": "SPEAKER_00", "completed": true}
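
On the client side, consecutive segments from the same speaker can be merged into turns. A minimal sketch: the first segment is the one shown above, the remaining ones are made-up illustrations, and `merge_speaker_turns` is a hypothetical helper rather than part of Aavaaz.

```python
def merge_speaker_turns(segments):
    """Merge consecutive completed segments that share a speaker into turns."""
    turns = []
    for seg in segments:
        if not seg.get("completed"):
            continue  # only completed segments carry a speaker field
        speaker = seg.get("speaker", "UNKNOWN")
        text = seg["text"].strip()
        if turns and turns[-1]["speaker"] == speaker:
            # Same speaker keeps talking: extend the current turn.
            turns[-1]["end"] = seg["end"]
            turns[-1]["text"] += " " + text
        else:
            turns.append({"speaker": speaker, "start": seg["start"],
                          "end": seg["end"], "text": text})
    return turns


segments = [
    {"start": "0.000", "end": "2.500", "text": "Hello", "speaker": "SPEAKER_00", "completed": True},
    # Illustrative segments below, not real server output.
    {"start": "2.500", "end": "4.000", "text": "how are you", "speaker": "SPEAKER_00", "completed": True},
    {"start": "4.200", "end": "5.000", "text": "Fine", "speaker": "SPEAKER_01", "completed": True},
    {"start": "5.000", "end": "5.500", "text": "tha", "completed": False},
]

for turn in merge_speaker_turns(segments):
    print(f'{turn["speaker"]} [{turn["start"]}-{turn["end"]}]: {turn["text"]}')
```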

Authentication

Protect both REST API and WebSocket connections with a shared API key:

aavaaz serve --model large-v3 --api-key "my-secret-key"

REST API: Requires Authorization: Bearer my-secret-key header
WebSocket: Requires either Authorization: Bearer my-secret-key header or ?token=my-secret-key query parameter

Unauthenticated connections receive HTTP 401 before any GPU resources are allocated.
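
A minimal sketch of preparing both credential forms on the client. The header and the `token` query parameter come from the description above; the `ws://` scheme, the root path, and port 9090 (borrowed from the Auto-Reconnect example) are assumptions, and both helper names are hypothetical.

```python
from urllib.parse import urlencode


def bearer_headers(api_key):
    """Header accepted by the REST API and by WebSocket handshakes."""
    return {"Authorization": f"Bearer {api_key}"}


def websocket_url(host, port, api_key):
    """WebSocket URL carrying the key as a ?token= query parameter."""
    # urlencode escapes characters that are not safe in a query string.
    return f"ws://{host}:{port}/?{urlencode({'token': api_key})}"


print(bearer_headers("my-secret-key"))
print(websocket_url("localhost", 9090, "my-secret-key"))
```

Prefer the header where the client allows it: query strings tend to end up in proxy and server access logs.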

Rate Limiting

Limit REST API requests per client IP (sliding 60-second window):

aavaaz serve --model large-v3 --rate-limit-rpm 60

Clients exceeding the limit receive HTTP 429.
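
To make "sliding 60-second window" concrete, here is an illustrative limiter. It is a sketch of the general technique, not Aavaaz's actual implementation; the class name and the choice not to count rejected requests are assumptions.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client within the trailing window."""

    def __init__(self, limit, window_seconds=60.0):
        self.limit = limit
        self.window = window_seconds
        self._hits = defaultdict(deque)  # client IP -> timestamps of accepted requests

    def allow(self, client_ip, now):
        hits = self._hits[client_ip]
        # Discard timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # the server would answer HTTP 429 here
        hits.append(now)
        return True


limiter = SlidingWindowLimiter(limit=2)
print(limiter.allow("10.0.0.1", now=0.0))   # True
print(limiter.allow("10.0.0.1", now=10.0))  # True
print(limiter.allow("10.0.0.1", now=20.0))  # False
```

Unlike a fixed per-minute counter, the window moves with each request, so a burst straddling a minute boundary cannot exceed the limit.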

Auto-Reconnect

Automatically reconnect when the WebSocket connection drops unexpectedly:

from whisper_live.client import TranscriptionClient

client = TranscriptionClient(
  "localhost", 9090,
  max_retries=5,
  retry_delay=3,
)
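
The two parameters describe a bounded retry loop with a fixed delay. The sketch below shows that pattern in isolation; it is not the whisper_live client's code, and the reading of `max_retries` as "retries after the first attempt" is an assumption. `connect_with_retries` and its `connect` callable are hypothetical.

```python
import time


def connect_with_retries(connect, max_retries=5, retry_delay=3, sleep=time.sleep):
    """Call connect() until it succeeds, retrying up to max_retries times."""
    retries = 0
    while True:
        try:
            return connect()
        except ConnectionError:
            if retries >= max_retries:
                raise  # give up and surface the last error
            retries += 1
            sleep(retry_delay)  # fixed delay between attempts


# Demo with a fake connection that fails twice and then succeeds.
attempts = {"count": 0}


def flaky_connect():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("connection dropped")
    return "connected"


delays = []
result = connect_with_retries(flaky_connect, sleep=delays.append)
print(result, delays)  # connected [3, 3]
```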

Batch Inference

Batch multiple client sessions into single GPU calls for higher throughput:

aavaaz serve --model large-v3 --batch-inference --batch-max-size 8 --batch-window-ms 50
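
The two tuning flags suggest a micro-batching loop: wait for work, then keep collecting until the batch is full or the window closes. The following is a sketch of that general pattern under this interpretation, not the server's implementation; `collect_batch` and the use of `queue.Queue` are assumptions.

```python
import queue
import time


def collect_batch(pending, max_size=8, window_ms=50):
    """Block for the first item, then gather more until full or the window ends."""
    batch = [pending.get()]  # wait as long as needed for the first request
    deadline = time.monotonic() + window_ms / 1000.0
    while len(batch) < max_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # window closed
        try:
            batch.append(pending.get(timeout=remaining))
        except queue.Empty:
            break  # nothing else arrived in time
    return batch


pending = queue.Queue()
for session_id in range(10):
    pending.put(f"audio-chunk-{session_id}")

first = collect_batch(pending)   # fills up to max_size immediately
second = collect_batch(pending)  # takes the rest once the window expires
print(len(first), len(second))   # 8 2
```

A larger window raises GPU utilization at the cost of added latency per segment; a smaller one does the opposite.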

Prometheus Metrics

Monitor server health with a Prometheus /metrics endpoint:

aavaaz serve --model large-v3 --metrics-port 9091

Tracks active connections, transcription latency, segment counts, and error rates.

SSE Streaming

Stream transcription results via Server-Sent Events from the REST API:

curl -X POST http://localhost:8000/v1/audio/transcriptions \
  -F file=@audio.wav -F stream=true

Returns real-time segment events as text/event-stream.
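
A client reads such a stream line by line and emits one event per blank-line-terminated block. Below is a minimal parser for the `data:` field of the standard event-stream format. The JSON payload shape in the sample is an assumption modeled on the segment examples elsewhere in this document, and `iter_sse_data` is a hypothetical helper.

```python
import json


def iter_sse_data(lines):
    """Yield the data payload of each complete event from an iterable of lines."""
    data_lines = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_lines:
                yield "\n".join(data_lines)
                data_lines = []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is not part of the value.
            data_lines.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) are ignored in this sketch.


# Illustrative stream; the real payload fields may differ.
stream = [
    ": keep-alive\n",
    'data: {"text": "Hello", "start": "0.000", "end": "1.000"}\n',
    "\n",
    'data: {"text": "world", "start": "1.000", "end": "2.500"}\n',
    "\n",
]

segments = [json.loads(payload) for payload in iter_sse_data(stream)]
print([s["text"] for s in segments])  # ['Hello', 'world']
```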

Plugin System

Extend the transcription pipeline with custom post-processors:

from aavaaz import AavaazServer
from aavaaz.plugins import PluginRegistry

# my_post_processor_fn is a user-defined post-processor callable
registry = PluginRegistry()
registry.register("my_plugin", my_post_processor_fn, priority=50)

server = AavaazServer(plugin_registry=registry)
server.serve()

Plugins receive each transcription segment and can modify, enrich, or filter it before delivery to the client.
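
As an illustration of what such a post-processor might look like, here is a sketch that both modifies and filters segments. The callable signature is an assumption: it takes a segment as a dict with a `text` key, returns a new dict, and returns `None` to drop the segment. Check the plugin API before relying on any of these conventions.

```python
import re

# Matches standalone "um"/"uh" fillers plus trailing punctuation and whitespace.
FILLERS = re.compile(r"\b(um+|uh+)\b[,.]?\s*", flags=re.IGNORECASE)


def strip_fillers(segment):
    """Hypothetical post-processor: remove filler words, drop empty segments."""
    text = FILLERS.sub("", segment.get("text", "")).strip()
    if not text:
        return None  # assumed convention for filtering a segment out
    cleaned = dict(segment)  # copy so the caller's segment is not mutated
    cleaned["text"] = text
    return cleaned


print(strip_fillers({"start": "0.000", "end": "2.500", "text": "Um, hello uh world"}))
# {'start': '0.000', 'end': '2.500', 'text': 'hello world'}
```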

Scaling Guide

Single GPU

aavaaz serve --model large-v3 --batch-inference

Multi-GPU (Docker Compose)

services:
  aavaaz:
    image: collabora/aavaaz:latest
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
    ports:
      - "9090:9090"
      - "8000:8000"

Kubernetes (Helm)

helm install aavaaz deploy/helm/aavaaz \
  --set model=large-v3 \
  --set replicas=3 \
  --set gpu.enabled=true

AWS (Terraform)

cd deploy/terraform
terraform init
terraform apply -var="model=large-v3" -var="api_key=my-secret"

Provisions VPC, ALB, ECS with GPU instances (g5.xlarge), ECR, and CloudWatch. See deploy/terraform/README.md for full options.

AWS Lambda (Serverless)

For batch file transcription without managing servers:

Production role: Lambda is the batch transcription path.

# Build and push the Lambda container image
docker build -f Dockerfile.lambda --build-arg WHISPER_MODEL=small -t aavaaz-lambda .

# Deploy infrastructure
cd deploy/terraform-lambda
terraform init
terraform apply

# Upload audio — transcript appears automatically in the output bucket
aws s3 cp recording.wav s3://$(terraform output -raw audio_input_bucket)/

# Or use the REST API
curl -X POST $(terraform output -raw api_endpoint) \
  -H "Content-Type: application/json" \
  -d '{"audio_url": "s3://my-bucket/recording.wav"}'

See docs/SERVERLESS.md for full configuration, model selection, cost estimates, and limitations.

Modal (GPU Serverless)

Deploy on Modal for on-demand GPU transcription with zero infrastructure:

Production role: Modal is used for live WebSocket transcription (deploy/modal/app_live.py). The deploy/modal/app.py endpoint is an optional GPU batch API.

cd deploy/modal
pip install modal
modal setup
modal deploy app_live.py

# Optional: deploy the GPU batch API endpoint
# modal deploy app.py

# Transcribe
# The live WebSocket URL is printed in the app_live.py deployment output.

Modal auto-scales to zero when idle, and GPU containers spin up in seconds. See docs/MODAL.md for full configuration.

Development

git clone git@github.com:collabora/aavaaz.git
cd aavaaz
pip install -e ".[dev]"
pytest

License

MPL-2.0

This version: 0.9.0 (Jun 3, 2026)

Download files

Source Distribution

aavaaz-0.9.0.tar.gz (105.0 kB)

Uploaded Jun 3, 2026 Source

Built Distribution

aavaaz-0.9.0-py3-none-any.whl (82.8 kB)

Uploaded Jun 3, 2026 Python 3

File details

Details for the file aavaaz-0.9.0.tar.gz.

File metadata

Download URL: aavaaz-0.9.0.tar.gz
Upload date: Jun 3, 2026
Size: 105.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.5

File hashes

Hashes for aavaaz-0.9.0.tar.gz
Algorithm	Hash digest
SHA256	`428a7b5620b5e26828432122607a08e26a460e61b902fc780bbe8001a05bafc8`
MD5	`fff252313d83e126f4929822bf442570`
BLAKE2b-256	`22a0bf92414c0ffe470201d5583b5bd41de2f7aaa1ac5f823a31836a66478617`

File details

Details for the file aavaaz-0.9.0-py3-none-any.whl.

File metadata

Download URL: aavaaz-0.9.0-py3-none-any.whl
Upload date: Jun 3, 2026
Size: 82.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.5

File hashes

Hashes for aavaaz-0.9.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8502d74d0614833473deea0f0a303a682b4425803942902f8e28dd40f82401fe`
MD5	`d71a8d4efdf7eb1b0a0c8e365a7a636a`
BLAKE2b-256	`5fe8b93dfff50565ad459a990dd1eb49fe319accae71877b1cfd283ee552c12a`
