Vani — Open-source Voice↔Text↔Action middleware protocol for Indian languages
Project description
Vani
The WebRTC for Indian AI Agents.
Vani is an open-source protocol and middleware library that handles the messy Voice↔Text↔Action loop for Indian languages. Think of it as the missing link between your LLM and an Indian user calling on a 2G connection — speaking Hinglish.
Why Vani?
India has 1.4 billion people, only ~12% read or write English fluently, and 22 official languages. Yet every AI voice agent built for India reinvents the same wheel:
- Phoneme mapping for Hindi retroflex consonants (
ट,ठ,ड,ढ) - Code-switching detection ("मुझे एक laptop चाहिए")
- 2G-safe audio codec negotiation (AMR-NB vs. Opus vs. PCM)
- Bhashini / Sarvam / AI4Bharat backend failover
- MeitY DPDP Act data-residency compliance
Vani solves all of this in one open protocol.
Architecture
┌─────────────────────────────────────────────────────────────┐
│ Calling App │
│ (your IVR / chatbot / voice UI / telephony bridge) │
└───────────────────────────┬─────────────────────────────────┘
│ VAM/1.0 gRPC stream
▼
┌─────────────────────────────────────────────────────────────┐
│ Vani Gateway │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌───────────┐ │
│ │ ASR │→ │ LLM/NLU │→ │ TTS │ │ Action │ │
│ │ (STT) │ │ │ │ │ │ (MCP) │ │
│ └──────────┘ └──────────┘ └──────────┘ └───────────┘ │
│ │ │ │ │ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Pluggable Backend Layer │ │
│ │ Sarvam AI │ AI4Bharat │ Bhashini ULCA │ │
│ └─────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
│ MCP tool calls
▼
┌─────────────────────────────────────────────────────────────┐
│ India Tool Registry (MCP servers) │
│ pan_validate · enam_mandi_price · bhashini_translate │
│ aadhaar_verify_otp · pm_kisan_eligibility · ... │
└─────────────────────────────────────────────────────────────┘
Quick Start
Install
# Core + Sarvam AI backend
pip install vani[sarvam]
# Core + AI4Bharat (self-hosted, open-weights)
pip install vani[ai4bharat]
# Core + Bhashini ULCA
pip install vani[bhashini]
# Everything
pip install vani[sarvam,ai4bharat,bhashini,dev]
Hello World — Hinglish Support Agent
import asyncio
from vani import SessionConfig
from vani.backends.sarvam import SarvamSTTBackend, SarvamLLMBackend, SarvamTTSBackend
from vani.gateway.stub import VaniGatewayStub
async def main():
config = SessionConfig.for_hinglish(caller_id="+91-9876543210")
gateway = VaniGatewayStub(
config=config,
stt=SarvamSTTBackend(api_key="sk-..."),
llm=SarvamLLMBackend(api_key="sk-..."),
tts=SarvamTTSBackend(api_key="sk-..."),
system_prompt="आप एक helpful customer support agent हैं।",
)
async for event in gateway.process_audio(your_audio_stream()):
if event.transcript and event.transcript.is_final:
print("USER:", event.transcript.text)
if event.synthesis_chunk and event.synthesis_chunk.is_final:
play_audio(event.synthesis_chunk.audio_bytes)
asyncio.run(main())
Try it in the Browser
Want to test the full STT → LLM → TTS pipeline before writing any code? Vani ships with a web demo you can try instantly.
One-click deploy
You'll be asked for a
SARVAM_API_KEY— get one free at sarvam.ai.
Or run locally
git clone https://github.com/vani-voice/vani
cd vani
pip install -e ".[sarvam]" fastapi uvicorn
export SARVAM_API_KEY=your-key-here
python webapp/server.py
# Open http://localhost:8000
Hold the mic button (or press spacebar), speak in any supported Indian language, and release. You'll see:
- 🎙 Live transcription of your speech
- 🧠 LLM response generated in the same language
- 🔊 TTS playback of the assistant's reply
The web demo uses the same VaniGatewayStub pipeline as a production integration — it's a real end-to-end test of the protocol.
Languages supported: Hindi, Telugu, Tamil, Bengali, Marathi, Kannada, Malayalam, Gujarati, English (India)
Language Support
| Language | BCP-47 | Tier | Code-Switch Profile | Sarvam | AI4Bharat | Bhashini |
|---|---|---|---|---|---|---|
| Hindi | hi-IN |
1 | Hinglish (hi-en) | ✅ | ✅ | ✅ |
| Tamil | ta-IN |
1 | Tanglish (ta-en) | ✅ | ✅ | ✅ |
| Telugu | te-IN |
1 | Tenglish (te-en) | ✅ | ✅ | ✅ |
| Bengali | bn-IN |
1 | Banglish (bn-en) | ✅ | ✅ | ✅ |
| Marathi | mr-IN |
1 | Marathlish (mr-en) | ✅ | ✅ | ✅ |
| Kannada | kn-IN |
2 | Kanglish (kn-en) | ✅ | ✅ | ✅ |
| Malayalam | ml-IN |
2 | Manglish (ml-en) | ✅ | ✅ | ✅ |
| Gujarati | gu-IN |
2 | — | ✅ | ✅ | ✅ |
| Punjabi | pa-IN |
2 | — | ❌ | ✅ | ✅ |
| Odia | or-IN |
2 | — | ❌ | ✅ | ✅ |
| Santali | sat-IN |
3 | — | ❌ | ❌ | ✅ |
| Manipuri | mni-IN |
3 | — | ❌ | ❌ | ✅ |
Backend Comparison
| Feature | Sarvam AI | AI4Bharat | Bhashini ULCA |
|---|---|---|---|
| STT Streaming | ✅ WebSocket | ❌ Batch | ❌ Batch |
| TTS | ✅ bulbul:v2 |
❌ Stub | ✅ REST |
| LLM | ✅ sarvam-m |
❌ | ❌ |
| NMT | ✅ mayura:v1 |
✅ IndicTrans2 |
✅ ULCA |
| Self-hostable | ❌ | ✅ HuggingFace | Partial |
| Tier A (2G) | ✅ | ✅ | ✅ |
| Low-resource langs | ❌ | Limited | ✅ 20+ |
| Cost | API credits | Self-host | Free (MeitY) |
Transport Tiers
Vani automatically negotiates the right codec for the caller's network:
| Tier | Network | Codec | Sample Rate | Use Case |
|---|---|---|---|---|
| A | 2G GPRS | AMR-NB | 8 kHz | Rural IVR, feature phones |
| B | 3G | Opus | 16 kHz | Smartphone apps, low-cost 4G |
| C | 4G/WiFi | PCM | 16 kHz / 16-bit | Full quality, edge servers |
from vani.session import AudioProfile, SessionConfig
config = SessionConfig.for_rural("hi-IN") # Forces Tier A
config = SessionConfig.for_language("ta-IN", # Specify tier
audio_profile=AudioProfile.tier_b())
Code-Switching
Hindi-English code-switching ("Hinglish") is a first-class feature:
Input audio: "मुझे ये laptop बहुत पसंद है"
[Hindi] [English] [Hindi⟩]
TranscriptEvent:
text: "मुझे ये laptop बहुत पसंद है"
code_switch_spans:
- start_char: 8 # Unicode code-point offsets
end_char: 14
language_bcp47: "en"
confidence: 0.94
Important: Offsets are Unicode code-point positions, not UTF-8 byte offsets.
मis 1 code point but 3 bytes in UTF-8.
Protocol
Vani is defined by three Protobuf files:
| File | Purpose |
|---|---|
proto/vani/v1/session.proto |
Session negotiation — codec, languages, backends |
proto/vani/v1/stream.proto |
Bidirectional audio/text streaming |
proto/vani/v1/action.proto |
MCP action execution over the stream |
Compile Stubs
pip install grpcio-tools
python -m grpc_tools.protoc \
-I proto \
--python_out=vani/generated \
--grpc_python_out=vani/generated \
--pyi_out=vani/generated \
proto/vani/v1/session.proto \
proto/vani/v1/stream.proto \
proto/vani/v1/action.proto
Action Layer (MCP)
Vani uses the Model Context Protocol (MCP) for tool calls. The gateway can invoke Indian-government and agritech APIs inline during a conversation:
async def my_action_handler(tool_name: str, args: dict) -> str:
if tool_name == "enam_mandi_price":
price = await fetch_enam_price(args["crop"], args["mandi"])
return json.dumps(price)
gateway = VaniGatewayStub(..., action_callback=my_action_handler)
India Tool Registry
Pre-specified MCP tool schemas for Indian services (see spec/IndiaToolRegistry.md):
| Tool | Service | Input |
|---|---|---|
pan_validate |
NSDL/UTI | pan_number |
aadhaar_verify_otp |
UIDAI | aadhaar_number |
enam_mandi_price |
eNAM | crop, mandi |
pm_kisan_eligibility |
PM-KISAN | mobile_number, state |
bhashini_translate |
IndicTrans2 | text, src, tgt |
ration_card_lookup |
NFSA | ration_card_number, state |
Spec Documents
| Document | Contents |
|---|---|
spec/VAM-Overview.md |
Protocol overview, four-actor model |
spec/VAM-CodeSwitch.md |
Code-switch annotation standard |
spec/VAM-Dialects.md |
Dialect taxonomy and routing |
spec/VAM-Transport.md |
Bandwidth-adaptive transport |
spec/VAM-Actions.md |
MCP action execution flow |
spec/IndiaToolRegistry.md |
India Tool Registry schemas |
Examples
# 🌐 Web demo — test in the browser (no mic code needed)
SARVAM_API_KEY=sk-... python webapp/server.py
# → Open http://localhost:8000
# 🎤 CLI demo — terminal-based mic + Rich UI
SARVAM_API_KEY=sk-... python demo/live_cli.py
# Hinglish customer support agent
SARVAM_API_KEY=sk-... python examples/hinglish_support_agent.py
# Tamil agritech IVR (mandi price lookup)
SARVAM_API_KEY=sk-... python examples/tamil_agritech_ivr.py
Conformance
Implementors of the VAM/1.0 protocol can validate against the YAML test suite:
ls conformance/tests/
# session_negotiation.yaml (10 tests)
# code_switch.yaml (10 tests)
# turn_signals.yaml (12 tests)
See conformance/README.md for the conformance runner spec.
Data Residency & Compliance
Vani defaults to DATA_RESIDENCY_INDIA_ONLY to comply with the Digital Personal Data Protection (DPDP) Act, 2023:
- Audio data never leaves Indian data centres (Sarvam/AI4Bharat servers in India)
- PII fields (
aadhaar_number,pan_number) are not logged at the gateway layer - Bhashini backend uses MeitY-hosted ULCA infrastructure
Override only when explicitly needed:
from vani.session import DataResidency
config.data_residency = DataResidency.ANY # Not recommended
Development
git clone https://github.com/vani-voice/vani
cd vani
pip install -e ".[dev]"
# Run tests
pytest
# Type-check
mypy vani/
# Lint
ruff check vani/
# Regenerate proto stubs
python -m grpc_tools.protoc -I proto \
--python_out=vani/generated --grpc_python_out=vani/generated --pyi_out=vani/generated \
proto/vani/v1/*.proto
Roadmap
-
v0.1.0— Core protocol + Sarvam/AI4Bharat/Bhashini backends (current) -
v0.2.0— gRPC server reference implementation -
v0.3.0— LiveKit transport bridge -
v0.4.0— OpenAI Realtime API adapter -
v0.5.0— WebRTC gateway (browser-native) -
v1.0.0— Production-stable protocol
Contributing
Contributions welcome! Please read CONTRIBUTING.md (coming soon) and open an issue before submitting large PRs.
Priority areas:
- Additional language backends (Punjabi, Odia, Santali)
- Dialect-specific STT fine-tunes
- More India Tool Registry entries
- Conformance test runner CLI
- gRPC server reference implementation
License
Apache 2.0 — see LICENSE.
Vani — voice middleware, built for Bharat.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file vani-0.1.0a1.tar.gz.
File metadata
- Download URL: vani-0.1.0a1.tar.gz
- Upload date:
- Size: 93.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
4b24efc84383b812fb2c68bcdb3679eaeb2b5eb5474ee21659af7b66d57e1111
|
|
| MD5 |
bfb5ff14332823eccbe86990d2e89629
|
|
| BLAKE2b-256 |
b51f83f5ff1b1404a38e77516cae04a1a0db6d0eb2f389597c8994afb8f2b58b
|
File details
Details for the file vani-0.1.0a1-py3-none-any.whl.
File metadata
- Download URL: vani-0.1.0a1-py3-none-any.whl
- Upload date:
- Size: 36.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
80f1b79478aaa47c0244a6a0a6e5000fc5f5e056b1bbe3f910cec38eafddf4df
|
|
| MD5 |
b03d04efeee9ed26da1c4a650a926047
|
|
| BLAKE2b-256 |
6670a15f8651ebff7e110ca2acaccf27f3cf60a47e4f273fdbbc15570c96a8f8
|