Vani — Open-source Voice↔Text↔Action middleware protocol for Indian languages

Vani

The WebRTC for Indian AI Agents.

Vani is an open-source protocol and middleware library that handles the messy Voice↔Text↔Action loop for Indian languages. Think of it as the missing link between your LLM and an Indian user calling on a 2G connection — speaking Hinglish.


Why Vani?

India has 1.4 billion people and 22 officially recognised languages, and only ~12% of the population reads or writes English fluently. Yet every AI voice agent built for India reinvents the same wheel:

  • Phoneme mapping for Hindi retroflex consonants (ट, ठ, ड, ढ)
  • Code-switching detection ("मुझे एक laptop चाहिए")
  • 2G-safe audio codec negotiation (AMR-NB vs. Opus vs. PCM)
  • Bhashini / Sarvam / AI4Bharat backend failover
  • MeitY DPDP Act data-residency compliance

Vani solves all of this in one open protocol.


Architecture

┌─────────────────────────────────────────────────────────────┐
│                        Calling App                          │
│  (your IVR / chatbot / voice UI / telephony bridge)        │
└───────────────────────────┬─────────────────────────────────┘
                            │  VAM/1.0 gRPC stream
                            ▼
┌─────────────────────────────────────────────────────────────┐
│                     Vani Gateway                            │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌───────────┐  │
│  │   ASR    │→ │ LLM/NLU  │→ │   TTS    │  │  Action   │  │
│  │  (STT)   │  │          │  │          │  │   (MCP)   │  │
│  └──────────┘  └──────────┘  └──────────┘  └───────────┘  │
│       │              │              │              │        │
│  ┌─────────────────────────────────────────────────────┐   │
│  │         Pluggable Backend Layer                     │   │
│  │  Sarvam AI  │  AI4Bharat  │  Bhashini ULCA          │   │
│  └─────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
                            │  MCP tool calls
                            ▼
┌─────────────────────────────────────────────────────────────┐
│              India Tool Registry (MCP servers)              │
│  pan_validate · enam_mandi_price · bhashini_translate       │
│  aadhaar_verify_otp · pm_kisan_eligibility · ...            │
└─────────────────────────────────────────────────────────────┘

Quick Start

Install

# Core + Sarvam AI backend
pip install vani[sarvam]

# Core + AI4Bharat (self-hosted, open-weights)
pip install vani[ai4bharat]

# Core + Bhashini ULCA
pip install vani[bhashini]

# Everything
pip install vani[sarvam,ai4bharat,bhashini,dev]

Hello World — Hinglish Support Agent

import asyncio
from vani import SessionConfig
from vani.backends.sarvam import SarvamSTTBackend, SarvamLLMBackend, SarvamTTSBackend
from vani.gateway.stub import VaniGatewayStub

async def main():
    config = SessionConfig.for_hinglish(caller_id="+91-9876543210")

    gateway = VaniGatewayStub(
        config=config,
        stt=SarvamSTTBackend(api_key="sk-..."),
        llm=SarvamLLMBackend(api_key="sk-..."),
        tts=SarvamTTSBackend(api_key="sk-..."),
        system_prompt="आप एक helpful customer support agent हैं।",
    )

    # your_audio_stream() and play_audio() are placeholders: supply your own
    # async audio source and playback function.
    async for event in gateway.process_audio(your_audio_stream()):
        if event.transcript and event.transcript.is_final:
            print("USER:", event.transcript.text)
        if event.synthesis_chunk and event.synthesis_chunk.is_final:
            play_audio(event.synthesis_chunk.audio_bytes)

asyncio.run(main())
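
The example above leaves your_audio_stream() undefined. As a minimal sketch of what such a source could look like, the helper below slices an in-memory PCM buffer into fixed-size frames and yields them asynchronously. The names pcm_frames and collect, the 20 ms frame duration, and the assumption that the gateway accepts raw 16 kHz / 16-bit mono frames are all illustrative and not taken from the Vani API.

```python
import asyncio
from typing import AsyncIterator

SAMPLE_RATE_HZ = 16_000   # 16 kHz, as listed for Tier C in this README
BYTES_PER_SAMPLE = 2      # 16-bit PCM
FRAME_MS = 20             # assumed frame duration, not taken from the Vani spec


async def pcm_frames(pcm: bytes, frame_ms: int = FRAME_MS) -> AsyncIterator[bytes]:
    """Yield fixed-size PCM frames from an in-memory buffer (hypothetical helper)."""
    frame_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * frame_ms // 1000
    for offset in range(0, len(pcm), frame_bytes):
        yield pcm[offset:offset + frame_bytes]
        await asyncio.sleep(0)  # hand control back to the event loop between frames


async def collect(pcm: bytes) -> list[bytes]:
    """Drain the generator into a list so the frame sizes can be inspected."""
    return [frame async for frame in pcm_frames(pcm)]


frames = asyncio.run(collect(bytes(1600)))  # 1600 bytes = 50 ms of silence
print([len(frame) for frame in frames])
```

The last frame is shorter than the others when the buffer length is not a multiple of the frame size; a real integration would need to decide whether to pad or send it as-is.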

Try it in the Browser

Want to test the full STT → LLM → TTS pipeline before writing any code? Vani ships with a web demo you can try instantly.

One-click deploy

Deploy to Render

You'll be asked for a SARVAM_API_KEY — get one free at sarvam.ai.

Or run locally

git clone https://github.com/vani-voice/vani
cd vani
pip install -e ".[sarvam]" fastapi uvicorn

export SARVAM_API_KEY=your-key-here
python webapp/server.py
# Open http://localhost:8000

Hold the mic button (or press spacebar), speak in any supported Indian language, and release. You'll see:

  1. 🎙 Live transcription of your speech
  2. 🧠 LLM response generated in the same language
  3. 🔊 TTS playback of the assistant's reply

The web demo uses the same VaniGatewayStub pipeline as a production integration — it's a real end-to-end test of the protocol.

Languages supported: Hindi, Telugu, Tamil, Bengali, Marathi, Kannada, Malayalam, Gujarati, English (India)


Language Support

Language    BCP-47   Tier  Code-Switch Profile  Sarvam  AI4Bharat  Bhashini
Hindi       hi-IN    1     Hinglish (hi-en)
Tamil       ta-IN    1     Tanglish (ta-en)
Telugu      te-IN    1     Tenglish (te-en)
Bengali     bn-IN    1     Banglish (bn-en)
Marathi     mr-IN    1     Marathlish (mr-en)
Kannada     kn-IN    2     Kanglish (kn-en)
Malayalam   ml-IN    2     Manglish (ml-en)
Gujarati    gu-IN    2
Punjabi     pa-IN    2
Odia        or-IN    2
Santali     sat-IN   3
Manipuri    mni-IN   3

Backend Comparison

Feature Sarvam AI AI4Bharat Bhashini ULCA
STT Streaming ✅ WebSocket ❌ Batch ❌ Batch
TTS bulbul:v2 ❌ Stub ✅ REST
LLM sarvam-m
NMT mayura:v1 IndicTrans2 ✅ ULCA
Self-hostable ✅ HuggingFace Partial
Tier A (2G)
Low-resource langs Limited ✅ 20+
Cost API credits Self-host Free (MeitY)
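
Backend failover is listed under "Why Vani?" but not shown in this README. The sketch below illustrates the general idea only: try each backend in order, with a timeout, and return the first successful transcript. The Transcriber callable shape, the function transcribe_with_failover, and the two demo backends are assumptions made for illustration and are not Vani's backend interface.

```python
import asyncio
from typing import Awaitable, Callable, Sequence

# A backend here is any async callable that takes audio bytes and returns text.
# This shape is an assumption for illustration, not Vani's backend interface.
Transcriber = Callable[[bytes], Awaitable[str]]


async def transcribe_with_failover(
    audio: bytes,
    backends: Sequence[tuple[str, Transcriber]],
    timeout_s: float = 2.0,
) -> tuple[str, str]:
    """Try each backend in order and return (backend_name, transcript)."""
    errors: list[str] = []
    for name, backend in backends:
        try:
            text = await asyncio.wait_for(backend(audio), timeout=timeout_s)
            return name, text
        except Exception as exc:  # timeouts and backend errors both fall through
            errors.append(f"{name}: {exc!r}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))


async def flaky(_: bytes) -> str:
    raise ConnectionError("simulated outage")


async def healthy(_: bytes) -> str:
    return "namaste"


result = asyncio.run(
    transcribe_with_failover(b"", [("primary", flaky), ("fallback", healthy)])
)
print(result)
```

Ordering the list by preference keeps the policy explicit: the caller decides which backend is primary, and the helper only handles the retry loop.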

Transport Tiers

Vani automatically negotiates the right codec for the caller's network:

Tier  Network  Codec   Sample Rate      Use Case
A     2G GPRS  AMR-NB  8 kHz            Rural IVR, feature phones
B     3G       Opus    16 kHz           Smartphone apps, low-cost 4G
C     4G/WiFi  PCM     16 kHz / 16-bit  Full quality, edge servers

from vani.session import AudioProfile, SessionConfig

config = SessionConfig.for_rural("hi-IN")          # Forces Tier A
config = SessionConfig.for_language("ta-IN",       # Specify tier
    audio_profile=AudioProfile.tier_b())
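
To make the tier table concrete, here is a small lookup sketch that maps a network label to the codec and sample rate listed above. It does not show how Vani negotiates tiers internally; TierSpec, TIERS, and tier_for_network are hypothetical names, and falling back to Tier A for an unknown network is an assumption chosen because it is the most conservative option.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierSpec:
    tier: str
    codec: str
    sample_rate_hz: int


# Values copied from the transport-tier table in this README.
TIERS = {
    "2G": TierSpec("A", "AMR-NB", 8_000),
    "3G": TierSpec("B", "Opus", 16_000),
    "4G": TierSpec("C", "PCM", 16_000),
    "WIFI": TierSpec("C", "PCM", 16_000),
}


def tier_for_network(network: str) -> TierSpec:
    """Return the tier for a network label; unknown labels fall back to Tier A."""
    return TIERS.get(network.strip().upper(), TIERS["2G"])


print(tier_for_network("3g"))
print(tier_for_network("satellite"))
```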

Code-Switching

Hindi-English code-switching ("Hinglish") is a first-class feature:

Input audio:  "मुझे ये laptop बहुत पसंद है"
               [Hindi]  [English] [Hindi]

TranscriptEvent:
  text: "मुझे ये laptop बहुत पसंद है"
  code_switch_spans:
    - start_char: 8        # Unicode code-point offsets
      end_char: 14
      language_bcp47: "en"
      confidence: 0.94

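Important: Offsets are Unicode code-point positions, not UTF-8 byte offsets. A Devanagari character such as म is 1 code point but 3 bytes in UTF-8. In the example above, end_char is exclusive: code points 8 through 13 spell "laptop".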
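
The difference between code-point and byte offsets is easy to check in Python, where str indices are code points. The sketch below finds Latin-script runs in a transcript and emits spans shaped like code_switch_spans. It is a text-level illustration only: latin_spans is a hypothetical helper, labelling every Latin run as "en" is a simplifying assumption, and it says nothing about how Vani detects code-switching in audio.

```python
import re


def latin_spans(text: str) -> list[dict]:
    """Return code-point spans of Latin-script runs, shaped like code_switch_spans."""
    return [
        {"start_char": m.start(), "end_char": m.end(), "language_bcp47": "en"}
        for m in re.finditer(r"[A-Za-z]+", text)
    ]


text = "मुझे ये laptop बहुत पसंद है"
span = latin_spans(text)[0]

# Python str indices are code points, so the span slices the word directly.
word = text[span["start_char"]:span["end_char"]]

# The same position expressed as a UTF-8 byte offset is larger, because each
# Devanagari code point before it occupies 3 bytes.
byte_start = len(text[:span["start_char"]].encode("utf-8"))

print(span, word, byte_start)
```

A consumer working in a language with byte-indexed or UTF-16 strings would need to convert these offsets before slicing.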


Protocol

Vani is defined by three Protobuf files:

File                          Purpose
proto/vani/v1/session.proto   Session negotiation: codec, languages, backends
proto/vani/v1/stream.proto    Bidirectional audio/text streaming
proto/vani/v1/action.proto    MCP action execution over the stream

Compile Stubs

pip install grpcio-tools
python -m grpc_tools.protoc \
  -I proto \
  --python_out=vani/generated \
  --grpc_python_out=vani/generated \
  --pyi_out=vani/generated \
  proto/vani/v1/session.proto \
  proto/vani/v1/stream.proto \
  proto/vani/v1/action.proto

Action Layer (MCP)

Vani uses the Model Context Protocol (MCP) for tool calls. The gateway can invoke Indian-government and agritech APIs inline during a conversation:

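import json

async def my_action_handler(tool_name: str, args: dict) -> str:
    # fetch_enam_price() is a placeholder for your own eNAM client.
    if tool_name == "enam_mandi_price":
        price = await fetch_enam_price(args["crop"], args["mandi"])
        return json.dumps(price)
    # Return an explicit error string instead of None for unknown tools.
    return json.dumps({"error": f"unknown tool: {tool_name}"})

gateway = VaniGatewayStub(..., action_callback=my_action_handler)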
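
Once a handler covers more than one tool, an if/elif chain gets unwieldy. One possible pattern, sketched below, is a small registry keyed by tool name with a dispatch function that keeps the (tool_name, args) -> str shape used by action_callback above. The tool decorator, TOOLS dict, and dispatch function are hypothetical and not part of Vani, and the handler body is a placeholder that returns no real price data.

```python
import asyncio
import json
from typing import Awaitable, Callable

ToolHandler = Callable[[dict], Awaitable[dict]]
TOOLS: dict[str, ToolHandler] = {}


def tool(name: str) -> Callable[[ToolHandler], ToolHandler]:
    """Register an async handler under an MCP tool name."""
    def register(handler: ToolHandler) -> ToolHandler:
        TOOLS[name] = handler
        return handler
    return register


@tool("enam_mandi_price")
async def enam_mandi_price(args: dict) -> dict:
    # Placeholder body: a real handler would call the eNAM service here.
    return {"crop": args["crop"], "mandi": args["mandi"], "price": None}


async def dispatch(tool_name: str, args: dict) -> str:
    """Route a tool call to its handler and return a JSON string."""
    handler = TOOLS.get(tool_name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {tool_name}"})
    return json.dumps(await handler(args))


reply = asyncio.run(dispatch("enam_mandi_price", {"crop": "onion", "mandi": "Lasalgaon"}))
unknown = asyncio.run(dispatch("not_a_tool", {}))
print(reply)
print(unknown)
```

Because dispatch has the same signature as my_action_handler, it could be passed as the callback directly, assuming the gateway accepts any async callable of that shape.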

India Tool Registry

Pre-specified MCP tool schemas for Indian services (see spec/IndiaToolRegistry.md):

Tool                  Service      Input
pan_validate          NSDL/UTI     pan_number
aadhaar_verify_otp    UIDAI        aadhaar_number
enam_mandi_price      eNAM         crop, mandi
pm_kisan_eligibility  PM-KISAN     mobile_number, state
bhashini_translate    IndicTrans2  text, src, tgt
ration_card_lookup    NFSA         ration_card_number, state
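
Before sending a tool call to a remote service, it can be useful to reject obviously malformed input locally. As an illustration for the pan_number input, the sketch below checks the standard PAN layout of five letters, four digits, and one letter. This is a format check only, not a validity lookup against NSDL/UTI, and looks_like_pan is a hypothetical helper; whether the registry's pan_validate schema performs such a check is not stated here.

```python
import re

# PAN layout: five letters, four digits, one letter (for example ABCDE1234F).
PAN_PATTERN = re.compile(r"[A-Z]{5}[0-9]{4}[A-Z]")


def looks_like_pan(pan_number: str) -> bool:
    """Return True if the input has the PAN layout; does not verify the PAN exists."""
    return PAN_PATTERN.fullmatch(pan_number.strip().upper()) is not None


print(looks_like_pan("abcde1234f"))
print(looks_like_pan("ABCDE12345"))
```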

Spec Documents

Document                    Contents
spec/VAM-Overview.md        Protocol overview, four-actor model
spec/VAM-CodeSwitch.md      Code-switch annotation standard
spec/VAM-Dialects.md        Dialect taxonomy and routing
spec/VAM-Transport.md       Bandwidth-adaptive transport
spec/VAM-Actions.md         MCP action execution flow
spec/IndiaToolRegistry.md   India Tool Registry schemas

Examples

# 🌐 Web demo — test in the browser (no mic code needed)
SARVAM_API_KEY=sk-... python webapp/server.py
# → Open http://localhost:8000

# 🎤 CLI demo — terminal-based mic + Rich UI
SARVAM_API_KEY=sk-... python demo/live_cli.py

# Hinglish customer support agent
SARVAM_API_KEY=sk-... python examples/hinglish_support_agent.py

# Tamil agritech IVR (mandi price lookup)
SARVAM_API_KEY=sk-... python examples/tamil_agritech_ivr.py

Conformance

Implementors of the VAM/1.0 protocol can validate against the YAML test suite:

ls conformance/tests/
# session_negotiation.yaml  (10 tests)
# code_switch.yaml          (10 tests)
# turn_signals.yaml         (12 tests)

See conformance/README.md for the conformance runner spec.


Data Residency & Compliance

Vani defaults to DATA_RESIDENCY_INDIA_ONLY to comply with the Digital Personal Data Protection (DPDP) Act, 2023:

  • Audio data never leaves Indian data centres (Sarvam/AI4Bharat servers in India)
  • PII fields (aadhaar_number, pan_number) are not logged at the gateway layer
  • Bhashini backend uses MeitY-hosted ULCA infrastructure

Override only when explicitly needed:

from vani.session import DataResidency
config.data_residency = DataResidency.ANY   # Not recommended
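
The list above says PII fields are not logged at the gateway layer. If you log tool arguments in your own action handlers, a similar rule can be applied there. The sketch below masks the two field names mentioned above before logging; PII_FIELDS and redact_for_logging are hypothetical names, and this is not a description of how the Vani gateway implements its own redaction.

```python
PII_FIELDS = frozenset({"aadhaar_number", "pan_number"})


def redact_for_logging(args: dict) -> dict:
    """Return a copy of tool arguments with PII values masked; the input is not modified."""
    return {
        key: "[REDACTED]" if key in PII_FIELDS else value
        for key, value in args.items()
    }


call_args = {"pan_number": "ABCDE1234F", "state": "KA"}
print(redact_for_logging(call_args))
```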

Development

git clone https://github.com/vani-voice/vani
cd vani
pip install -e ".[dev]"

# Run tests
pytest

# Type-check
mypy vani/

# Lint
ruff check vani/

# Regenerate proto stubs
python -m grpc_tools.protoc -I proto \
  --python_out=vani/generated --grpc_python_out=vani/generated --pyi_out=vani/generated \
  proto/vani/v1/*.proto

Roadmap

  • v0.1.0 — Core protocol + Sarvam/AI4Bharat/Bhashini backends (current)
  • v0.2.0 — gRPC server reference implementation
  • v0.3.0 — LiveKit transport bridge
  • v0.4.0 — OpenAI Realtime API adapter
  • v0.5.0 — WebRTC gateway (browser-native)
  • v1.0.0 — Production-stable protocol

Contributing

Contributions welcome! Please read CONTRIBUTING.md (coming soon) and open an issue before submitting large PRs.

Priority areas:

  • Additional language backends (Punjabi, Odia, Santali)
  • Dialect-specific STT fine-tunes
  • More India Tool Registry entries
  • Conformance test runner CLI
  • gRPC server reference implementation

License

Apache 2.0 — see LICENSE.


Vani — voice middleware, built for Bharat.
