Python client for Gnani's Vachana speech AI platform (STT, TTS, and more)

gnani-vachana

License: MIT

Official Python client for Vachana Speech APIs by Gnani.ai. Build multilingual voice workflows with Speech-to-Text (STT) and Text-to-Speech (TTS) across REST, SSE streaming, and real-time WebSockets.

Vachana is a production-ready speech platform with high-accuracy STT and low-latency TTS for Indian languages, including multilingual and code-switching scenarios.

Installation

pip install gnani-vachana

Requires Python 3.9+.

Quick Start

STT REST (file-based transcription)

from gnani.stt import GnaniSTTClient

client = GnaniSTTClient(
    organization_id="your-organization-id",
    api_key="your-api-key",
    user_id="your-user-id",
)

result = client.transcribe("audio.wav", language_code="hi-IN")
print(result["transcript"])

Realtime Streaming (WebSocket)

import asyncio
from gnani.stt import GnaniSTTStreamClient, StreamTranscriptEvent

async def main():
    async with GnaniSTTStreamClient(api_key="your-api-key", language_code="hi-IN") as stream:
        # Send audio chunks (raw PCM, 16-bit LE, 16 kHz, mono)
        with open("audio.pcm", "rb") as f:
            while chunk := f.read(1024):
                await stream.send_audio(chunk)
                await asyncio.sleep(0.032)  # real-time pacing (32 ms per frame)

        # Iterate over events
        async for event in stream:
            if isinstance(event, StreamTranscriptEvent):
                print(event.text)

asyncio.run(main())

TTS REST (single response)

from gnani.tts import GnaniTTSClient, AudioConfig

client = GnaniTTSClient(api_key="your-api-key")
audio = client.synthesize(
    "नमस्ते, आप कैसे हैं?",
    voice="sia",
    audio_config=AudioConfig(sample_rate=44100, encoding="linear_pcm", container="wav"),
)

with open("tts_output.wav", "wb") as f:
    f.write(audio)

TTS Realtime (WebSocket)

import asyncio
from gnani.tts import GnaniTTSRealtimeClient

async def main():
    async with GnaniTTSRealtimeClient(api_key="your-api-key") as client:
        with open("tts_realtime.wav", "wb") as f:
            async for chunk in client.synthesize("Hello from Gnani TTS", voice="sia"):
                f.write(chunk)

asyncio.run(main())

Authentication

STT REST API

The REST API uses header-based authentication. Every request requires three credentials:

Parameter | Header | Description
organization_id | X-Organization-ID | Your organisation identifier
api_key | X-API-Key-ID | Secret key for authentication
user_id | X-API-User-ID | Your user / organisation name
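As a rough illustration of the table above, the three credentials map onto request headers as sketched below. The helper name `build_stt_rest_headers` is hypothetical and not part of the package; the client presumably sets these headers for you, so this is only meant to show the mapping.

```python
def build_stt_rest_headers(organization_id: str, api_key: str, user_id: str) -> dict:
    """Map the three REST credentials to the header names listed in the table.

    Hypothetical helper for illustration only; it is not part of gnani-vachana.
    """
    headers = {
        "X-Organization-ID": organization_id,
        "X-API-Key-ID": api_key,
        "X-API-User-ID": user_id,
    }
    # Fail early if any credential is empty, since every request needs all three.
    missing = [name for name, value in headers.items() if not value]
    if missing:
        raise ValueError("Missing credential(s) for header(s): " + ", ".join(missing))
    return headers


headers = build_stt_rest_headers("your-organization-id", "your-api-key", "your-user-id")
print(headers)
```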

STT Realtime Streaming API

The WebSocket streaming API requires a single API key:

Parameter | Header | Description
api_key | x-api-key-id | API key identifier for authentication

TTS API (REST, SSE, Realtime)

All TTS interfaces require a single API key:

Parameter | Header | Description
api_key | X-API-Key-ID | API key identifier for authentication

Obtaining Credentials

Email speechstack@gnani.ai with your name, company, and use case. Credentials are typically provisioned within 1 business day, and all new accounts receive free credits -- no credit card required.

Passing Credentials

Option 1 -- Constructor arguments:

from gnani.stt import GnaniSTTClient, GnaniSTTStreamClient
from gnani.tts import GnaniTTSClient, GnaniTTSRealtimeClient, GnaniTTSStreamClient

# REST client
client = GnaniSTTClient(
    organization_id="your-organization-id",
    api_key="your-api-key",
    user_id="your-user-id",
)

# Streaming client
stream = GnaniSTTStreamClient(api_key="your-api-key")

# TTS clients
tts_rest = GnaniTTSClient(api_key="your-api-key")
tts_stream = GnaniTTSStreamClient(api_key="your-api-key")
tts_realtime = GnaniTTSRealtimeClient(api_key="your-api-key")

Option 2 -- Environment variables:

# REST client credentials
export GNANI_ORGANIZATION_ID="your-organization-id"
export GNANI_API_KEY="your-api-key"
export GNANI_USER_ID="your-user-id"

from gnani.stt import GnaniSTTClient, GnaniSTTStreamClient
from gnani.tts import GnaniTTSClient

client = GnaniSTTClient()           # picks up all three env vars
stream = GnaniSTTStreamClient()     # picks up GNANI_API_KEY
tts = GnaniTTSClient()              # picks up GNANI_API_KEY
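Before constructing a client without arguments, it can help to confirm that the variables are actually set. The sketch below uses only the standard library; `missing_env_vars` is a hypothetical helper, and how the package itself reports a missing variable is not covered here.

```python
import os

# Variable names taken from the export lines above.
REQUIRED_REST_VARS = ("GNANI_ORGANIZATION_ID", "GNANI_API_KEY", "GNANI_USER_ID")


def missing_env_vars(required=REQUIRED_REST_VARS, environ=None):
    """Return the names in `required` that are unset or empty.

    Hypothetical helper for illustration; not part of gnani-vachana.
    """
    environ = os.environ if environ is None else environ
    return [name for name in required if not environ.get(name)]


missing = missing_env_vars()
if missing:
    print("Set these variables before creating a client: " + ", ".join(missing))
else:
    print("All REST credentials found in the environment.")
```

For the streaming and TTS clients, only `GNANI_API_KEY` needs checking, for example `missing_env_vars(("GNANI_API_KEY",))`.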

Supported Languages

REST API

Language | Code | Native Script
Bengali | bn-IN | বাংলা
English (India) | en-IN | Latin
Gujarati | gu-IN | ગુજરાતી
Hindi | hi-IN | हिन्दी
Kannada | kn-IN | ಕನ್ನಡ
Malayalam | ml-IN | മലയാളം
Marathi | mr-IN | मराठी
Punjabi | pa-IN | ਪੰਜਾਬੀ
Tamil | ta-IN | தமிழ்
Telugu | te-IN | తెలుగు

For multilingual / code-switching audio (e.g. a Hindi-English mix), pass the language codes as a single comma-separated string:

result = client.transcribe("meeting.wav", language_code="en-IN,hi-IN")
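If the codes come from configuration or user input, a small helper can build the comma-separated string and catch typos before a request is made. This is a sketch under assumptions: `join_language_codes` is a hypothetical name, the allowed set is copied from the REST table above, and whether the service cares about code order or accepts more than two codes is not stated here.

```python
# Codes copied from the REST language table above.
REST_LANGUAGE_CODES = {
    "bn-IN", "en-IN", "gu-IN", "hi-IN", "kn-IN",
    "ml-IN", "mr-IN", "pa-IN", "ta-IN", "te-IN",
}


def join_language_codes(codes):
    """Validate codes against the REST table and join them with commas.

    Hypothetical helper for illustration; not part of gnani-vachana.
    """
    codes = list(codes)
    if not codes:
        raise ValueError("At least one language code is required")
    unknown = [code for code in codes if code not in REST_LANGUAGE_CODES]
    if unknown:
        raise ValueError("Unsupported language code(s): " + ", ".join(unknown))
    return ",".join(codes)


language_code = join_language_codes(["en-IN", "hi-IN"])
print(language_code)
```

The resulting string can then be passed as the `language_code` argument shown above.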

Realtime Streaming API

All languages above, plus these experimental codes:

Language | Code | Script
Hinglish (Latin) | en-hi-IN-latn | Latin (experimental)
Hinglish (Code-mixed) | en-hi-in-cm | Latin + Devanagari (experimental)
Auto-detect | AUTO_DETECT | All supported (experimental)

from gnani.stt import GnaniSTTStreamClient

# Hinglish (Latin script)
stream = GnaniSTTStreamClient(api_key="key", language_code="en-hi-IN-latn")

# Auto-detect language
stream = GnaniSTTStreamClient(api_key="key", language_code=GnaniSTTStreamClient.AUTO_DETECT)

REST Usage

Transcribe a file by path

result = client.transcribe("meeting.wav", language_code="en-IN")
print(result["transcript"])

Transcribe from a file object

with open("meeting.mp3", "rb") as f:
    result = client.transcribe(f, language_code="ta-IN")

Transcribe raw bytes

audio_bytes = download_audio_from_somewhere()  # placeholder: any function that returns the audio file as bytes
result = client.transcribe_bytes(
    audio_bytes, filename="clip.wav", language_code="kn-IN"
)

Custom request ID

result = client.transcribe(
    "call.flac", language_code="hi-IN", request_id="my-trace-123"
)

List supported languages

for code, name in GnaniSTTClient.supported_languages().items():
    print(f"{code}: {name}")

Realtime Streaming Usage

Connection Flow

  1. Client opens a WebSocket connection to wss://api.vachana.ai/stt/v3/stream with auth headers.
  2. Server sends a connected event with the active configuration.
  3. Client sends binary PCM audio frames (1024 bytes each = 512 samples at 16-bit).
  4. Server detects speech via VAD and responds with processing and transcript events.
  5. Either side may close the connection at any time.

Audio Format

Property | 16 kHz | 8 kHz
Encoding | PCM signed 16-bit LE | PCM signed 16-bit LE
Sample Rate | 16,000 Hz | 8,000 Hz
Channels | 1 (mono) | 1 (mono)
Chunk Size | 512 samples (32 ms) | 512 samples (64 ms)
Frame Bytes | 1,024 | 1,024
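The chunk durations in the table follow directly from the frame size and sample rate. The short calculation below reproduces them; `frame_duration_ms` is a hypothetical helper name used only for this illustration.

```python
BYTES_PER_SAMPLE = 2  # signed 16-bit PCM, mono


def frame_duration_ms(sample_rate_hz: int, frame_bytes: int = 1024) -> float:
    """Return how many milliseconds of audio one frame holds."""
    samples = frame_bytes / BYTES_PER_SAMPLE
    return samples * 1000.0 / sample_rate_hz


for rate in (16000, 8000):
    print(f"{rate} Hz: {frame_duration_ms(rate):.0f} ms per 1,024-byte frame")
```

When pacing in real time, this is the interval to wait between frames: 0.032 seconds at 16 kHz and 0.064 seconds at 8 kHz.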

Using the async context manager

import asyncio
from gnani.stt import GnaniSTTStreamClient, StreamTranscriptEvent

async def main():
    async with GnaniSTTStreamClient(
        api_key="your-api-key",
        language_code="hi-IN",
        sample_rate=16000,
    ) as stream:
        print(f"Connected! Sample rate: {stream.connected_config.sample_rate}")

        with open("audio.pcm", "rb") as f:
            while chunk := f.read(1024):
                await stream.send_audio(chunk)
                await asyncio.sleep(0.032)

        async for event in stream:
            if isinstance(event, StreamTranscriptEvent):
                print(f"[{event.segment_index}] {event.text}")
                print(f"  Duration: {event.audio_duration_ms}ms, Latency: {event.latency}ms")

asyncio.run(main())

Manual connect / close

import asyncio
from gnani.stt import GnaniSTTStreamClient, StreamTranscriptEvent

async def main():
    stream = GnaniSTTStreamClient(api_key="your-api-key")
    config = await stream.connect()
    print(f"Server ready: {config.message}")

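    audio_chunk = b"\x00" * 1024  # placeholder: one 1,024-byte frame of silence; use real PCM audio here
    await stream.send_audio(audio_chunk)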
    transcripts = await stream.close()

    for t in transcripts:
        print(t.text)

asyncio.run(main())

High-level stream_audio helper with callbacks

import asyncio
from gnani.stt import GnaniSTTStreamClient, StreamTranscriptEvent, StreamProcessingEvent

async def main():
    async with GnaniSTTStreamClient(api_key="your-api-key") as stream:
        with open("audio.pcm", "rb") as f:
            transcripts = await stream.stream_audio(
                f,
                on_transcript=lambda t: print(f"Transcript: {t.text}"),
                on_processing=lambda p: print(f"Processing at {p.timestamp}..."),
                realtime_pace=True,
            )

    print(f"Total segments: {len(transcripts)}")

asyncio.run(main())

Using 8 kHz audio

stream = GnaniSTTStreamClient(
    api_key="your-api-key",
    language_code="en-IN",
    sample_rate=8000,
)

Event Types

All events are typed dataclasses:

Event | Fields | Description
StreamConnectedEvent | message, timestamp, sample_rate, chunk_size, raw | Handshake confirmation with server config
StreamProcessingEvent | timestamp, raw | VAD detected end-of-speech, transcribing
StreamTranscriptEvent | text, audio_duration_ms, segment_id, segment_index, latency, timestamp, raw | Completed transcription for a speech segment
StreamErrorEvent | message, timestamp, raw | Server-side error

Accessing the raw JSON payload

Every event includes a raw field with the full server JSON:

async for event in stream:
    print(event.raw)  # dict with the complete server response

Text-to-Speech Usage

TTS REST

from gnani.tts import GnaniTTSClient

client = GnaniTTSClient(api_key="your-api-key")
audio = client.synthesize("यह एक टेस्ट है", voice="sia")
with open("tts_rest.wav", "wb") as f:
    f.write(audio)

TTS Streaming (SSE)

from gnani.tts import GnaniTTSStreamClient

client = GnaniTTSStreamClient(api_key="your-api-key")
with open("tts_sse.wav", "wb") as f:
    for chunk in client.synthesize_stream("Streaming TTS response", voice="raju"):
        f.write(chunk)

TTS Realtime (WebSocket)

import asyncio
from gnani.tts import GnaniTTSRealtimeClient

async def main():
    async with GnaniTTSRealtimeClient(api_key="your-api-key") as client:
        audio = await client.synthesize_and_collect("Realtime TTS response", voice="neha")
        with open("tts_realtime.wav", "wb") as f:
            f.write(audio)

asyncio.run(main())

TTS Voices

from gnani.tts import GnaniTTSClient

print(GnaniTTSClient.supported_voices())

Audio Requirements

REST API

Constraint | Value
Formats | WAV, MP3, FLAC, OGG, M4A
Max duration | 60 seconds
Channels | Mono or stereo
Sample rate | Automatically converted to 16 kHz mono
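For WAV input, the 60-second limit can be checked locally before uploading. The sketch below uses the standard-library `wave` module, so it covers WAV only (MP3, FLAC, OGG and M4A would need a third-party decoder). The helper names are hypothetical, and treating exactly 60 seconds as acceptable is an assumption.

```python
import io
import wave

MAX_REST_DURATION_S = 60.0  # limit from the table above


def wav_duration_seconds(source) -> float:
    """Return the duration of a WAV file given a path or a binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def fits_rest_limit(source) -> bool:
    """Assumption: a file of exactly 60 seconds is still accepted."""
    return wav_duration_seconds(source) <= MAX_REST_DURATION_S


# Demo: build two seconds of 16 kHz mono silence in memory and measure it.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(16000)
    wav.writeframes(b"\x00\x00" * 16000 * 2)
buffer.seek(0)

duration = wav_duration_seconds(buffer)
print(f"{duration:.1f} s")
```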

Realtime Streaming

Constraint | Value
Encoding | Raw PCM, signed 16-bit little-endian
Sample rate | 16,000 Hz or 8,000 Hz
Channels | 1 (mono)
Frame size | 1,024 bytes (512 samples)
Pacing | Send frames at real-time cadence for best VAD accuracy
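Audio often arrives as a WAV file rather than raw PCM. A minimal sketch of turning a 16-bit mono WAV into frames that meet the constraints above, using only the standard library, might look like this. `pcm_frames_from_wav` is a hypothetical helper; whether the server accepts a final frame shorter than 1,024 bytes is not stated here, so padding it with zero bytes may be needed.

```python
import io
import wave

FRAME_BYTES = 1024
SUPPORTED_RATES = (16000, 8000)


def pcm_frames_from_wav(source, frame_bytes=FRAME_BYTES):
    """Yield raw PCM frames from a 16-bit mono WAV (path or binary file object).

    The last frame may be shorter than `frame_bytes`.
    Hypothetical helper for illustration; not part of gnani-vachana.
    """
    with wave.open(source, "rb") as wav:
        if wav.getnchannels() != 1:
            raise ValueError("Expected mono audio")
        if wav.getsampwidth() != 2:
            raise ValueError("Expected 16-bit samples")
        if wav.getframerate() not in SUPPORTED_RATES:
            raise ValueError("Expected a 16,000 Hz or 8,000 Hz sample rate")
        samples_per_frame = frame_bytes // 2  # two bytes per 16-bit sample
        while True:
            chunk = wav.readframes(samples_per_frame)
            if not chunk:
                break
            yield chunk


# Demo: 1,000 samples of silence (2,000 bytes of PCM) split into frames.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(16000)
    wav.writeframes(b"\x00\x00" * 1000)
buffer.seek(0)

frame_sizes = [len(frame) for frame in pcm_frames_from_wav(buffer)]
print(frame_sizes)
```

Each yielded frame can be handed to `send_audio`, with a pause of one frame duration between sends to keep real-time cadence.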

Response Format

REST

{
  "success": true,
  "request_id": "req_abc123",
  "timestamp": "20251226_143052.123",
  "transcript": "नमस्ते, आप कैसे हैं?"
}
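A small sketch of consuming this response is shown below. The timestamp pattern is inferred from the single sample above, and the shape of a failed response (here assumed to carry "success": false) is not documented in this section, so treat both as assumptions. `parse_rest_response` is a hypothetical helper.

```python
from datetime import datetime


def parse_rest_response(payload: dict):
    """Return (transcript, timestamp) from a REST response dict.

    Hypothetical helper; the failure shape and timestamp format are assumptions.
    """
    if not payload.get("success"):
        raise RuntimeError(f"Transcription failed (request_id={payload.get('request_id')})")
    # Pattern inferred from the sample value "20251226_143052.123".
    timestamp = datetime.strptime(payload["timestamp"], "%Y%m%d_%H%M%S.%f")
    return payload["transcript"], timestamp


sample = {
    "success": True,
    "request_id": "req_abc123",
    "timestamp": "20251226_143052.123",
    "transcript": "नमस्ते, आप कैसे हैं?",
}
transcript, when = parse_rest_response(sample)
print(when.isoformat(), transcript)
```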

Realtime Streaming

Connected:

{
  "type": "connected",
  "message": "STT service ready — VAD service connected",
  "timestamp": "2024-01-15T10:30:00.000Z",
  "config": { "sample_rate": 16000, "chunk_size": 512 }
}

Transcript:

{
  "type": "transcript",
  "timestamp": "2024-01-15T10:30:05.987Z",
  "text": "Hello, how are you today?",
  "audio_duration_ms": 2340,
  "segment_id": "<segment_id>",
  "segment_index": "<segment_index>",
  "latency": 320
}
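The typed events already wrap these payloads, but when working with the raw JSON directly, messages can be told apart by their "type" field. The sketch below handles the two message types shown above using only the standard library; `describe_message` and `parse_iso_utc` are hypothetical helpers, and any other message type is simply reported as unhandled.

```python
import json
from datetime import datetime


def parse_iso_utc(value: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing Z.

    datetime.fromisoformat only accepts "Z" from Python 3.11 on, so it is
    replaced with an explicit UTC offset to stay compatible with Python 3.9.
    """
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def describe_message(raw_json: str) -> str:
    """Summarise one server message based on its "type" field."""
    message = json.loads(raw_json)
    kind = message.get("type")
    when = parse_iso_utc(message["timestamp"])
    if kind == "connected":
        config = message["config"]
        return (f"connected at {when:%H:%M:%S}: {config['sample_rate']} Hz, "
                f"{config['chunk_size']} samples per chunk")
    if kind == "transcript":
        return (f"transcript at {when:%H:%M:%S}: {message['text']!r} "
                f"({message['audio_duration_ms']} ms of audio, latency {message['latency']})")
    return f"unhandled message type: {kind!r}"


connected = ('{"type": "connected", "message": "ready", '
             '"timestamp": "2024-01-15T10:30:00.000Z", '
             '"config": {"sample_rate": 16000, "chunk_size": 512}}')
print(describe_message(connected))
```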

Error Handling

import asyncio

from gnani.stt import (
    GnaniSTTClient,
    GnaniSTTStreamClient,
    AuthenticationError,
    InvalidAudioError,
    APIError,
    StreamConnectionError,
    StreamClosedError,
    StreamError,
)

# REST errors
client = GnaniSTTClient()  # credentials are read from the environment variables
try:
    result = client.transcribe("audio.wav", language_code="hi-IN")
except AuthenticationError:
    print("Check your credentials")
except InvalidAudioError as e:
    print(f"Bad audio file: {e}")
except APIError as e:
    print(f"API error {e.status_code}: {e}")

# Streaming errors ("async with" is only valid inside a coroutine)
async def send_one_chunk(chunk):
    try:
        async with GnaniSTTStreamClient(api_key="key") as stream:
            await stream.send_audio(chunk)
    except StreamConnectionError as e:
        print(f"Connection failed: {e}")
    except StreamClosedError as e:
        print(f"Stream already closed: {e}")
    except StreamError as e:
        print(f"Server error: {e} (at {e.timestamp})")

asyncio.run(send_one_chunk(b"\x00" * 1024))  # placeholder: one frame of silence

Documentation

Full API reference and guides are available at docs.inya.ai/vachana.

License

This project is licensed under the MIT License -- see the LICENSE file for details.
