Skip to main content

Python client for Gnani's Vachana speech AI platform (STT, TTS, and more)

Project description

gnani-vachana

License: MIT

Official Python client for the Vachana Speech-to-Text API by Gnani.ai. Transcribe audio in 10+ Indian languages via REST or real-time WebSocket streaming.

Vachana is a production-ready speech-to-text API with automatic language detection and code-switching support for accurate multilingual transcriptions.

Installation

pip install gnani-vachana

Requires Python 3.9+.

Quick Start

REST (file-based transcription)

from gnani.stt import GnaniSTTClient

client = GnaniSTTClient(
    organization_id="your-organization-id",
    api_key="your-api-key",
    user_id="your-user-id",
)

result = client.transcribe("audio.wav", language_code="hi-IN")
print(result["transcript"])

Realtime Streaming (WebSocket)

import asyncio
from gnani.stt import GnaniSTTStreamClient, StreamTranscriptEvent

async def main():
    async with GnaniSTTStreamClient(api_key="your-api-key", language_code="hi-IN") as stream:
        # Send audio chunks (raw PCM, 16-bit LE, 16 kHz, mono)
        with open("audio.pcm", "rb") as f:
            while chunk := f.read(1024):
                await stream.send_audio(chunk)
                await asyncio.sleep(0.032)  # real-time pacing (32 ms per frame)

        # Iterate over events
        async for event in stream:
            if isinstance(event, StreamTranscriptEvent):
                print(event.text)

asyncio.run(main())

Authentication

REST API

The REST API uses header-based authentication. Every request requires three credentials:

Parameter Header Description
organization_id X-Organization-ID Your organisation identifier
api_key X-API-Key-ID Secret key for authentication
user_id X-API-User-ID Your user / organisation name

Realtime Streaming API

The WebSocket streaming API requires a single API key:

Parameter Header Description
api_key x-api-key-id API key identifier for authentication

Obtaining Credentials

Email speechstack@gnani.ai with your name, company, and use case. Credentials are typically provisioned within 1 business day, and all new accounts receive free credits -- no credit card required.

Passing Credentials

Option 1 -- Constructor arguments:

# REST client
client = GnaniSTTClient(
    organization_id="your-organization-id",
    api_key="your-api-key",
    user_id="your-user-id",
)

# Streaming client
stream = GnaniSTTStreamClient(api_key="your-api-key")

Option 2 -- Environment variables:

# REST client credentials
export GNANI_ORGANIZATION_ID="your-organization-id"
export GNANI_API_KEY="your-api-key"
export GNANI_USER_ID="your-user-id"
client = GnaniSTTClient()           # picks up all three env vars
stream = GnaniSTTStreamClient()     # picks up GNANI_API_KEY

Supported Languages

REST API

Language Code Native Script
Bengali bn-IN বাংলা
English (India) en-IN Latin
Gujarati gu-IN ગુજરાતી
Hindi hi-IN हिन्दी
Kannada kn-IN ಕನ್ನಡ
Malayalam ml-IN മലയാളം
Marathi mr-IN मराठी
Punjabi pa-IN ਪੰਜਾਬੀ
Tamil ta-IN தமிழ்
Telugu te-IN తెలుగు

For multilingual / code-switching audio (e.g. Hindi-English mix), pass a comma-separated code:

result = client.transcribe("meeting.wav", language_code="en-IN,hi-IN")

Realtime Streaming API

All languages above plus experimental codes:

Language Code Script
Hinglish (Latin) en-hi-IN-latn Latin (experimental)
Hinglish (Code-mixed) en-hi-in-cm Latin + Devanagari (experimental)
Auto-detect AUTO_DETECT All supported (experimental)
from gnani.stt import GnaniSTTStreamClient

# Hinglish (Latin script)
stream = GnaniSTTStreamClient(api_key="key", language_code="en-hi-IN-latn")

# Auto-detect language
stream = GnaniSTTStreamClient(api_key="key", language_code=GnaniSTTStreamClient.AUTO_DETECT)

REST Usage

Transcribe a file by path

result = client.transcribe("meeting.wav", language_code="en-IN")
print(result["transcript"])

Transcribe from a file object

with open("meeting.mp3", "rb") as f:
    result = client.transcribe(f, language_code="ta-IN")

Transcribe raw bytes

audio_bytes = download_audio_from_somewhere()
result = client.transcribe_bytes(
    audio_bytes, filename="clip.wav", language_code="kn-IN"
)

Custom request ID

result = client.transcribe(
    "call.flac", language_code="hi-IN", request_id="my-trace-123"
)

List supported languages

for code, name in GnaniSTTClient.supported_languages().items():
    print(f"{code}: {name}")

Realtime Streaming Usage

Connection Flow

  1. Client opens a WebSocket connection to wss://api.vachana.ai/stt/v3/stream with auth headers.
  2. Server sends a connected event with the active configuration.
  3. Client sends binary PCM audio frames (1024 bytes each = 512 samples at 16-bit).
  4. Server detects speech via VAD and responds with processing and transcript events.
  5. Either side may close the connection at any time.

Audio Format

Property 16 kHz 8 kHz
Encoding PCM signed 16-bit LE PCM signed 16-bit LE
Sample Rate 16,000 Hz 8,000 Hz
Channels 1 (mono) 1 (mono)
Chunk Size 512 samples (32 ms) 512 samples (64 ms)
Frame Bytes 1,024 1,024

Using the async context manager

import asyncio
from gnani.stt import GnaniSTTStreamClient, StreamTranscriptEvent

async def main():
    async with GnaniSTTStreamClient(
        api_key="your-api-key",
        language_code="hi-IN",
        sample_rate=16000,
    ) as stream:
        print(f"Connected! Sample rate: {stream.connected_config.sample_rate}")

        with open("audio.pcm", "rb") as f:
            while chunk := f.read(1024):
                await stream.send_audio(chunk)
                await asyncio.sleep(0.032)

        async for event in stream:
            if isinstance(event, StreamTranscriptEvent):
                print(f"[{event.segment_index}] {event.text}")
                print(f"  Duration: {event.audio_duration_ms}ms, Latency: {event.latency}ms")

asyncio.run(main())

Manual connect / close

import asyncio
from gnani.stt import GnaniSTTStreamClient, StreamTranscriptEvent

async def main():
    stream = GnaniSTTStreamClient(api_key="your-api-key")
    config = await stream.connect()
    print(f"Server ready: {config.message}")

    await stream.send_audio(audio_chunk)
    transcripts = await stream.close()

    for t in transcripts:
        print(t.text)

asyncio.run(main())

High-level stream_audio helper with callbacks

import asyncio
from gnani.stt import GnaniSTTStreamClient, StreamTranscriptEvent, StreamProcessingEvent

async def main():
    async with GnaniSTTStreamClient(api_key="your-api-key") as stream:
        with open("audio.pcm", "rb") as f:
            transcripts = await stream.stream_audio(
                f,
                on_transcript=lambda t: print(f"Transcript: {t.text}"),
                on_processing=lambda p: print(f"Processing at {p.timestamp}..."),
                realtime_pace=True,
            )

    print(f"Total segments: {len(transcripts)}")

asyncio.run(main())

Using 8 kHz audio

stream = GnaniSTTStreamClient(
    api_key="your-api-key",
    language_code="en-IN",
    sample_rate=8000,
)

Event Types

All events are typed dataclasses:

Event Fields Description
StreamConnectedEvent message, timestamp, sample_rate, chunk_size, raw Handshake confirmation with server config
StreamProcessingEvent timestamp, raw VAD detected end-of-speech, transcribing
StreamTranscriptEvent text, audio_duration_ms, segment_id, segment_index, latency, timestamp, raw Completed transcription for a speech segment
StreamErrorEvent message, timestamp, raw Server-side error

Accessing the raw JSON payload

Every event includes a raw field with the full server JSON:

async for event in stream:
    print(event.raw)  # dict with the complete server response

Audio Requirements

REST API

Constraint Value
Formats WAV, MP3, FLAC, OGG, M4A
Max duration 60 seconds
Channels Mono or stereo
Sample rate Automatically converted to 16 kHz mono

Realtime Streaming

Constraint Value
Encoding Raw PCM, signed 16-bit little-endian
Sample rate 16,000 Hz or 8,000 Hz
Channels 1 (mono)
Frame size 1,024 bytes (512 samples)
Pacing Send frames at real-time cadence for best VAD accuracy

Response Format

REST

{
  "success": true,
  "request_id": "req_abc123",
  "timestamp": "20251226_143052.123",
  "transcript": "नमस्ते, आप कैसे हैं?"
}

Realtime Streaming

Connected:

{
  "type": "connected",
  "message": "STT service ready — VAD service connected",
  "timestamp": "2024-01-15T10:30:00.000Z",
  "config": { "sample_rate": 16000, "chunk_size": 512 }
}

Transcript:

{
  "type": "transcript",
  "timestamp": "2024-01-15T10:30:05.987Z",
  "text": "Hello, how are you today?",
  "audio_duration_ms": 2340,
  "segment_id": "<segment_id>",
  "segment_index": "<segment_index>",
  "latency": 320
}

Error Handling

from gnani.stt import (
    AuthenticationError,
    InvalidAudioError,
    APIError,
    StreamConnectionError,
    StreamClosedError,
    StreamError,
)

# REST errors
try:
    result = client.transcribe("audio.wav", language_code="hi-IN")
except AuthenticationError:
    print("Check your credentials")
except InvalidAudioError as e:
    print(f"Bad audio file: {e}")
except APIError as e:
    print(f"API error {e.status_code}: {e}")

# Streaming errors
try:
    async with GnaniSTTStreamClient(api_key="key") as stream:
        await stream.send_audio(chunk)
except StreamConnectionError as e:
    print(f"Connection failed: {e}")
except StreamClosedError as e:
    print(f"Stream already closed: {e}")
except StreamError as e:
    print(f"Server error: {e} (at {e.timestamp})")

Documentation

Full API reference and guides are available at docs.inya.ai/vachana.

License

This project is licensed under the MIT License -- see the LICENSE file for details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gnani_vachana-0.2.0.tar.gz (18.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

gnani_vachana-0.2.0-py3-none-any.whl (20.1 kB view details)

Uploaded Python 3

File details

Details for the file gnani_vachana-0.2.0.tar.gz.

File metadata

  • Download URL: gnani_vachana-0.2.0.tar.gz
  • Upload date:
  • Size: 18.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for gnani_vachana-0.2.0.tar.gz
Algorithm Hash digest
SHA256 3bdfb5147bc6b65ca03c517c8854b3df124256a5d53956fc23b19a868416c83f
MD5 03db3e7e78023fc7a2c763c7821fef33
BLAKE2b-256 cfbb97e4dc5e9668503232158ba44c625cd95a1edcbe7d5f5e58c3d7fbf22631

See more details on using hashes here.

Provenance

The following attestation bundles were made for gnani_vachana-0.2.0.tar.gz:

Publisher: workflow.yml on Gnani-AI-Mintlify/Gnani-Vachana

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file gnani_vachana-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: gnani_vachana-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 20.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for gnani_vachana-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 fe7f918277d68c250b86d6a558631e85a299690f122e4915c12e331ce9bcd434
MD5 31285b83ce9269bcc64c54b22241a67e
BLAKE2b-256 11e584447d39902b7cf339499fc2f259d27e68142ca1f9c21004ac20076fead3

See more details on using hashes here.

Provenance

The following attestation bundles were made for gnani_vachana-0.2.0-py3-none-any.whl:

Publisher: workflow.yml on Gnani-AI-Mintlify/Gnani-Vachana

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page