Python SDK for real-time speech transcription via WebSocket

Realtime ASR SDK

A Python SDK designed for real-time speech transcription over WebSocket, offering a simple and reliable interface for integrating real-time ASR (Automatic Speech Recognition) capabilities into your applications.

Features

WebSocket-based real-time audio streaming
Microphone audio capture and processing
Support for multiple PCM sample rates (8 kHz to 44.1 kHz)
Word-level timestamps
Multiple language support
Event-driven architecture with callbacks
Easy integration with existing Python applications

Installation

Basic Installation

pip install -r requirements.txt

Development Installation

pip install -e .

Installing with Audio Support

For microphone capture functionality:

pip install -r requirements.txt
# or
pip install -e ".[audio]"

Note: PyAudio installation may require additional system dependencies:

macOS:

brew install portaudio
pip install pyaudio

Ubuntu/Debian:

sudo apt-get install portaudio19-dev python3-pyaudio
pip install pyaudio

Windows:

# PyAudio 0.2.13+ publishes prebuilt Windows wheels on PyPI,
# so the pipwin helper is normally no longer needed.
pip install pyaudio
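
To confirm that the optional audio dependency is visible to the interpreter before starting a capture, a small check like the following can help. It is a minimal sketch: has_module is a hypothetical helper, not part of the SDK, and it relies only on the fact that PyAudio installs a top-level module named pyaudio.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # PyAudio installs the top-level module "pyaudio".
    if has_module("pyaudio"):
        print("PyAudio found: microphone capture should be available.")
    else:
        print("PyAudio not found: install it before using AudioStream.")
```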

Quick Start

Simple Example

from realtime_asr import RealtimeASRClient, AudioStream, AudioFormat

# Create client
client = RealtimeASRClient(
    ws_url="ws://localhost:8081/asr/realtime",
    api_key="your-api-key",
    model_id="echo_v1_realtime",
    audio_format=AudioFormat.PCM_16000,
)

# Set up callbacks
client.on_partial_transcript = lambda msg: print(f"[Partial] {msg.text}")
client.on_committed_transcript = lambda msg: print(f"[Final] {msg.text}")
client.on_error = lambda msg: print(f"[Error] {msg.error}")

# Connect
client.connect()

# Stream from microphone
with AudioStream(audio_format=AudioFormat.PCM_16000) as stream:
    stream.start(lambda audio_data: client.send_audio(audio_data))
    input("Press Enter to stop...")

# Disconnect
client.disconnect()

Streaming from Microphone

import logging
from realtime_asr import RealtimeASRClient, AudioStream, AudioFormat, CommitStrategy

logging.basicConfig(level=logging.INFO)

# Create and configure client
client = RealtimeASRClient(
    ws_url="ws://localhost:8081/asr/realtime",
    api_key="your-api-key",
    model_id="echo_v1_realtime",
    language="en",  # or None for auto-detect
    audio_format=AudioFormat.PCM_16000,
    commit_mode=CommitStrategy.VAD,
    word_timestamps=True,
)

# Define event handlers
def on_session_started(msg):
    print(f"Session ID: {msg.session_id}")

def on_partial_transcript(msg):
    if msg.text.strip():
        print(f"[Partial] {msg.text}")

def on_committed_transcript_with_timestamps(msg):
    if msg.text.strip():
        print(f"[Final] {msg.text}")
        if msg.words:
            duration = msg.words[-1].end - msg.words[0].start
            print(f"  → {len(msg.words)} words, {duration:.2f}s")

def on_error(msg):
    print(f"Error: {msg.error}")

# Register callbacks
client.on_session_started = on_session_started
client.on_partial_transcript = on_partial_transcript
client.on_committed_transcript_with_timestamps = on_committed_transcript_with_timestamps
client.on_error = on_error

# Connect and stream
client.connect()

audio_stream = AudioStream(audio_format=AudioFormat.PCM_16000)
audio_stream.start(lambda audio_data: client.send_audio(audio_data))

try:
    input("Press Enter to stop recording...\n")
except KeyboardInterrupt:
    pass
finally:
    # Always release the microphone and close the connection
    audio_stream.stop()
    client.disconnect()

API Reference

RealtimeASRClient

Main client class for WebSocket communication.

Constructor Parameters

ws_url (str): WebSocket server URL
api_key (str): API key for authentication
model_id (str): Model ID (e.g., "echo_v1_realtime", "lexis_v1")
language (Optional[str]): Language code (e.g., "en", "zh", "ja") or None for auto-detect
audio_format (AudioFormat): Audio format specification
commit_mode (CommitStrategy): Commit strategy (VAD or MANUAL)
word_timestamps (bool): Whether to request word-level timestamps

Methods

connect(timeout: float = 10.0): Connect to WebSocket server
disconnect(): Disconnect from server
send_audio(audio_data: bytes, commit: bool = False): Send audio chunk
is_connected: Property to check connection status
session_id: Property to get current session ID
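
Because is_connected is exposed as a property, an audio callback can check it before calling send_audio. The sketch below assumes only the documented is_connected property and send_audio method. make_guarded_sender is a hypothetical helper, and silently dropping chunks while offline is one possible policy, not necessarily what the SDK does internally.

```python
from typing import Any, Callable


def make_guarded_sender(client: Any) -> Callable[[bytes], None]:
    """Build an audio callback that skips chunks while the client is offline."""

    def _send(audio_data: bytes) -> None:
        # is_connected is documented as a property, so it is read, not called.
        if client.is_connected:
            client.send_audio(audio_data)

    return _send
```

The returned function has the Callable[[bytes], None] shape that AudioStream.start() expects, so it could be passed as audio_stream.start(make_guarded_sender(client)).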

Event Callbacks

on_session_started(msg: SessionStartedMessage): Called when session starts
on_partial_transcript(msg: PartialTranscriptMessage): Called for partial transcripts
on_committed_transcript(msg: CommittedTranscriptMessage): Called for final transcripts
on_committed_transcript_with_timestamps(msg: CommittedTranscriptWithTimestampsMessage): Called for final transcripts with timestamps
on_error(msg: ErrorMessage): Called on errors
on_message(msg: TranscriptionMessage): Called for any message
on_connected(): Called when connected
on_disconnected(code: int, reason: str): Called when disconnected
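
The callbacks above are plain attributes, so bound methods work as handlers. The following sketch collects committed text into a running transcript. TranscriptCollector is a hypothetical class, not part of the SDK. It assumes only that partial and committed messages carry a text field, and that committed text arrives through on_committed_transcript (with word_timestamps enabled, the timestamped callback may be the one that fires instead).

```python
from typing import Any, List


class TranscriptCollector:
    """Accumulates committed transcript text and tracks the latest partial."""

    def __init__(self) -> None:
        self.finals: List[str] = []
        self.partial: str = ""

    def handle_partial(self, msg: Any) -> None:
        # Partials are provisional, so only the most recent one is kept.
        self.partial = msg.text

    def handle_committed(self, msg: Any) -> None:
        text = msg.text.strip()
        if text:
            self.finals.append(text)
        # A committed segment supersedes the partial that preceded it.
        self.partial = ""

    def attach(self, client: Any) -> None:
        """Register this collector's handlers on a client object."""
        client.on_partial_transcript = self.handle_partial
        client.on_committed_transcript = self.handle_committed

    @property
    def text(self) -> str:
        return " ".join(self.finals)
```

Calling collector.attach(client) before client.connect() keeps the handler wiring in one place, and collector.text can be read after disconnecting.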

AudioStream

Audio stream handler for microphone capture.

Constructor Parameters

audio_format (AudioFormat): Audio format specification
chunk_size (int): Number of frames per buffer (default: 4096)
channels (int): Number of audio channels (default: 1)

Methods

start(callback: Callable[[bytes], None]): Start audio capture
stop(): Stop audio capture
close(): Close and cleanup resources
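
Since chunk_size is given in frames, the time covered by each callback depends on the sample rate. The arithmetic can be sketched as below. Both helpers are hypothetical, and the 2-byte sample width is an assumption (16-bit PCM) that this README does not state.

```python
def chunk_duration_ms(chunk_size: int, sample_rate: int) -> float:
    """Milliseconds of audio covered by one buffer of chunk_size frames."""
    return 1000.0 * chunk_size / sample_rate


def chunk_bytes(chunk_size: int, channels: int = 1, sample_width: int = 2) -> int:
    """Bytes per buffer; sample_width=2 assumes 16-bit samples."""
    return chunk_size * channels * sample_width


if __name__ == "__main__":
    # Default chunk_size of 4096 frames at the recommended 16 kHz rate.
    print(chunk_duration_ms(4096, 16000))  # 256.0
    print(chunk_bytes(4096))               # 8192
```

Under these assumptions, the default settings deliver a chunk roughly four times per second. A smaller chunk_size would reduce the delay before audio is sent, at the cost of more callbacks.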

AudioFormat

Enum for supported audio formats:

PCM_8000: 8kHz PCM
PCM_16000: 16kHz PCM (recommended)
PCM_22050: 22.05kHz PCM
PCM_24000: 24kHz PCM
PCM_44100: 44.1kHz PCM
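
When the sample rate comes from elsewhere (for example a file header), it can be mapped to one of the member names listed above. This is a sketch: format_name_for_rate and SUPPORTED_RATES are hypothetical and are derived only from the names in this list.

```python
# Sample rates taken from the AudioFormat members listed in this README.
SUPPORTED_RATES = (8000, 16000, 22050, 24000, 44100)


def format_name_for_rate(sample_rate: int) -> str:
    """Return the AudioFormat member name for a sample rate in Hz."""
    if sample_rate not in SUPPORTED_RATES:
        supported = ", ".join(str(rate) for rate in SUPPORTED_RATES)
        raise ValueError(
            f"Unsupported sample rate {sample_rate} Hz; expected one of: {supported}"
        )
    return f"PCM_{sample_rate}"
```

If AudioFormat is a standard Python Enum, the member could then be looked up by name with AudioFormat[format_name_for_rate(16000)].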

CommitStrategy

Enum for commit strategies:

VAD: Voice Activity Detection (automatic)
MANUAL: Manual commit
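
With the MANUAL strategy, the caller presumably decides where a segment ends by passing commit=True to send_audio. The sketch below sends a buffer in fixed-size chunks and commits on the last one. send_utterance and iter_chunks are hypothetical helpers, and the assumption that commit=True finalizes the current segment is an inference from the documented signature, not a confirmed behavior.

```python
from typing import Any, Iterator


def iter_chunks(pcm: bytes, chunk_bytes: int) -> Iterator[bytes]:
    """Yield consecutive slices of pcm, each at most chunk_bytes long."""
    for offset in range(0, len(pcm), chunk_bytes):
        yield pcm[offset:offset + chunk_bytes]


def send_utterance(client: Any, pcm: bytes, chunk_bytes: int = 8192) -> int:
    """Send pcm in chunks, setting commit=True only on the final chunk.

    Returns the number of chunks sent.
    """
    chunks = list(iter_chunks(pcm, chunk_bytes))
    for index, chunk in enumerate(chunks):
        is_last = index == len(chunks) - 1
        client.send_audio(chunk, commit=is_last)
    return len(chunks)
```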

Message Types

SessionStartedMessage

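# Imports shared by the message definitions in this section
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

@dataclass
class SessionStartedMessage:
    message_type: str
    session_id: Optional[str]
    raw_data: Dict[str, Any]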

PartialTranscriptMessage

@dataclass
class PartialTranscriptMessage:
    message_type: str
    text: str
    raw_data: Dict[str, Any]

CommittedTranscriptMessage

@dataclass
class CommittedTranscriptMessage:
    message_type: str
    text: str
    raw_data: Dict[str, Any]

CommittedTranscriptWithTimestampsMessage

# WordTimestamp is defined first so the List[WordTimestamp] annotation resolves
@dataclass
class WordTimestamp:
    word: str
    start: float  # seconds
    end: float    # seconds

@dataclass
class CommittedTranscriptWithTimestampsMessage:
    message_type: str
    text: str
    words: List[WordTimestamp]
    raw_data: Dict[str, Any]
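
The words list can be turned into a simple per-word timeline. The sketch below uses a local stand-in dataclass with the same three fields so that it runs without the SDK. format_words and speaking_span are hypothetical helpers, and times are treated as seconds, as in the microphone example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WordTimestamp:
    """Local stand-in mirroring the fields documented above."""
    word: str
    start: float
    end: float


def format_words(words: List[WordTimestamp]) -> List[str]:
    """Render one 'start-end  word' line per word, times in seconds."""
    return [f"{w.start:6.2f}-{w.end:6.2f}  {w.word}" for w in words]


def speaking_span(words: List[WordTimestamp]) -> float:
    """Seconds from the start of the first word to the end of the last."""
    if not words:
        return 0.0
    return words[-1].end - words[0].start
```

Inside an on_committed_transcript_with_timestamps handler, format_words(msg.words) would give lines suitable for logging or for building captions.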

ErrorMessage

@dataclass
class ErrorMessage:
    message_type: str
    error: str
    raw_data: Dict[str, Any]

Examples

The examples/ directory contains several example scripts:

simple_example.py: Basic usage demonstration
stream_from_mic.py: Full-featured microphone streaming with interactive options
send_audio_file.py: Send pre-recorded WAV file for transcription

Run examples:

python examples/simple_example.py
python examples/stream_from_mic.py
python examples/send_audio_file.py
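
For readers without the examples/ directory at hand, the general shape of sending a pre-recorded WAV file can be sketched with the standard library wave module. This is not the content of send_audio_file.py. wav_info and read_wav_chunks are hypothetical helpers, and the file is assumed to already be PCM at a sample rate the server accepts.

```python
import wave
from typing import BinaryIO, Iterator, Tuple, Union

WavSource = Union[str, BinaryIO]


def wav_info(source: WavSource) -> Tuple[int, int, int]:
    """Return (sample_rate_hz, channels, sample_width_bytes) for a WAV source."""
    with wave.open(source, "rb") as wav:
        return wav.getframerate(), wav.getnchannels(), wav.getsampwidth()


def read_wav_chunks(source: WavSource, frames_per_chunk: int = 4096) -> Iterator[bytes]:
    """Yield raw PCM bytes from a WAV source, frames_per_chunk frames at a time."""
    with wave.open(source, "rb") as wav:
        while True:
            frames = wav.readframes(frames_per_chunk)
            if not frames:
                break
            yield frames
```

Each yielded chunk could be passed to client.send_audio(chunk). Sleeping between chunks for roughly the duration each chunk represents would approximate real-time streaming, though whether the server requires pacing is not stated here.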

Supported Languages

The SDK supports multiple languages including:

English (en)
Chinese (zh)
Japanese (ja)
Korean (ko)
Spanish (es)
French (fr)
German (de)
Russian (ru)
Arabic (ar)
Portuguese (pt)
And more...

Set language=None for automatic language detection.

Error Handling

The SDK reports errors through the on_error callback, which receives an ErrorMessage:

def on_error(msg):
    print(f"Error type: {msg.message_type}")
    print(f"Error message: {msg.error}")

client.on_error = on_error

Common error types (values of msg.message_type):

error: General error
auth_error: Authentication failed
quota_exceeded_error: API quota exceeded
unaccepted_terms: Terms of service not accepted
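
An on_error handler can branch on these types. The sketch below treats authentication, quota, and terms errors as fatal and everything else as recoverable. That split is an assumption made for illustration, and is_fatal and describe_error are hypothetical helpers rather than SDK functions.

```python
from typing import Any

# Assumed grouping: these types are unlikely to succeed on a simple retry.
FATAL_ERROR_TYPES = frozenset(
    {"auth_error", "quota_exceeded_error", "unaccepted_terms"}
)


def is_fatal(message_type: str) -> bool:
    """Return True for error types assumed not to be worth retrying."""
    return message_type in FATAL_ERROR_TYPES


def describe_error(msg: Any) -> str:
    """Build a one-line log entry from an object with message_type and error."""
    kind = "fatal" if is_fatal(msg.message_type) else "recoverable"
    return f"[{kind}] {msg.message_type}: {msg.error}"
```

A handler such as client.on_error = lambda msg: print(describe_error(msg)) would then label each error in the output.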

Requirements

Python 3.8+
websocket-client >= 1.6.0
numpy >= 1.24.0
PyAudio >= 0.2.13 (for microphone capture)

License

MIT License

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Support

For issues and questions:

Create an issue on GitHub
Check the examples directory for usage patterns
Review the API documentation above

Changelog

Version 0.1.0

Initial release
WebSocket client implementation
Audio streaming support
Event-driven callbacks
Multiple audio format support
Word-level timestamps
Example scripts

Release history

0.1.2 (this version): Nov 18, 2025
0.1.1: Nov 18, 2025

Download files

Source Distribution

realtime_asr_sdk-0.1.2.tar.gz (14.1 kB)

Uploaded Nov 18, 2025 Source

Built Distribution

realtime_asr_sdk-0.1.2-py3-none-any.whl (12.5 kB)

Uploaded Nov 18, 2025 Python 3

File details

Details for the file realtime_asr_sdk-0.1.2.tar.gz.

File metadata

Download URL: realtime_asr_sdk-0.1.2.tar.gz
Upload date: Nov 18, 2025
Size: 14.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for realtime_asr_sdk-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`08fa124fe7adb4c3f6ed0b94f1c37787e2ed1c2131674776aef6a6e7ab0387bf`
MD5	`ab1c473117f688ca82ed7de6af6ab87b`
BLAKE2b-256	`654fe944168553a7d108a06adf2da78f18d86c425b95855bd8a01fa8bb623060`

File details

Details for the file realtime_asr_sdk-0.1.2-py3-none-any.whl.

File metadata

Download URL: realtime_asr_sdk-0.1.2-py3-none-any.whl
Upload date: Nov 18, 2025
Size: 12.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for realtime_asr_sdk-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f88e971c715febf68e472424f24e0a1d29844d78b14d52c1e61c97d4c4754bff`
MD5	`1fe4697526faa508da4cc739f6742880`
BLAKE2b-256	`51fa6b1b9074f49a974b24ad80a95d2899e1604efaef882cfe3d34b31a0cb093`

realtime-asr-sdk 0.1.2
