Python SDK for real-time speech transcription via WebSocket
Realtime ASR SDK
A Python SDK for real-time speech transcription over WebSocket. It provides a simple interface for integrating real-time ASR (automatic speech recognition) into your applications.
Features
- WebSocket-based real-time audio streaming
- Microphone audio capture and processing
- Support for multiple PCM sample rates (8 kHz to 44.1 kHz)
- Word-level timestamps
- Multiple language support
- Event-driven architecture with callbacks
- Easy integration with existing Python applications
Installation
Basic Installation
pip install -r requirements.txt
Development Installation
pip install -e .
Installing with Audio Support
For microphone capture functionality:
pip install -r requirements.txt
# or
pip install -e ".[audio]"
Note: PyAudio installation may require additional system dependencies:
macOS:
brew install portaudio
pip install pyaudio
Ubuntu/Debian:
sudo apt-get install portaudio19-dev python3-pyaudio
pip install pyaudio
Windows:
pip install pipwin
pipwin install pyaudio
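Because PyAudio is an optional dependency, an application may want to check for it before attempting microphone capture. A minimal sketch using only the standard library; the helper name has_module is hypothetical and not part of the SDK:

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # "pyaudio" is the import name of the PyAudio package.
    if has_module("pyaudio"):
        print("PyAudio found: microphone capture should be available.")
    else:
        print("PyAudio not found: install it before using microphone capture.")
```

This only confirms that the module can be located; it does not verify that PortAudio or an input device is usable.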
Quick Start
Simple Example
from realtime_asr import RealtimeASRClient, AudioStream, AudioFormat

# Create client
client = RealtimeASRClient(
    ws_url="ws://localhost:8081/asr/realtime",
    api_key="your-api-key",
    model_id="echo_v1_realtime",
    audio_format=AudioFormat.PCM_16000,
)

# Set up callbacks
client.on_partial_transcript = lambda msg: print(f"[Partial] {msg.text}")
client.on_committed_transcript = lambda msg: print(f"[Final] {msg.text}")
client.on_error = lambda msg: print(f"[Error] {msg.error}")

# Connect
client.connect()

# Stream from microphone
with AudioStream(audio_format=AudioFormat.PCM_16000) as stream:
    stream.start(lambda audio_data: client.send_audio(audio_data))
    input("Press Enter to stop...")

# Disconnect
client.disconnect()
Streaming from Microphone
import logging

from realtime_asr import RealtimeASRClient, AudioStream, AudioFormat, CommitStrategy

logging.basicConfig(level=logging.INFO)

# Create and configure client
client = RealtimeASRClient(
    ws_url="ws://localhost:8081/asr/realtime",
    api_key="your-api-key",
    model_id="echo_v1_realtime",
    language="en",  # or None for auto-detect
    audio_format=AudioFormat.PCM_16000,
    commit_mode=CommitStrategy.VAD,
    word_timestamps=True,
)

# Define event handlers
def on_session_started(msg):
    print(f"Session ID: {msg.session_id}")

def on_partial_transcript(msg):
    if msg.text.strip():
        print(f"[Partial] {msg.text}")

def on_committed_transcript_with_timestamps(msg):
    if msg.text.strip():
        print(f"[Final] {msg.text}")
        if msg.words:
            duration = msg.words[-1].end - msg.words[0].start
            print(f"  → {len(msg.words)} words, {duration:.2f}s")

def on_error(msg):
    print(f"Error: {msg.error}")

# Register callbacks
client.on_session_started = on_session_started
client.on_partial_transcript = on_partial_transcript
client.on_committed_transcript_with_timestamps = on_committed_transcript_with_timestamps
client.on_error = on_error

# Connect and stream
client.connect()
audio_stream = AudioStream(audio_format=AudioFormat.PCM_16000)
audio_stream.start(lambda audio_data: client.send_audio(audio_data))

try:
    input("Press Enter to stop recording...\n")
except KeyboardInterrupt:
    pass

# Cleanup
audio_stream.stop()
client.disconnect()
API Reference
RealtimeASRClient
Main client class for WebSocket communication.
Constructor Parameters
- ws_url (str): WebSocket server URL
- api_key (str): API key for authentication
- model_id (str): Model ID (e.g., "echo_v1_realtime", "lexis_v1")
- language (Optional[str]): Language code (e.g., "en", "zh", "ja") or None for auto-detect
- audio_format (AudioFormat): Audio format specification
- commit_mode (CommitStrategy): Commit strategy (VAD or MANUAL)
- word_timestamps (bool): Whether to request word-level timestamps
Methods
- connect(timeout: float = 10.0): Connect to WebSocket server
- disconnect(): Disconnect from server
- send_audio(audio_data: bytes, commit: bool = False): Send audio chunk
- is_connected: Property to check connection status
- session_id: Property to get current session ID
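Since connect() and disconnect() have to be paired, a small context manager can guarantee the disconnect even when streaming code raises. This is a sketch built only on the two methods listed above; asr_session is a hypothetical helper, not part of the SDK:

```python
from contextlib import contextmanager


@contextmanager
def asr_session(client, timeout: float = 10.0):
    """Connect the client, yield it, and always disconnect afterwards."""
    client.connect(timeout=timeout)
    try:
        yield client
    finally:
        # Runs on normal exit and when the body raises.
        client.disconnect()
```

With a configured client, usage would look like `with asr_session(client) as c: c.send_audio(chunk)`.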
Event Callbacks
- on_session_started(msg: SessionStartedMessage): Called when session starts
- on_partial_transcript(msg: PartialTranscriptMessage): Called for partial transcripts
- on_committed_transcript(msg: CommittedTranscriptMessage): Called for final transcripts
- on_committed_transcript_with_timestamps(msg: CommittedTranscriptWithTimestampsMessage): Called for final transcripts with timestamps
- on_error(msg: ErrorMessage): Called on errors
- on_message(msg: TranscriptionMessage): Called for any message
- on_connected(): Called when connected
- on_disconnected(code: int, reason: str): Called when disconnected
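The callbacks can also be bound methods, which is convenient when transcript state has to be kept between events. The sketch below accumulates committed segments; TranscriptCollector is a hypothetical class, and it assumes that each partial transcript replaces the previous one, which the source does not state:

```python
from typing import List


class TranscriptCollector:
    """Keeps the latest partial text and a list of committed segments."""

    def __init__(self) -> None:
        self.partial: str = ""
        self.committed: List[str] = []

    def on_partial(self, msg) -> None:
        # Assumption: partial results are provisional and replace each other.
        self.partial = msg.text

    def on_committed(self, msg) -> None:
        text = msg.text.strip()
        if text:
            self.committed.append(text)
        # A committed segment supersedes the pending partial.
        self.partial = ""

    def full_text(self) -> str:
        """Join all committed segments with single spaces."""
        return " ".join(self.committed)
```

To wire it up, assign `collector.on_partial` to `client.on_partial_transcript` and `collector.on_committed` to `client.on_committed_transcript`.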
AudioStream
Audio stream handler for microphone capture.
Constructor Parameters
- audio_format (AudioFormat): Audio format specification
- chunk_size (int): Number of frames per buffer (default: 4096)
- channels (int): Number of audio channels (default: 1)
Methods
- start(callback: Callable[[bytes], None]): Start audio capture
- stop(): Stop audio capture
- close(): Close and cleanup resources
AudioFormat
Enum for supported audio formats:
- PCM_8000: 8 kHz PCM
- PCM_16000: 16 kHz PCM (recommended)
- PCM_22050: 22.05 kHz PCM
- PCM_24000: 24 kHz PCM
- PCM_44100: 44.1 kHz PCM
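The sample rate, together with the AudioStream chunk_size, determines how much audio each callback delivers. The calculation below is a sketch that assumes 16-bit samples (2 bytes each); the source says PCM but does not state the sample width, and both function names are hypothetical:

```python
def chunk_duration_ms(frames: int, sample_rate: int) -> float:
    """Duration of one chunk in milliseconds."""
    return 1000.0 * frames / sample_rate


def chunk_size_bytes(frames: int, channels: int = 1, sample_width: int = 2) -> int:
    """Size of one chunk in bytes; sample_width=2 assumes 16-bit PCM."""
    return frames * channels * sample_width
```

With the default chunk_size of 4096 frames at 16 kHz mono, each chunk covers 256 ms of audio and, under the 16-bit assumption, occupies 8192 bytes.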
CommitStrategy
Enum for commit strategies:
- VAD: Voice Activity Detection (automatic)
- MANUAL: Manual commit
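With the MANUAL strategy, the caller presumably marks segment boundaries through the commit argument of send_audio. The sketch below sends a sequence of chunks and sets commit=True only on the last one. That this finalizes the segment on the server is an assumption, since the source documents the parameter but not its exact semantics; send_utterance is a hypothetical helper:

```python
from typing import Iterable


def send_utterance(client, chunks: Iterable[bytes]) -> int:
    """Send chunks, setting commit=True only on the final one.

    Returns the number of chunks sent. The client must expose
    send_audio(audio_data, commit=...), as documented for RealtimeASRClient.
    """
    sent = 0
    previous = None
    for chunk in chunks:
        # Hold back one chunk so the last one can carry the commit flag.
        if previous is not None:
            client.send_audio(previous, commit=False)
            sent += 1
        previous = chunk
    if previous is not None:
        client.send_audio(previous, commit=True)
        sent += 1
    return sent
```

Holding back one chunk lets the helper work with generators whose length is not known in advance.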
Message Types
SessionStartedMessage
@dataclass
class SessionStartedMessage:
    message_type: str
    session_id: Optional[str]
    raw_data: Dict[str, Any]
PartialTranscriptMessage
@dataclass
class PartialTranscriptMessage:
    message_type: str
    text: str
    raw_data: Dict[str, Any]
CommittedTranscriptMessage
@dataclass
class CommittedTranscriptMessage:
    message_type: str
    text: str
    raw_data: Dict[str, Any]
CommittedTranscriptWithTimestampsMessage
@dataclass
class CommittedTranscriptWithTimestampsMessage:
    message_type: str
    text: str
    words: List[WordTimestamp]
    raw_data: Dict[str, Any]

@dataclass
class WordTimestamp:
    word: str
    start: float
    end: float
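Word timestamps lend themselves to simple post-processing such as formatting or pause detection. The sketch below defines a local stand-in with the same three fields so it runs without the SDK, and treats start and end as seconds, as the streaming example's duration printout suggests. The helper names are hypothetical:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WordTimestamp:  # local stand-in mirroring the documented fields
    word: str
    start: float
    end: float


def format_words(words: List[WordTimestamp]) -> List[str]:
    """Render each word as 'start-end word' with two decimals."""
    return [f"{w.start:.2f}-{w.end:.2f} {w.word}" for w in words]


def longest_pause(words: List[WordTimestamp]) -> float:
    """Largest gap between the end of one word and the start of the next."""
    gaps = [nxt.start - cur.end for cur, nxt in zip(words, words[1:])]
    return max(gaps, default=0.0)
```

Inside an on_committed_transcript_with_timestamps handler, both helpers can be called directly on msg.words.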
ErrorMessage
@dataclass
class ErrorMessage:
    message_type: str
    error: str
    raw_data: Dict[str, Any]
Examples
The examples/ directory contains several example scripts:
- simple_example.py: Basic usage demonstration
- stream_from_mic.py: Full-featured microphone streaming with interactive options
- send_audio_file.py: Send pre-recorded WAV file for transcription
Run examples:
python examples/simple_example.py
python examples/stream_from_mic.py
python examples/send_audio_file.py
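For readers without the examples directory at hand, sending a pre-recorded file can be approximated with the standard-library wave module. This is a sketch and not the contents of send_audio_file.py; iter_wav_chunks is a hypothetical helper, and the file's sample rate and channel count are assumed to match the client's audio_format:

```python
import wave
from typing import Iterator


def iter_wav_chunks(path: str, frames_per_chunk: int = 4096) -> Iterator[bytes]:
    """Yield raw PCM frames from a WAV file in fixed-size chunks."""
    with wave.open(path, "rb") as wav:
        while True:
            data = wav.readframes(frames_per_chunk)
            if not data:
                break  # end of file
            yield data
```

Each yielded chunk can be passed to client.send_audio(chunk). Whether the server expects chunks paced at real-time speed is not stated in the source, so a short sleep between sends may or may not be needed.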
Supported Languages
The SDK supports multiple languages including:
- English (en)
- Chinese (zh)
- Japanese (ja)
- Korean (ko)
- Spanish (es)
- French (fr)
- German (de)
- Russian (ru)
- Arabic (ar)
- Portuguese (pt)
- And more...
Set language=None for automatic language detection.
Error Handling
Errors are reported through the on_error callback, which receives an ErrorMessage:
def on_error(msg):
    print(f"Error type: {msg.message_type}")
    print(f"Error message: {msg.error}")

client.on_error = on_error
Common error types:
- error: General error
- auth_error: Authentication failed
- quota_exceeded_error: API quota exceeded
- unaccepted_terms: Terms of service not accepted
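An error handler can branch on message_type to decide whether to keep the session alive. In the sketch below, treating authentication, quota, and terms errors as non-recoverable is an assumption made for illustration rather than documented SDK behavior; FATAL_ERROR_TYPES, is_fatal, and make_error_handler are hypothetical names:

```python
from typing import Callable

# Assumption: retrying will not help for these error types.
FATAL_ERROR_TYPES = {"auth_error", "quota_exceeded_error", "unaccepted_terms"}


def is_fatal(message_type: str) -> bool:
    """Return True for error types this sketch treats as non-recoverable."""
    return message_type in FATAL_ERROR_TYPES


def make_error_handler(on_fatal: Callable) -> Callable:
    """Build an on_error callback that logs every error and escalates fatal ones."""
    def on_error(msg) -> None:
        print(f"[{msg.message_type}] {msg.error}")
        if is_fatal(msg.message_type):
            on_fatal(msg)
    return on_error
```

For example, `client.on_error = make_error_handler(lambda msg: client.disconnect())` would close the connection on a fatal error while only logging general ones.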
Requirements
- Python 3.8+
- websocket-client >= 1.6.0
- numpy >= 1.24.0
- PyAudio >= 0.2.13 (for microphone capture)
License
MIT License
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Support
For issues and questions:
- Create an issue on GitHub
- Check the examples directory for usage patterns
- Review the API documentation above
Changelog
Version 0.1.0
- Initial release
- WebSocket client implementation
- Audio streaming support
- Event-driven callbacks
- Multiple audio format support
- Word-level timestamps
- Example scripts