A production-ready Python framework for building real-time AI voice chat applications using OpenAI's Realtime API
VoiceChatEngine
A high-performance Python framework for OpenAI's Realtime API, featuring a unique dual-lane architecture that balances ultra-low latency with rich functionality.
🎯 Overview
VoiceChatEngine provides a modern, production-ready interface for building real-time voice applications with OpenAI's Realtime API. The framework's innovative dual-lane architecture allows developers to choose between maximum performance (Fast Lane) or full features (Big Lane).
Key Features
- 🚀 Ultra-Low Latency: < 50ms round-trip in Fast Lane mode
- 🎙️ Real-time Voice Processing: Direct hardware audio capture with minimal overhead
- 🔄 Dual-Lane Architecture: Choose between performance and features
- 🎯 Client-side VAD: Energy-based voice activity detection
- 🔊 Audio Playback: Built-in audio output with < 10ms latency
- 📊 Comprehensive Metrics: Performance monitoring and usage tracking
- 🛡️ Production Ready: Robust error handling and automatic reconnection
🏗️ Architecture
Fast Lane (Default)
Optimized for ultra-low latency voice interactions:
- Direct WebSocket connection
- Zero-copy audio path
- Minimal abstraction layers
- Client-side VAD only
- < 50ms total latency
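To make the "direct WebSocket connection" concrete, here is a minimal sketch of how one audio chunk might be framed for the wire. OpenAI's Realtime API accepts audio as a base64 string inside an `input_audio_buffer.append` client event; whether Fast Lane builds its messages exactly this way is an assumption, and `build_audio_append_event` is a hypothetical helper, not part of this package.

```python
import base64
import json


def build_audio_append_event(pcm16_chunk: bytes) -> str:
    """Wrap raw PCM16 audio in an `input_audio_buffer.append` client event.

    Hypothetical helper for illustration; not part of voicechatengine.
    """
    return json.dumps({
        "type": "input_audio_buffer.append",
        # The Realtime API expects the audio bytes as a base64 string.
        "audio": base64.b64encode(pcm16_chunk).decode("ascii"),
    })


# 10 ms of silence at 24 kHz, 16-bit mono: 240 samples * 2 bytes each.
message = build_audio_append_event(b"\x00\x00" * 240)
```

The resulting string is what would be sent as a single text frame over the WebSocket.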
Big Lane (Coming Soon)
Full-featured implementation with:
- Multi-provider support
- Audio pipeline processing
- Event-driven architecture
- Advanced features (transcription, functions)
- Provider failover
🚀 Quick Start
Installation
pip install voicechatengine
Basic Usage
import asyncio
from voicechatengine import VoiceChatEngine

async def main():
    # Create engine
    engine = VoiceChatEngine(api_key="your-openai-api-key")
    # Set up callbacks
    engine.on_text_response = lambda text: print(f"AI: {text}")
    # Connect and start
    async with engine:
        await engine.start_listening()
        # Keep running
        await asyncio.sleep(60)

asyncio.run(main())
Advanced Example
import asyncio
from voicechatengine import VoiceChatEngine, VoiceEngineConfig

async def main():
    # Configure engine
    config = VoiceEngineConfig(
        api_key="your-api-key",
        mode="fast",
        voice="alloy",
        vad_enabled=True,
        vad_threshold=0.02,
        latency_mode="ultra_low"
    )
    engine = VoiceChatEngine(config=config)
    # Set up comprehensive callbacks
    engine.on_audio_response = lambda audio: print(f"Received {len(audio)} bytes")
    engine.on_text_response = lambda text: print(f"AI: {text}")
    engine.on_error = lambda error: print(f"Error: {error}")
    engine.on_response_done = lambda: print("Response complete")
    await engine.connect()
    # Example: Text to speech
    await engine.send_text("Hello, how are you today?")
    # Example: Start listening for voice input
    await engine.start_listening()
    # Keep running for 5 minutes
    await asyncio.sleep(300)
    await engine.disconnect()

asyncio.run(main())
📖 API Reference
VoiceChatEngine
The main interface for voice interactions.
Methods
- connect(retry_count=3): Connect to the OpenAI Realtime API
- disconnect(): Disconnect from the API
- start_listening(): Start capturing audio input
- stop_listening(): Stop capturing audio
- send_text(text): Send a text message
- send_audio(audio_bytes): Send audio data
- interrupt(): Interrupt the current AI response
- get_metrics(): Get performance metrics
- get_usage(): Get usage statistics
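The `retry_count` argument suggests that `connect` retries failed attempts. How the library spaces those attempts is not documented here, so the following is only a sketch of one common approach, exponential backoff. `connect_with_retry`, `base_delay`, and `FlakyTransport` are hypothetical names introduced for illustration.

```python
import asyncio
from typing import Awaitable, Callable


async def connect_with_retry(
    connect_once: Callable[[], Awaitable[None]],
    retry_count: int = 3,
    base_delay: float = 0.5,
) -> int:
    """Call connect_once until it succeeds; return the number of attempts used."""
    for attempt in range(1, retry_count + 1):
        try:
            await connect_once()
            return attempt
        except ConnectionError:
            if attempt == retry_count:
                raise  # out of attempts: surface the last error
            # Exponential backoff: base_delay, then 2x, 4x, ...
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("retry_count must be at least 1")


class FlakyTransport:
    """Stand-in connection that fails twice, then succeeds."""

    def __init__(self) -> None:
        self.calls = 0

    async def connect(self) -> None:
        self.calls += 1
        if self.calls < 3:
            raise ConnectionError("simulated network failure")


transport = FlakyTransport()
attempts = asyncio.run(
    connect_with_retry(transport.connect, retry_count=3, base_delay=0.01)
)
```

Here the third attempt succeeds, so `attempts` is 3. With `retry_count=2` the same transport would raise `ConnectionError`.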
Properties
- is_connected: Check connection status
- is_listening: Check if actively listening
- on_audio_response: Callback for audio responses
- on_text_response: Callback for text responses
- on_error: Callback for errors
- on_response_done: Callback when a response completes
Configuration
VoiceEngineConfig(
    api_key: str,                      # Required: OpenAI API key
    mode: "fast" | "big" = "fast",     # Engine mode
    voice: str = "alloy",              # Voice selection
    sample_rate: int = 24000,          # Audio sample rate
    vad_enabled: bool = True,          # Enable voice activity detection
    vad_threshold: float = 0.02,       # VAD sensitivity
    latency_mode: str = "balanced",    # "ultra_low" | "balanced" | "quality"
)
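The signature above is shorthand rather than valid Python. As a rough illustration of the same fields and defaults in runnable form, here is a standalone dataclass. `ConfigSketch` is a hypothetical mirror, not the library's `VoiceEngineConfig`, and the validation rules (including the `[0, 1]` range for `vad_threshold`) are assumptions.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass
class ConfigSketch:
    """Hypothetical mirror of the documented fields, not the library's class."""

    api_key: str
    mode: Literal["fast", "big"] = "fast"
    voice: str = "alloy"
    sample_rate: int = 24000
    vad_enabled: bool = True
    vad_threshold: float = 0.02
    latency_mode: Literal["ultra_low", "balanced", "quality"] = "balanced"

    def __post_init__(self) -> None:
        # Literal hints are not enforced at runtime, so check explicitly.
        if not self.api_key:
            raise ValueError("api_key is required")
        if self.mode not in ("fast", "big"):
            raise ValueError(f"unknown mode: {self.mode!r}")
        if self.latency_mode not in ("ultra_low", "balanced", "quality"):
            raise ValueError(f"unknown latency_mode: {self.latency_mode!r}")
        # Assumed range: energy normalized to [0, 1].
        if not 0.0 <= self.vad_threshold <= 1.0:
            raise ValueError("vad_threshold must be in [0, 1]")


cfg = ConfigSketch(api_key="sk-example", latency_mode="ultra_low")
```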
🎯 Use Cases
Voice Assistant
engine = VoiceChatEngine.create_simple(api_key="...")
engine.on_text_response = lambda text: print(f"Assistant: {text}")
await engine.connect()
await engine.start_listening()
Real-time Translation
config = VoiceEngineConfig(
    api_key="...",
    voice="shimmer",
    language="es"  # Spanish
)
engine = VoiceChatEngine(config=config)
Interactive Voice Response (IVR)
engine = VoiceChatEngine(api_key="...", mode="fast")
engine.on_text_response = handle_user_input
engine.on_function_call = execute_action
🔧 Advanced Features
Voice Activity Detection (VAD)
Built-in client-side VAD for efficient audio streaming:
config = VoiceEngineConfig(
    api_key="...",
    vad_enabled=True,
    vad_threshold=0.02,        # Energy threshold
    vad_speech_start_ms=100,   # Speech detection delay
    vad_speech_end_ms=500      # Silence detection delay
)
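For intuition about what these parameters control, the following is a minimal sketch of an energy-based detector. It assumes 16-bit PCM input, RMS energy normalized to `[0, 1]`, and fixed-size chunks; `EnergyVAD`, `rms_energy`, and `chunk_ms` are hypothetical names, and the library's own detector may work differently.

```python
import math
from array import array


def rms_energy(pcm16: bytes) -> float:
    """RMS of 16-bit PCM samples (native byte order), scaled to [0, 1]."""
    samples = array("h", pcm16)
    if not samples:
        return 0.0
    mean_square = sum(s * s for s in samples) / len(samples)
    return math.sqrt(mean_square) / 32768.0


class EnergyVAD:
    """Hypothetical energy-threshold VAD with start/end debouncing."""

    def __init__(self, threshold=0.02, speech_start_ms=100,
                 speech_end_ms=500, chunk_ms=20):
        self.threshold = threshold
        # Chunks of sustained speech/silence needed before switching state.
        self.start_chunks = max(1, speech_start_ms // chunk_ms)
        self.end_chunks = max(1, speech_end_ms // chunk_ms)
        self.speaking = False
        self._run = 0  # consecutive chunks that contradict the current state

    def process(self, pcm16: bytes) -> bool:
        """Feed one chunk; return True while speech is considered active."""
        loud = rms_energy(pcm16) >= self.threshold
        if loud != self.speaking:
            self._run += 1
            needed = self.start_chunks if loud else self.end_chunks
            if self._run >= needed:
                self.speaking = loud
                self._run = 0
        else:
            self._run = 0
        return self.speaking


vad = EnergyVAD(chunk_ms=20)
silence = array("h", [0] * 480).tobytes()   # 20 ms at 24 kHz
tone = array("h", [8000] * 480).tobytes()   # constant loud signal
states = [vad.process(tone) for _ in range(5)]
```

With 20 ms chunks, a 100 ms start delay means the detector switches to speech on the fifth consecutive loud chunk, and a 500 ms end delay means it needs 25 consecutive quiet chunks to switch back. A higher `vad_threshold` makes the detector less sensitive to background noise at the cost of missing quiet speech.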
Performance Metrics
Monitor real-time performance:
metrics = engine.get_metrics()
print(f"Capture rate: {metrics['audio']['capture_rate']} chunks/sec")
print(f"Uptime: {metrics['uptime']} seconds")
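The structure of the metrics dictionary beyond the two keys shown is not documented here. If you want your own latency figures alongside it, a small application-side tracker is one option. This sketch is independent of the library; `LatencyTracker` is a hypothetical helper.

```python
import statistics
import time


class LatencyTracker:
    """Hypothetical helper: collects elapsed times in milliseconds."""

    def __init__(self) -> None:
        self.samples_ms = []

    def start(self) -> float:
        """Return a timestamp to pass to stop() later."""
        return time.perf_counter()

    def stop(self, started: float) -> float:
        """Record and return the time elapsed since `started`, in ms."""
        elapsed_ms = (time.perf_counter() - started) * 1000.0
        self.samples_ms.append(elapsed_ms)
        return elapsed_ms

    def summary(self) -> dict:
        """Return count, mean, and max of the recorded samples."""
        if not self.samples_ms:
            return {"count": 0}
        return {
            "count": len(self.samples_ms),
            "mean_ms": statistics.fmean(self.samples_ms),
            "max_ms": max(self.samples_ms),
        }


tracker = LatencyTracker()
t0 = tracker.start()      # e.g. when a user utterance ends
time.sleep(0.01)          # stand-in for waiting on the first audio response
tracker.stop(t0)          # e.g. inside an on_audio_response callback
report = tracker.summary()
```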
Cost Tracking
Track API usage and costs:
usage = await engine.get_usage()
cost = await engine.estimate_cost()
print(f"Total cost: ${cost.total:.2f}")
🏗️ Architecture Details
Component Overview
- Audio Manager: Unified audio interface for capture and playback
- Stream Manager: WebSocket connection management
- VAD Detector: Real-time voice activity detection
- Strategy Pattern: Pluggable implementations for different use cases
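As an illustration of how a strategy pattern could let one engine front two lanes, here is a minimal sketch. All class and method names below (`LaneStrategy`, `FastLaneStrategy`, `BigLaneStrategy`, `EngineSketch`) are hypothetical and are not taken from the package's source.

```python
from typing import Callable, List, Protocol


class LaneStrategy(Protocol):
    """Interface each lane implements (hypothetical)."""

    def handle_audio(self, chunk: bytes) -> bytes: ...


class FastLaneStrategy:
    """Pass audio straight through with no extra processing."""

    def handle_audio(self, chunk: bytes) -> bytes:
        return chunk


class BigLaneStrategy:
    """Run audio through a list of processing stages first."""

    def __init__(self, stages: List[Callable[[bytes], bytes]]) -> None:
        self.stages = stages

    def handle_audio(self, chunk: bytes) -> bytes:
        for stage in self.stages:
            chunk = stage(chunk)
        return chunk


class EngineSketch:
    """Delegates to whichever strategy it was given."""

    def __init__(self, strategy: LaneStrategy) -> None:
        self.strategy = strategy

    def send_audio(self, chunk: bytes) -> bytes:
        return self.strategy.handle_audio(chunk)


fast = EngineSketch(FastLaneStrategy())
# Example stage: drop the first two bytes (one 16-bit sample).
big = EngineSketch(BigLaneStrategy(stages=[lambda c: c[2:]]))
```

The calling code is identical for both engines; only the injected strategy differs, which is what makes the implementations pluggable.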
Performance Characteristics
Fast Lane Performance:
- Audio capture to API: < 10ms
- API to audio playback: < 10ms
- Total round-trip: < 50ms
- CPU usage: < 5%
- Memory: < 50MB
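Latency targets like these constrain buffer sizes, because a chunk cannot be sent until it has been fully captured. The arithmetic below assumes 16-bit mono PCM at the default 24000 Hz sample rate; the sample format is an assumption, and `chunk_size_bytes` is a hypothetical helper.

```python
def chunk_size_bytes(sample_rate: int = 24000, chunk_ms: int = 10,
                     channels: int = 1, bytes_per_sample: int = 2) -> int:
    """Bytes of PCM audio contained in one chunk of `chunk_ms` milliseconds."""
    samples_per_channel = sample_rate * chunk_ms // 1000
    return samples_per_channel * channels * bytes_per_sample


ten_ms = chunk_size_bytes(chunk_ms=10)     # 240 samples -> 480 bytes
hundred_ms = chunk_size_bytes(chunk_ms=100)  # 2400 samples -> 4800 bytes
```

A 10 ms capture budget therefore implies chunks of roughly 480 bytes under these assumptions; a 100 ms chunk would by itself exceed the stated 50 ms round-trip target.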
🛠️ Development
Requirements
- Python 3.8+
- sounddevice for audio I/O
- websockets for API connection
- numpy for audio processing
Testing
# Run smoke tests
python -m pytest tests/smoke_tests/
# Run specific test
python -m realtimevoiceapi.smoke_tests.test_08_fast_lane_simple_demo
Contributing
- Fork the repository
- Create a feature branch
- Implement your changes
- Add tests
- Submit a pull request
📝 Best Practices
- Always handle errors: Set up error callbacks for production use
- Monitor metrics: Track performance in production
- Use the appropriate mode: Fast Lane for low-latency conversations, Big Lane (once available) for feature-rich processing
- Configure VAD: Tune VAD parameters for your environment
- Test audio devices: Verify device compatibility before deployment
🚧 Roadmap
- Big Lane implementation
- Multi-provider support (Anthropic, Google)
- Audio effects pipeline
- Advanced VAD algorithms
- Recording and playback features
- Conversation persistence
- Web/mobile SDKs
📄 License
MIT License - see LICENSE file for details
🙏 Acknowledgments
Built with inspiration from modern real-time systems and the excellent OpenAI Realtime API.