A production-ready Python framework for building real-time AI voice chat applications using OpenAI's Realtime API

These details have not been verified by PyPI

VoiceChatEngine

A high-performance Python framework for OpenAI's Realtime API, featuring a unique dual-lane architecture that balances ultra-low latency with rich functionality.

🎯 Overview

VoiceChatEngine provides a modern, production-ready interface for building real-time voice applications with OpenAI's Realtime API. The framework's innovative dual-lane architecture allows developers to choose between maximum performance (Fast Lane) or full features (Big Lane).

Key Features

🚀 Ultra-Low Latency: < 50ms round-trip in Fast Lane mode
🎙️ Real-time Voice Processing: Direct hardware audio capture with minimal overhead
🔄 Dual-Lane Architecture: Choose between performance and features
🎯 Client-side VAD: Energy-based voice activity detection
🔊 Audio Playback: Built-in audio output with < 10ms latency
📊 Comprehensive Metrics: Performance monitoring and usage tracking
🛡️ Production Ready: Robust error handling and automatic reconnection

🏗️ Architecture

Fast Lane (Default)

Optimized for ultra-low latency voice interactions:

Direct WebSocket connection
Zero-copy audio path
Minimal abstraction layers
Client-side VAD only
< 50ms total latency

Big Lane (Coming Soon)

Full-featured implementation with:

Multi-provider support
Audio pipeline processing
Event-driven architecture
Advanced features (transcription, functions)
Provider failover

🚀 Quick Start

Installation

pip install voicechatengine

Basic Usage

import asyncio
from voicechatengine import VoiceChatEngine

async def main():
    # Create engine
    engine = VoiceChatEngine(api_key="your-openai-api-key")
    
    # Set up callbacks
    engine.on_text_response = lambda text: print(f"AI: {text}")
    
    # Connect and start
    async with engine:
        await engine.start_listening()
        
        # Keep running
        await asyncio.sleep(60)

asyncio.run(main())

Advanced Example

import asyncio
from voicechatengine import VoiceChatEngine, VoiceEngineConfig


async def main():
    # Configure engine
    config = VoiceEngineConfig(
        api_key="your-api-key",
        mode="fast",
        voice="alloy",
        vad_enabled=True,
        vad_threshold=0.02,
        latency_mode="ultra_low"
    )
    
    engine = VoiceChatEngine(config=config)
    
    # Set up comprehensive callbacks
    engine.on_audio_response = lambda audio: print(f"Received {len(audio)} bytes")
    engine.on_text_response = lambda text: print(f"AI: {text}")
    engine.on_error = lambda error: print(f"Error: {error}")
    engine.on_response_done = lambda: print("Response complete")
    
    await engine.connect()
    
    # Example: Text to speech
    await engine.send_text("Hello, how are you today?")
    
    # Example: Start listening for voice input
    await engine.start_listening()
    
    # Keep running for 5 minutes
    await asyncio.sleep(300)
    
    await engine.disconnect()

asyncio.run(main())

📖 API Reference

VoiceChatEngine

The main interface for voice interactions.

Methods

connect(retry_count=3) - Connect to OpenAI Realtime API
disconnect() - Disconnect from API
start_listening() - Start capturing audio input
stop_listening() - Stop capturing audio
send_text(text) - Send text message
send_audio(audio_bytes) - Send audio data
interrupt() - Interrupt current AI response
get_metrics() - Get performance metrics
get_usage() - Get usage statistics
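
The `retry_count` parameter on `connect` implies a bounded number of connection attempts. The sketch below shows one common way to structure that: retrying with exponential backoff. The helper `connect_with_retry`, its `open_connection` argument, and the `base_delay` parameter are hypothetical names introduced for illustration; this is not the library's internal logic.

```python
import asyncio


async def connect_with_retry(open_connection, retry_count=3, base_delay=0.5):
    """Call `open_connection` up to `retry_count` times with exponential backoff.

    `open_connection` is any zero-argument coroutine function (hypothetical here).
    """
    last_error = None
    for attempt in range(retry_count):
        try:
            return await open_connection()
        except (ConnectionError, OSError, asyncio.TimeoutError) as exc:
            last_error = exc
            if attempt < retry_count - 1:
                # Delay doubles after each failed attempt: base, 2x base, 4x base, ...
                await asyncio.sleep(base_delay * 2 ** attempt)
    raise ConnectionError(f"failed after {retry_count} attempts") from last_error
```

Only connection-style errors are retried here, so programming errors such as a `TypeError` surface immediately instead of being masked by the retry loop.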

Properties

is_connected - Check connection status
is_listening - Check if actively listening
on_audio_response - Callback for audio responses
on_text_response - Callback for text responses
on_error - Callback for errors
on_response_done - Callback when response completes

Configuration

VoiceEngineConfig(
    api_key: str,                           # Required: OpenAI API key
    mode: Literal["fast", "big"] = "fast",  # Engine mode
    voice: str = "alloy",                   # Voice selection
    sample_rate: int = 24000,               # Audio sample rate (Hz)
    vad_enabled: bool = True,               # Enable voice activity detection
    vad_threshold: float = 0.02,            # VAD sensitivity
    latency_mode: str = "balanced",         # "ultra_low" | "balanced" | "quality"
)
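
To make the field list above concrete, here is a minimal sketch of a configuration object with the same fields and defaults, plus runtime validation. `ExampleConfig` is a hypothetical stand-in, not the library's `VoiceEngineConfig`, and the accepted range for `vad_threshold` (between 0 and 1) is an assumption.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class ExampleConfig:
    """Illustrative stand-in for VoiceEngineConfig; not the library's class."""

    api_key: str
    mode: Literal["fast", "big"] = "fast"
    voice: str = "alloy"
    sample_rate: int = 24000
    vad_enabled: bool = True
    vad_threshold: float = 0.02
    latency_mode: Literal["ultra_low", "balanced", "quality"] = "balanced"

    def __post_init__(self):
        # Literal hints are not enforced at runtime, so check values explicitly.
        if not self.api_key:
            raise ValueError("api_key is required")
        if self.mode not in ("fast", "big"):
            raise ValueError(f"unknown mode: {self.mode!r}")
        if self.latency_mode not in ("ultra_low", "balanced", "quality"):
            raise ValueError(f"unknown latency_mode: {self.latency_mode!r}")
        # Assumed range: the threshold is treated as a normalized energy level.
        if not 0.0 < self.vad_threshold < 1.0:
            raise ValueError("vad_threshold must be between 0 and 1")
```

Validating at construction time means a typo such as `mode="turbo"` fails before any connection is attempted.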

🎯 Use Cases

Voice Assistant

engine = VoiceChatEngine.create_simple(api_key="...")
engine.on_text_response = lambda text: print(f"Assistant: {text}")
await engine.connect()
await engine.start_listening()

Real-time Translation

config = VoiceEngineConfig(
    api_key="...",
    voice="shimmer",
    language="es"  # Spanish
)
engine = VoiceChatEngine(config=config)

Interactive Voice Response (IVR)

engine = VoiceChatEngine(api_key="...", mode="fast")
engine.on_text_response = handle_user_input
engine.on_function_call = execute_action

🔧 Advanced Features

Voice Activity Detection (VAD)

Built-in client-side VAD for efficient audio streaming:

config = VoiceEngineConfig(
    api_key="...",            # Required, as in the Configuration reference
    vad_enabled=True,
    vad_threshold=0.02,       # Energy threshold
    vad_speech_start_ms=100,  # Speech detection delay
    vad_speech_end_ms=500     # Silence detection delay
)
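
The parameters above suggest an energy threshold combined with hold times for the start and end of speech. The following is a minimal sketch of how such a detector might behave, assuming 16-bit little-endian mono PCM input and RMS energy normalized to the range 0 to 1. `rms_energy` and `EnergyVAD` are hypothetical names; the library's own detector may differ.

```python
import math
import struct


def rms_energy(pcm16: bytes) -> float:
    """Return RMS energy of little-endian 16-bit PCM, normalized to 0..1."""
    count = len(pcm16) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", pcm16[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count) / 32768.0


class EnergyVAD:
    """Hypothetical energy-based VAD with start/end hold times (illustrative only)."""

    def __init__(self, threshold=0.02, speech_start_ms=100, speech_end_ms=500):
        self.threshold = threshold
        self.speech_start_ms = speech_start_ms
        self.speech_end_ms = speech_end_ms
        self.speaking = False
        self._loud_ms = 0.0
        self._quiet_ms = 0.0

    def process(self, pcm16: bytes, chunk_ms: float) -> bool:
        """Feed one chunk; return True while speech is considered active."""
        if rms_energy(pcm16) >= self.threshold:
            self._loud_ms += chunk_ms
            self._quiet_ms = 0.0
            # Require sustained energy before declaring speech, to skip clicks.
            if not self.speaking and self._loud_ms >= self.speech_start_ms:
                self.speaking = True
        else:
            self._quiet_ms += chunk_ms
            self._loud_ms = 0.0
            # Require sustained silence before ending speech, to bridge pauses.
            if self.speaking and self._quiet_ms >= self.speech_end_ms:
                self.speaking = False
        return self.speaking
```

With 10 ms chunks and the defaults shown, this sketch reports speech after 10 consecutive loud chunks and stops after 50 consecutive quiet ones. Raising `threshold` makes it less sensitive to background noise; raising `speech_end_ms` tolerates longer pauses.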

Performance Metrics

Monitor real-time performance:

metrics = engine.get_metrics()
print(f"Capture rate: {metrics['audio']['capture_rate']} chunks/sec")
print(f"Uptime: {metrics['uptime']} seconds")
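
For intuition about where values like these could come from, here is a small sketch of a tracker that counts captured chunks and reports a dictionary with the same `uptime` and `audio.capture_rate` keys used above. `CaptureMetrics` is a hypothetical class, and the injectable `clock` argument is an assumption made so the sketch can be exercised without waiting on real time.

```python
import time


class CaptureMetrics:
    """Hypothetical tracker producing a dict shaped like the example above."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._started = clock()
        self._chunks = 0

    def record_chunk(self):
        """Call once for every audio chunk captured."""
        self._chunks += 1

    def snapshot(self) -> dict:
        """Return uptime in seconds and the average capture rate in chunks/sec."""
        uptime = self._clock() - self._started
        rate = self._chunks / uptime if uptime > 0 else 0.0
        return {"uptime": uptime, "audio": {"capture_rate": rate}}
```

A monotonic clock is used by default so that system clock adjustments do not distort the uptime figure.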

Cost Tracking

Track API usage and costs:

usage = await engine.get_usage()
cost = await engine.estimate_cost()
print(f"Total cost: ${cost.total:.2f}")

🏗️ Architecture Details

Component Overview

Audio Manager: Unified audio interface for capture and playback
Stream Manager: WebSocket connection management
VAD Detector: Real-time voice activity detection
Strategy Pattern: Pluggable implementations for different use cases
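
As a rough illustration of the strategy pattern mentioned above, the sketch below selects an implementation by the `mode` string. All class and function names (`LaneStrategy`, `FastLaneStrategy`, `BigLaneStrategy`, `make_strategy`) are hypothetical, and the method bodies are placeholders rather than the framework's actual behavior.

```python
from abc import ABC, abstractmethod


class LaneStrategy(ABC):
    """Hypothetical interface that each lane implementation would satisfy."""

    @abstractmethod
    def handle_audio(self, chunk: bytes) -> str:
        ...


class FastLaneStrategy(LaneStrategy):
    def handle_audio(self, chunk: bytes) -> str:
        # Forward the chunk straight through, with no intermediate processing.
        return f"fast: forwarded {len(chunk)} bytes"


class BigLaneStrategy(LaneStrategy):
    def handle_audio(self, chunk: bytes) -> str:
        # Placeholder for a processing pipeline before forwarding.
        return f"big: processed {len(chunk)} bytes"


STRATEGIES = {"fast": FastLaneStrategy, "big": BigLaneStrategy}


def make_strategy(mode: str) -> LaneStrategy:
    """Select a strategy by the `mode` string used in the configuration."""
    try:
        return STRATEGIES[mode]()
    except KeyError:
        raise ValueError(f"unknown mode: {mode!r}") from None
```

Because callers depend only on `LaneStrategy`, a new lane can be added by registering another class in `STRATEGIES` without touching the calling code.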

Performance Characteristics

Fast Lane Performance:

Audio capture to API: < 10ms
API to audio playback: < 10ms
Total round-trip: < 50ms
CPU usage: < 5%
Memory: < 50MB
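
Latency targets like these are closely tied to audio chunk size, since a chunk cannot be sent before it has been fully captured. The sketch below works out chunk sizes at the default 24000 Hz sample rate, assuming 16-bit mono PCM (the sample format is an assumption; it is not stated above). `chunk_bytes` is a hypothetical helper.

```python
def chunk_bytes(duration_ms: float, sample_rate: int = 24000,
                channels: int = 1, bytes_per_sample: int = 2) -> int:
    """Bytes of raw PCM audio in a chunk of the given duration."""
    samples = int(sample_rate * duration_ms / 1000)
    return samples * channels * bytes_per_sample


for ms in (10, 20, 50):
    print(f"{ms} ms -> {chunk_bytes(ms)} bytes")
# 10 ms -> 480 bytes
# 20 ms -> 960 bytes
# 50 ms -> 2400 bytes
```

Under these assumptions, a 10 ms capture budget corresponds to chunks of 240 samples, or 480 bytes.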

🛠️ Development

Requirements

Python 3.8+
sounddevice for audio I/O
websockets for API connection
numpy for audio processing

Testing

# Run smoke tests
python -m pytest tests/smoke_tests/

# Run specific test
python -m voicechatengine.smoke_tests.test_08_fast_lane_simple_demo

Contributing

Fork the repository
Create a feature branch
Implement your changes
Add tests
Submit a pull request

📝 Best Practices

Always handle errors: Set up error callbacks for production use
Monitor metrics: Track performance in production
Use the appropriate mode: Fast Lane for conversations, Big Lane (once available) for processing
Configure VAD: Tune VAD parameters for your environment
Test audio devices: Verify device compatibility before deployment

🚧 Roadmap

Big Lane implementation
Multi-provider support (Anthropic, Google)
Audio effects pipeline
Advanced VAD algorithms
Recording and playback features
Conversation persistence
Web/mobile SDKs

📄 License

MIT License - see LICENSE file for details

🙏 Acknowledgments

Built with inspiration from modern real-time systems and the excellent OpenAI Realtime API.

Project details

Development Status
- 3 - Alpha
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language

Release history

0.0.3 (this version): Jul 25, 2025
0.0.1: Jul 25, 2025
0.0.0: Jul 24, 2025

Download files

Download the file for your platform.

Source Distribution

voicechatengine-0.0.3.tar.gz (169.7 kB)

Uploaded Jul 25, 2025 Source

Built Distribution

voicechatengine-0.0.3-py3-none-any.whl (204.4 kB)

Uploaded Jul 25, 2025 Python 3

File details

Details for the file voicechatengine-0.0.3.tar.gz.

File metadata

Download URL: voicechatengine-0.0.3.tar.gz
Upload date: Jul 25, 2025
Size: 169.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.9.23

File hashes

Hashes for voicechatengine-0.0.3.tar.gz
Algorithm	Hash digest
SHA256	`ff2bf44ffc7ce461bd5ef80fc3f1656616d1d3b5a4270ac4f494d5b31f3c4c42`
MD5	`14199cb3a1dfa10fdd177102d1278410`
BLAKE2b-256	`b04a81aa81c0c6b7bb812203b6179ee75c8c91d8d46fdaf5823fa97da91e9384`

File details

Details for the file voicechatengine-0.0.3-py3-none-any.whl.

File metadata

Download URL: voicechatengine-0.0.3-py3-none-any.whl
Upload date: Jul 25, 2025
Size: 204.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.9.23

File hashes

Hashes for voicechatengine-0.0.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6620b44423222d8823dde19e19956f5c435813e02ba71b5cbb11b61779363f97`
MD5	`62877dec3f4b9055e41f9696787cda33`
BLAKE2b-256	`7023cfa9e0b20c2d93629b7644e1e8aa02920fe888f959db4c2c8556a3968a4e`
