Skip to main content

A production-ready Python framework for building real-time AI voice chat applications using OpenAI s Realtime API

Project description

VoiceChatEngine

A high-performance Python framework for OpenAI's Realtime API, featuring a unique dual-lane architecture that balances ultra-low latency with rich functionality.

🎯 Overview

VoiceChatEngine provides a modern, production-ready interface for building real-time voice applications with OpenAI's Realtime API. The framework's innovative dual-lane architecture allows developers to choose between maximum performance (Fast Lane) or full features (Big Lane).

Key Features

  • 🚀 Ultra-Low Latency: < 50ms round-trip in Fast Lane mode
  • 🎙️ Real-time Voice Processing: Direct hardware audio capture with minimal overhead
  • 🔄 Dual-Lane Architecture: Choose between performance and features
  • 🎯 Client-side VAD: Energy-based voice activity detection
  • 🔊 Audio Playback: Built-in audio output with < 10ms latency
  • 📊 Comprehensive Metrics: Performance monitoring and usage tracking
  • 🛡️ Production Ready: Robust error handling and automatic reconnection

🏗️ Architecture

Fast Lane (Default)

Optimized for ultra-low latency voice interactions:

  • Direct WebSocket connection
  • Zero-copy audio path
  • Minimal abstraction layers
  • Client-side VAD only
  • < 50ms total latency

Big Lane (Coming Soon)

Full-featured implementation with:

  • Multi-provider support
  • Audio pipeline processing
  • Event-driven architecture
  • Advanced features (transcription, functions)
  • Provider failover

🚀 Quick Start

Installation

pip install realtimevoiceapi

Basic Usage

import asyncio
from voicechatengine import VoiceChatEngine

async def main():
    # Create engine
    engine = VoiceChatEngine(api_key="your-openai-api-key")
    
    # Set up callbacks
    engine.on_text_response = lambda text: print(f"AI: {text}")
    
    # Connect and start
    async with engine:
        await engine.start_listening()
        
        # Keep running
        await asyncio.sleep(60)

asyncio.run(main())

Advanced Example

import asyncio
from voicechatengine import VoiceChatEngine, VoiceEngineConfig


async def main():
    # Configure engine
    config = VoiceEngineConfig(
        api_key="your-api-key",
        mode="fast",
        voice="alloy",
        vad_enabled=True,
        vad_threshold=0.02,
        latency_mode="ultra_low"
    )
    
    engine = VoiceChatEngine(config=config)
    
    # Set up comprehensive callbacks
    engine.on_audio_response = lambda audio: print(f"Received {len(audio)} bytes")
    engine.on_text_response = lambda text: print(f"AI: {text}")
    engine.on_error = lambda error: print(f"Error: {error}")
    engine.on_response_done = lambda: print("Response complete")
    
    await engine.connect()
    
    # Example: Text to speech
    await engine.send_text("Hello, how are you today?")
    
    # Example: Start listening for voice input
    await engine.start_listening()
    
    # Keep running for 5 minutes
    await asyncio.sleep(300)
    
    await engine.disconnect()

asyncio.run(main())

📖 API Reference

VoiceEngine

The main interface for voice interactions.

Methods

  • connect(retry_count=3) - Connect to OpenAI Realtime API
  • disconnect() - Disconnect from API
  • start_listening() - Start capturing audio input
  • stop_listening() - Stop capturing audio
  • send_text(text) - Send text message
  • send_audio(audio_bytes) - Send audio data
  • interrupt() - Interrupt current AI response
  • get_metrics() - Get performance metrics
  • get_usage() - Get usage statistics

Properties

  • is_connected - Check connection status
  • is_listening - Check if actively listening
  • on_audio_response - Callback for audio responses
  • on_text_response - Callback for text responses
  • on_error - Callback for errors
  • on_response_done - Callback when response completes

Configuration

VoiceEngineConfig(
    api_key: str,                    # Required: OpenAI API key
    mode: "fast" | "big" = "fast",   # Engine mode
    voice: str = "alloy",            # Voice selection
    sample_rate: int = 24000,        # Audio sample rate
    vad_enabled: bool = True,        # Enable voice activity detection
    vad_threshold: float = 0.02,     # VAD sensitivity
    latency_mode: str = "balanced",  # "ultra_low" | "balanced" | "quality"
)

🎯 Use Cases

Voice Assistant

engine = VoiceEngine.create_simple(api_key="...")
engine.on_text_response = lambda text: print(f"Assistant: {text}")
await engine.connect()
await engine.start_listening()

Real-time Translation

config = VoiceEngineConfig(
    api_key="...",
    voice="shimmer",
    language="es"  # Spanish
)
engine = VoiceEngine(config=config)

Interactive Voice Response (IVR)

engine = VoiceEngine(api_key="...", mode="fast")
engine.on_text_response = handle_user_input
engine.on_function_call = execute_action

🔧 Advanced Features

Voice Activity Detection (VAD)

Built-in client-side VAD for efficient audio streaming:

config = VoiceEngineConfig(
    vad_enabled=True,
    vad_threshold=0.02,      # Energy threshold
    vad_speech_start_ms=100, # Speech detection delay
    vad_speech_end_ms=500    # Silence detection delay
)

Performance Metrics

Monitor real-time performance:

metrics = engine.get_metrics()
print(f"Latency: {metrics['audio']['capture_rate']} chunks/sec")
print(f"Uptime: {metrics['uptime']} seconds")

Cost Tracking

Track API usage and costs:

usage = await engine.get_usage()
cost = await engine.estimate_cost()
print(f"Total cost: ${cost.total:.2f}")

🏗️ Architecture Details

Component Overview

  • Audio Manager: Unified audio interface for capture and playback
  • Stream Manager: WebSocket connection management
  • VAD Detector: Real-time voice activity detection
  • Strategy Pattern: Pluggable implementations for different use cases

Performance Characteristics

Fast Lane Performance:

  • Audio capture to API: < 10ms
  • API to audio playback: < 10ms
  • Total round-trip: < 50ms
  • CPU usage: < 5%
  • Memory: < 50MB

🛠️ Development

Requirements

  • Python 3.8+
  • sounddevice for audio I/O
  • websockets for API connection
  • numpy for audio processing

Testing

# Run smoke tests
python -m pytest tests/smoke_tests/

# Run specific test
python -m realtimevoiceapi.smoke_tests.test_08_fast_lane_simple_demo

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Implement your changes
  4. Add tests
  5. Submit a pull request

📝 Best Practices

  1. Always handle errors: Set up error callbacks for production use
  2. Monitor metrics: Track performance in production
  3. Use appropriate mode: Fast lane for conversations, Big lane for processing
  4. Configure VAD: Tune VAD parameters for your environment
  5. Test audio devices: Verify device compatibility before deployment

🚧 Roadmap

  • Big Lane implementation
  • Multi-provider support (Anthropic, Google)
  • Audio effects pipeline
  • Advanced VAD algorithms
  • Recording and playback features
  • Conversation persistence
  • Web/mobile SDKs

📄 License

MIT License - see LICENSE file for details

🙏 Acknowledgments

Built with inspiration from modern real-time systems and the excellent OpenAI Realtime API.


Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

voicechatengine-0.0.3.tar.gz (169.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

voicechatengine-0.0.3-py3-none-any.whl (204.4 kB view details)

Uploaded Python 3

File details

Details for the file voicechatengine-0.0.3.tar.gz.

File metadata

  • Download URL: voicechatengine-0.0.3.tar.gz
  • Upload date:
  • Size: 169.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.23

File hashes

Hashes for voicechatengine-0.0.3.tar.gz
Algorithm Hash digest
SHA256 ff2bf44ffc7ce461bd5ef80fc3f1656616d1d3b5a4270ac4f494d5b31f3c4c42
MD5 14199cb3a1dfa10fdd177102d1278410
BLAKE2b-256 b04a81aa81c0c6b7bb812203b6179ee75c8c91d8d46fdaf5823fa97da91e9384

See more details on using hashes here.

File details

Details for the file voicechatengine-0.0.3-py3-none-any.whl.

File metadata

File hashes

Hashes for voicechatengine-0.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 6620b44423222d8823dde19e19956f5c435813e02ba71b5cbb11b61779363f97
MD5 62877dec3f4b9055e41f9696787cda33
BLAKE2b-256 7023cfa9e0b20c2d93629b7644e1e8aa02920fe888f959db4c2c8556a3968a4e

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page