Olaf Voice 🎙️
Real-time voice call interface for AI assistants
Have natural voice conversations with your AI assistant through a sleek, browser-based interface. Think "phone call with your AI": speak naturally, get instant voice responses, and enjoy ambient background music while you chat.
✨ Features
- 🎤 Real-time voice interaction: speak and hear responses instantly
- 🎨 Clean, dark UI: minimal, elegant interface designed for voice-first interaction
- 🎵 Ambient background music: lo-fi tones that auto-duck when Olaf speaks
- 🎙️ Dual input modes: Voice Activity Detection (VAD) or Push-to-Talk (PTT)
- 🏃 Running mode: simplified UI with larger controls for hands-free use
- 📊 Live waveform visualizer: see who's speaking in real time
- 🔌 OpenAI-compatible APIs: works with OpenAI, OpenClaw proxy, or any compatible endpoint
- ⚙️ Fully configurable: customize voices, models, music, and behavior
📸 Screenshots
Main Interface
Clean, centered call button with waveform visualizer and status display.
Running Mode
Simplified interface with larger controls, perfect for hands-free use while exercising or driving.
🚀 Quick Start
Installation
# Install from PyPI
pip install olaf-voice
# Or with uv (recommended)
uv pip install olaf-voice
Basic Usage
# Set your OpenAI API key
export OLAF_VOICE_OPENAI_API_KEY="your-api-key"
# Start the server
olaf-voice
# Open browser to http://localhost:8765
That's it! Click the green call button and start talking.
🛠️ Configuration
Environment Variables
# Server settings
export OLAF_VOICE_HOST="0.0.0.0"
export OLAF_VOICE_PORT="8765"
# OpenAI API
export OLAF_VOICE_OPENAI_API_KEY="sk-..."
export OLAF_VOICE_OPENAI_BASE_URL="https://api.openai.com/v1" # optional
# Models
export OLAF_VOICE_WHISPER_MODEL="whisper-1"
export OLAF_VOICE_TTS_MODEL="tts-1"
export OLAF_VOICE_TTS_VOICE="alloy" # alloy, echo, fable, onyx, nova, shimmer
export OLAF_VOICE_AI_MODEL="gpt-4o"
# Audio preferences
export OLAF_VOICE_VAD_ENABLED="true"
export OLAF_VOICE_MUSIC_VOLUME="0.3"
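As a rough illustration of how these variables could map onto typed settings, here is a minimal Python sketch. The `VoiceSettings` dataclass and the `load_settings` helper are hypothetical names, not olaf-voice's actual config module, and the fallback values are simply the example values shown above rather than confirmed defaults.

```python
import os
from dataclasses import dataclass


@dataclass
class VoiceSettings:
    """Hypothetical settings container; fields mirror some variables above."""
    host: str = "0.0.0.0"
    port: int = 8765
    tts_voice: str = "alloy"
    vad_enabled: bool = True
    music_volume: float = 0.3


def load_settings(env=None) -> VoiceSettings:
    """Read OLAF_VOICE_* variables, falling back to the example values."""
    env = os.environ if env is None else env
    defaults = VoiceSettings()

    def get(name: str, default: str) -> str:
        # Every variable shares the OLAF_VOICE_ prefix.
        return env.get(f"OLAF_VOICE_{name}", default)

    return VoiceSettings(
        host=get("HOST", defaults.host),
        port=int(get("PORT", str(defaults.port))),
        tts_voice=get("TTS_VOICE", defaults.tts_voice),
        # Assumed convention: "1", "true" or "yes" (any case) means enabled.
        vad_enabled=get("VAD_ENABLED", "true").strip().lower() in {"1", "true", "yes"},
        music_volume=float(get("MUSIC_VOLUME", str(defaults.music_volume))),
    )
```

Passing a plain dictionary instead of `os.environ` makes the parsing easy to check in isolation, for example `load_settings({"OLAF_VOICE_PORT": "9000"}).port`.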
YAML Configuration
Generate an example config file:
olaf-voice --generate-config config.yaml
Edit config.yaml:
# Server
host: 0.0.0.0
port: 8765
# API Keys
openai_api_key: sk-...
openai_base_url: null # Use default OpenAI, or set custom endpoint
# STT (Speech-to-Text)
whisper_model: whisper-1
whisper_language: null # Auto-detect, or set 'en', 'es', etc.
# TTS (Text-to-Speech)
tts_model: tts-1
tts_voice: alloy
tts_speed: 1.0
# AI Model
ai_model: gpt-4o
ai_system_prompt: "You are Olaf, a helpful AI assistant..."
ai_max_tokens: 500
ai_temperature: 0.7
# Audio
vad_enabled: true
vad_sensitivity: 0.5
music_volume: 0.3
music_duck_volume: 0.1
Run with config:
olaf-voice --config config.yaml
Command-Line Options
olaf-voice --help
# Start with custom host/port
olaf-voice --host 127.0.0.1 --port 9000
# Enable verbose logging
olaf-voice --verbose
# Use config file
olaf-voice --config /path/to/config.yaml
🎯 Usage Guide
Starting a Call
- Open http://localhost:8765 in your browser
- Click the green Call button
- Grant microphone permissions when prompted
- Start talking!
Controls
- Mute 🎤: Temporarily mute your microphone
- Music 🎵: Toggle background ambient music
- VAD/PTT 🎙️: Switch between Voice Activity Detection and Push-to-Talk
  - VAD mode (default): Automatically detects when you're speaking
  - PTT mode: Hold SPACE to talk, release to send
- Running 🏃: Toggle simplified UI for hands-free use
- Volume slider: Adjust output volume
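For intuition about what a VAD mode does, the sketch below shows a simple energy-threshold detector in Python. This is not olaf-voice's detector, which is undocumented here and may work quite differently; the function names, the 16-bit PCM input format, and the mapping from a `vad_sensitivity`-style value to a threshold are all assumptions made for illustration.

```python
import math
import struct


def rms(pcm16: bytes) -> float:
    """Root-mean-square level of little-endian 16-bit PCM, scaled to 0..1."""
    count = len(pcm16) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", pcm16[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count) / 32768.0


def is_speech(pcm16: bytes, sensitivity: float = 0.5) -> bool:
    """Treat a frame as speech when its energy exceeds a threshold.

    Assumed mapping: higher sensitivity lowers the threshold, so quieter
    speech is picked up at the cost of more false triggers.
    """
    threshold = 0.1 * (1.0 - sensitivity) + 0.005
    return rms(pcm16) > threshold
```

A detector like this would typically run on short frames and require several consecutive speech frames before starting a recording, which helps it ignore clicks and brief noises.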
Keyboard Shortcuts
- SPACE: Push-to-talk (when in PTT mode)
- Click controls to toggle features
Running Mode
Perfect for when you're exercising, driving, or want a simplified experience:
- Click the Running 🏃 button
- The UI simplifies to just:
  - Large call button
  - Status display
  - Transcript (shows the conversation)
Press Running again to return to the full interface.
🏗️ Architecture
┌─────────────┐     WebSocket      ┌──────────────┐
│   Browser   │ ◄────────────────► │   FastAPI    │
│             │                    │   Server     │
│ - Mic input │   Audio (base64)   │              │
│ - Speaker   │ ─────────────────► │ - Whisper    │
│ - Visualizer│                    │ - AI Chat    │
│ - Controls  │ ◄───────────────── │ - TTS        │
└─────────────┘    Audio + Text    └──────────────┘
Flow
- Capture: Browser captures microphone audio (WebM format)
- Send: Audio sent via WebSocket as base64
- Transcribe: Backend uses OpenAI Whisper API to convert speech to text
- Think: Text sent to AI model (GPT-4, etc.) for response
- Synthesize: AI response converted to speech via TTS API
- Play: Audio streamed back to browser and played
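The steps above can be sketched in Python as base64 framing plus a single turn handler. The JSON message shape (`type`, `data`, `text` fields) and all function names are assumptions for illustration; the actual wire format and server code of olaf-voice are not documented here. The three stages are passed in as plain callables so the sketch runs without any API access.

```python
import base64
import json
from typing import Callable


def encode_audio_message(audio: bytes) -> str:
    """Wrap raw audio bytes (e.g. a WebM chunk) in a JSON text frame."""
    payload = base64.b64encode(audio).decode("ascii")
    return json.dumps({"type": "audio", "data": payload})  # hypothetical schema


def decode_audio_message(message: str) -> bytes:
    """Recover the raw audio bytes on the server side."""
    parsed = json.loads(message)
    if parsed.get("type") != "audio":
        raise ValueError(f"unexpected message type: {parsed.get('type')!r}")
    return base64.b64decode(parsed["data"], validate=True)


def handle_turn(
    message: str,
    transcribe: Callable[[bytes], str],
    think: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> str:
    """Run one Transcribe -> Think -> Synthesize turn and frame the reply."""
    user_text = transcribe(decode_audio_message(message))
    reply_text = think(user_text)
    reply_audio = synthesize(reply_text)
    return json.dumps({
        "type": "reply",
        "text": reply_text,
        "data": base64.b64encode(reply_audio).decode("ascii"),
    })


# Stub stages stand in for the Whisper, chat and TTS calls.
frame = encode_audio_message(b"fake webm bytes")
reply = json.loads(handle_turn(
    frame,
    transcribe=lambda audio: "hello",
    think=lambda text: text.upper(),
    synthesize=lambda text: text.encode("utf-8"),
))
assert reply["text"] == "HELLO"
assert base64.b64decode(reply["data"]) == b"HELLO"
```

Base64 makes binary audio safe to embed in a text frame, at the cost of roughly a one-third increase in payload size.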
Background Music
Ambient lo-fi music is generated client-side using Web Audio API oscillators and filters, so no audio files are needed. The music automatically "ducks" (reduces volume) when Olaf is speaking, then returns to normal.
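Ducking can be pictured as a gain ramp between two levels. The sketch below is written in Python for consistency with the rest of the project, although the real logic runs in the browser; the function name, the linear ramp, and the 0.2-second ramp time are illustrative assumptions, while 0.3 and 0.1 are the `music_volume` and `music_duck_volume` values from the example config.

```python
def music_gain(
    seconds_since_change: float,
    speaking: bool,
    normal: float = 0.3,
    ducked: float = 0.1,
    ramp_seconds: float = 0.2,
) -> float:
    """Gain for the music track, ramping linearly toward the target level.

    `seconds_since_change` is the time since Olaf started or stopped speaking.
    """
    start, target = (normal, ducked) if speaking else (ducked, normal)
    # Clamp progress to 0..1 so the gain holds steady once the ramp is done.
    progress = min(max(seconds_since_change / ramp_seconds, 0.0), 1.0)
    return start + (target - start) * progress
```

Ramping instead of jumping straight to the new level avoids an audible click when the volume changes.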
🧪 Development
Setup
# Clone repository
git clone https://github.com/Olafs-World/olaf-voice.git
cd olaf-voice
# Install with dev dependencies
uv pip install -e ".[dev]"
# Run tests
pytest
# Run with coverage
pytest --cov=olaf_voice --cov-report=html
Project Structure
olaf-voice/
├── src/olaf_voice/
│   ├── __init__.py       # Package exports
│   ├── __main__.py       # CLI entry point
│   ├── config.py         # Configuration management
│   ├── server.py         # FastAPI server + WebSocket
│   ├── transcribe.py     # Whisper STT integration
│   ├── tts.py            # OpenAI TTS integration
│   ├── ai.py             # AI chat integration
│   └── static/
│       ├── index.html    # Main UI
│       ├── style.css     # Styling
│       └── app.js        # Client-side logic
├── tests/                # Pytest tests
├── pyproject.toml        # Package metadata
└── README.md
Running Tests
# Run all tests
pytest
# Run specific test file
pytest tests/test_config.py
# Run with coverage
pytest --cov=olaf_voice
# Verbose output
pytest -v
Code Quality
# Format code
black src/ tests/
# Lint
ruff check src/ tests/
# Type check
mypy src/
🐛 Troubleshooting
Microphone not working
- Check browser permissions: look for the 🎤 icon in the address bar
- Ensure HTTPS or localhost (mic requires secure context)
- Try a different browser (Chrome/Edge recommended)
WebSocket connection fails
- Check firewall settings
- Verify port 8765 (or custom port) is not in use
- Check browser console for errors
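One quick way to check whether something is already listening on the port is a small Python probe. The `port_in_use` helper is a hypothetical name introduced here, not part of olaf-voice; it only reports whether a TCP connection to the given host and port succeeds.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print(port_in_use(8765))
```

If this reports the port as in use before olaf-voice has started, another process holds it; starting the server with a different `--port` value avoids the clash.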
No audio playback
- Check volume slider in the app
- Verify system volume is not muted
- Check browser's autoplay policies (click UI first to enable audio)
API errors
- Verify OLAF_VOICE_OPENAI_API_KEY is set correctly
- Check API quota and billing status
- Enable verbose logging: olaf-voice --verbose
High latency
- Use the tts-1 model (faster than tts-1-hd)
- Reduce ai_max_tokens for shorter responses
- Use a faster AI model (e.g., gpt-4o-mini)
🤝 Contributing
Contributions welcome! Please:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit changes (git commit -m 'Add amazing feature')
- Push to branch (git push origin feature/amazing-feature)
- Open a Pull Request
📄 License
MIT License - see LICENSE for details.
🙏 Credits
Built with:
- FastAPI: Modern web framework
- OpenAI APIs: Whisper, TTS, GPT
- Web Audio API: Browser audio processing
💬 Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Made with ❤️ for seamless AI conversations