Olaf Voice 🎙️

Real-time voice call interface for AI assistants

Have natural voice conversations with your AI assistant through a sleek, browser-based interface. Think "phone call with your AI": speak naturally, get instant voice responses, and enjoy ambient background music while you chat.


✨ Features

  • 🎤 Real-time voice interaction: speak and hear responses instantly
  • 🎨 Clean, dark UI: minimal, elegant interface designed for voice-first interaction
  • 🎵 Ambient background music: lo-fi tones that auto-duck when Olaf speaks
  • 🎙️ Dual input modes: Voice Activity Detection (VAD) or Push-to-Talk (PTT)
  • 🏃 Running mode: simplified UI with larger controls for hands-free use
  • 📊 Live waveform visualizer: see who's speaking in real time
  • 🔌 OpenAI-compatible APIs: works with OpenAI, OpenClaw proxy, or any compatible endpoint
  • ⚙️ Fully configurable: customize voices, models, music, and behavior

📸 Screenshots

Main Interface

Clean, centered call button with waveform visualizer and status display.

Running Mode

Simplified interface with larger controls, perfect for hands-free use while exercising or driving.

🚀 Quick Start

Installation

# Install from PyPI
pip install olaf-voice

# Or with uv (recommended)
uv pip install olaf-voice

Basic Usage

# Set your OpenAI API key
export OLAF_VOICE_OPENAI_API_KEY="your-api-key"

# Start the server
olaf-voice

# Open browser to http://localhost:8765

That's it! Click the green call button and start talking.

🛠️ Configuration

Environment Variables

# Server settings
export OLAF_VOICE_HOST="0.0.0.0"
export OLAF_VOICE_PORT="8765"

# OpenAI API
export OLAF_VOICE_OPENAI_API_KEY="sk-..."
export OLAF_VOICE_OPENAI_BASE_URL="https://api.openai.com/v1"  # optional

# Models
export OLAF_VOICE_WHISPER_MODEL="whisper-1"
export OLAF_VOICE_TTS_MODEL="tts-1"
export OLAF_VOICE_TTS_VOICE="alloy"  # alloy, echo, fable, onyx, nova, shimmer
export OLAF_VOICE_AI_MODEL="gpt-4o"

# Audio preferences
export OLAF_VOICE_VAD_ENABLED="true"
export OLAF_VOICE_MUSIC_VOLUME="0.3"

YAML Configuration

Generate an example config file:

olaf-voice --generate-config config.yaml

Edit config.yaml:

# Server
host: 0.0.0.0
port: 8765

# API Keys
openai_api_key: sk-...
openai_base_url: null  # Use default OpenAI, or set custom endpoint

# STT (Speech-to-Text)
whisper_model: whisper-1
whisper_language: null  # Auto-detect, or set 'en', 'es', etc.

# TTS (Text-to-Speech)
tts_model: tts-1
tts_voice: alloy
tts_speed: 1.0

# AI Model
ai_model: gpt-4o
ai_system_prompt: "You are Olaf, a helpful AI assistant..."
ai_max_tokens: 500
ai_temperature: 0.7

# Audio
vad_enabled: true
vad_sensitivity: 0.5
music_volume: 0.3
music_duck_volume: 0.1

Run with config:

olaf-voice --config config.yaml

Command-Line Options

olaf-voice --help

# Start with custom host/port
olaf-voice --host 127.0.0.1 --port 9000

# Enable verbose logging
olaf-voice --verbose

# Use config file
olaf-voice --config /path/to/config.yaml

🎯 Usage Guide

Starting a Call

  1. Open http://localhost:8765 in your browser
  2. Click the green Call button
  3. Grant microphone permissions when prompted
  4. Start talking!

Controls

  • Mute 🎤: Temporarily mute your microphone
  • Music 🎵: Toggle background ambient music
  • VAD/PTT 🎙️: Switch between Voice Activity Detection and Push-to-Talk
    • VAD mode (default): Automatically detects when you're speaking
    • PTT mode: Hold SPACE to talk, release to send
  • Running 🏃: Toggle simplified UI for hands-free use
  • Volume slider: Adjust output volume
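
To give a feel for what VAD mode does, the sketch below shows one common approach: an energy threshold with a short "hangover" so brief pauses do not end the utterance. This is an illustration only. Olaf Voice's real detector is not described here and may work differently, and the `detect_speech` function, the mapping from sensitivity to threshold, and the `hangover_frames` parameter are assumptions.

```python
import math
from typing import Iterable, List, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def detect_speech(
    frames: Iterable[Sequence[float]],
    sensitivity: float = 0.5,
    hangover_frames: int = 3,
) -> List[bool]:
    """Flag each frame as speech (True) or silence (False).

    Assumed mapping: higher sensitivity means a lower RMS threshold.
    Speech stays "on" until `hangover_frames` quiet frames arrive in a row.
    """
    threshold = 0.02 + (1.0 - sensitivity) * 0.08
    quiet_run = hangover_frames  # start in the silent state
    flags: List[bool] = []
    for frame in frames:
        if rms(frame) >= threshold:
            quiet_run = 0
        else:
            quiet_run += 1
        flags.append(quiet_run < hangover_frames)
    return flags
```

In a setup like this, the end of an utterance (the first False after a run of True) would be the natural moment to send the recorded audio, which is what releasing SPACE does in PTT mode.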

Keyboard Shortcuts

  • SPACE: Push-to-talk (when in PTT mode)
  • Click controls to toggle features

Running Mode

Perfect for when you're exercising, driving, or want a simplified experience:

  1. Click the Running 🏃 button
  2. UI simplifies to just:
    • Large call button
    • Status display
    • Transcript (shows conversation)

Press Running again to return to the full interface.

๐Ÿ—๏ธ Architecture

┌─────────────┐      WebSocket      ┌──────────────┐
│   Browser   │ ◄─────────────────► │  FastAPI     │
│             │                     │  Server      │
│ - Mic input │   Audio (base64)    │              │
│ - Speaker   │ ──────────────────► │ - Whisper    │
│ - Visualizer│                     │ - AI Chat    │
│ - Controls  │ ◄────────────────── │ - TTS        │
└─────────────┘   Audio + Text      └──────────────┘

Flow

  1. Capture: Browser captures microphone audio (WebM format)
  2. Send: Audio sent via WebSocket as base64
  3. Transcribe: Backend uses OpenAI Whisper API to convert speech to text
  4. Think: Text sent to AI model (GPT-4, etc.) for response
  5. Synthesize: AI response converted to speech via TTS API
  6. Play: Audio streamed back to browser and played
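
The server-side part of this flow (steps 2 to 5) can be sketched as a single function. This is a simplified illustration, not the code in server.py: the JSON message shape (`audio`, `transcript`, `text` keys) is hypothetical, and the three stages are passed in as plain callables so the sketch runs without any API calls.

```python
import base64
import json
from typing import Callable


def handle_audio_message(
    raw_message: str,
    transcribe: Callable[[bytes], str],
    chat: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> str:
    """Turn one incoming audio message into one outgoing reply message.

    The message schema here is an assumption made for this example.
    """
    message = json.loads(raw_message)

    # Step 2 (receiving side): undo the base64 encoding applied by the browser.
    audio_bytes = base64.b64decode(message["audio"])

    # Step 3: speech to text.
    user_text = transcribe(audio_bytes)

    # Step 4: ask the AI model for a reply.
    reply_text = chat(user_text)

    # Step 5: text to speech.
    reply_audio = synthesize(reply_text)

    # Step 6 happens in the browser; send it both the text and the audio.
    return json.dumps(
        {
            "type": "response",
            "transcript": user_text,
            "text": reply_text,
            "audio": base64.b64encode(reply_audio).decode("ascii"),
        }
    )
```

Keeping the stages injectable also makes each one easy to swap or stub in tests, which fits the one-module-per-stage layout of the project.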

Background Music

Ambient lo-fi music is generated client-side using Web Audio API oscillators and filters, so no audio files are needed. The music automatically "ducks" (reduces volume) while Olaf is speaking, then returns to normal.
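
The ducking behaviour can be modelled as a gain that moves toward a target level in small steps. The real logic lives in the browser and is not shown here, so the sketch below only illustrates the idea in Python: the `duck_gain_schedule` function and its `step` parameter are assumptions, while the two volume defaults mirror `music_volume` and `music_duck_volume` from the example config.

```python
from typing import Iterable, List


def duck_gain_schedule(
    speaking: Iterable[bool],
    music_volume: float = 0.3,
    duck_volume: float = 0.1,
    step: float = 0.05,
) -> List[float]:
    """Return the music gain for each tick.

    The gain moves toward the ducked level while the assistant is speaking
    and back toward the normal level otherwise, by at most `step` per tick,
    so the change is a short ramp rather than an abrupt jump.
    """
    gain = music_volume
    gains: List[float] = []
    for is_speaking in speaking:
        target = duck_volume if is_speaking else music_volume
        if gain > target:
            gain = max(target, gain - step)
        elif gain < target:
            gain = min(target, gain + step)
        gains.append(round(gain, 4))
    return gains
```

A ramp is used instead of an instant switch because sudden gain changes tend to be audible as clicks.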

🧪 Development

Setup

# Clone repository
git clone https://github.com/Olafs-World/olaf-voice.git
cd olaf-voice

# Install with dev dependencies
uv pip install -e ".[dev]"

# Run tests
pytest

# Run with coverage
pytest --cov=olaf_voice --cov-report=html

Project Structure

olaf-voice/
├── src/olaf_voice/
│   ├── __init__.py          # Package exports
│   ├── __main__.py          # CLI entry point
│   ├── config.py            # Configuration management
│   ├── server.py            # FastAPI server + WebSocket
│   ├── transcribe.py        # Whisper STT integration
│   ├── tts.py               # OpenAI TTS integration
│   ├── ai.py                # AI chat integration
│   └── static/
│       ├── index.html       # Main UI
│       ├── style.css        # Styling
│       └── app.js           # Client-side logic
├── tests/                   # Pytest tests
├── pyproject.toml           # Package metadata
└── README.md

Running Tests

# Run all tests
pytest

# Run specific test file
pytest tests/test_config.py

# Run with coverage
pytest --cov=olaf_voice

# Verbose output
pytest -v

Code Quality

# Format code
black src/ tests/

# Lint
ruff check src/ tests/

# Type check
mypy src/

๐Ÿ› Troubleshooting

Microphone not working

  1. Check browser permissions: look for the 🎤 icon in the address bar
  2. Ensure HTTPS or localhost (mic requires secure context)
  3. Try a different browser (Chrome/Edge recommended)

WebSocket connection fails

  1. Check firewall settings
  2. Verify port 8765 (or custom port) is not in use
  3. Check browser console for errors
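
To check whether the port is already taken before starting the server, a quick probe like the one below can help. It is a generic standard-library sketch and not part of Olaf Voice; the `port_is_free` helper name is made up for this example.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can bind to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "address already in use": another process holds the port.
            return False
    return True


if __name__ == "__main__":
    print("port 8765 free:", port_is_free(8765))
```

If the probe reports the port as busy, start the server on another one with `olaf-voice --port 9000`.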

No audio playback

  1. Check volume slider in the app
  2. Verify system volume is not muted
  3. Check browser's autoplay policies (click UI first to enable audio)

API errors

  1. Verify OLAF_VOICE_OPENAI_API_KEY is set correctly
  2. Check API quota and billing status
  3. Enable verbose logging: olaf-voice --verbose

High latency

  • Use tts-1 model (faster than tts-1-hd)
  • Reduce ai_max_tokens for shorter responses
  • Use a faster AI model (e.g., gpt-4o-mini)
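
Before changing models, it can help to see which stage actually dominates the round trip. The sketch below is one way to time stages with the standard library; the `timed` and `slowest_stage` helpers are hypothetical and not part of Olaf Voice, and the `time.sleep` calls stand in for the real transcription, chat, and TTS calls.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def timed(stage: str, timings: Dict[str, float]) -> Iterator[None]:
    """Record how long the wrapped block took, in seconds, under `stage`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = time.perf_counter() - start


def slowest_stage(timings: Dict[str, float]) -> str:
    """Name of the stage with the largest recorded duration."""
    return max(timings, key=timings.get)


if __name__ == "__main__":
    timings: Dict[str, float] = {}
    with timed("transcribe", timings):
        time.sleep(0.01)  # placeholder for the speech-to-text call
    with timed("chat", timings):
        time.sleep(0.03)  # placeholder for the AI model call
    with timed("tts", timings):
        time.sleep(0.02)  # placeholder for the text-to-speech call
    print("slowest stage:", slowest_stage(timings))
```

If the chat stage dominates, a smaller model or a lower `ai_max_tokens` is the likely win; if TTS dominates, shorter replies help there as well.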

🤝 Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit changes (git commit -m 'Add amazing feature')
  4. Push to branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📝 License

MIT License - see LICENSE for details.


Made with ❤️ for seamless AI conversations
