Voice interaction capabilities for Model Context Protocol (MCP) servers

voice-mcp

MCP servers that enable voice interactions between LLMs and users through LiveKit.

Quick Start with Python Package

The easiest way to use voice-mcp is through our Python package:

# Install with pip
pip install livekit-voice-mcp

# Or use with uvx (no installation needed)
uvx livekit-voice-mcp

Configure Claude Desktop

Add to your Claude Desktop configuration file:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json

Windows: %APPDATA%\Claude\claude_desktop_config.json

{
  "mcpServers": {
    "livekit-voice": {
      "command": "uvx",
      "args": ["livekit-voice-mcp"],
      "env": {
        "LIVEKIT_URL": "wss://your-app.livekit.cloud",
        "LIVEKIT_API_KEY": "your-api-key",
        "LIVEKIT_API_SECRET": "your-api-secret",
        "OPENAI_API_KEY": "your-openai-key"
      }
    }
  }
}
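
If the configuration file already lists other MCP servers, the entry should be merged in rather than pasted over the whole file. A minimal sketch of that merge in Python follows; the helper name add_voice_server is hypothetical and not part of voice-mcp, and it assumes the file, if present, contains valid JSON.

```python
import json
from pathlib import Path


def add_voice_server(config_path: Path, env: dict) -> dict:
    """Merge a livekit-voice entry into a Claude Desktop config file.

    Hypothetical helper for illustration; not shipped with voice-mcp.
    """
    # Start from the existing config so other MCP servers are preserved.
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["livekit-voice"] = {
        "command": "uvx",
        "args": ["livekit-voice-mcp"],
        "env": env,
    }
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2))
    return config


# Example call (macOS path from above):
# add_voice_server(
#     Path.home() / "Library/Application Support/Claude/claude_desktop_config.json",
#     {"LIVEKIT_URL": "wss://your-app.livekit.cloud", "LIVEKIT_API_KEY": "your-api-key",
#      "LIVEKIT_API_SECRET": "your-api-secret", "OPENAI_API_KEY": "your-openai-key"},
# )
```

Back up the existing file before running anything that rewrites it.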

Restart Claude Desktop so it loads the new server. Voice commands are then available.

Overview

voice-mcp provides Model Context Protocol (MCP) servers that allow LLMs to communicate via voice, enabling natural spoken conversations with AI assistants.

Architecture

┌─────────────────────┐     ┌──────────────────┐     ┌─────────────────────┐
│   Claude/LLM        │     │  LiveKit Server  │     │  Voice Frontend     │
│   (MCP Client)      │◄───►│  (Port 7880)     │◄───►│  (Port 3001)        │
└─────────────────────┘     └──────────────────┘     └─────────────────────┘
         │                            │
         │                            │
         ▼                            ▼
┌─────────────────────┐     ┌──────────────────┐
│  Voice MCP Server   │     │   Agent.py       │
│  (ask_voice_question│     │  (Voice Logic)   │
│   check_room_status)│     └──────────────────┘
└─────────────────────┘              │
                                     │
                    ┌────────────────┴────────────────┐
                    │                                 │
                    ▼                                 ▼
         ┌──────────────────┐             ┌──────────────────┐
         │  Whisper.cpp     │             │  Kokoro TTS      │
         │  (Port 2022)     │             │  (Port 8880)     │
         │  Local STT       │             │  Local TTS       │
         └──────────────────┘             └──────────────────┘

Features

  • Voice Input/Output: Bidirectional voice communication through LiveKit
  • Speech-to-Text: Local whisper.cpp or OpenAI Whisper API
  • Text-to-Speech: Multiple TTS providers (OpenAI TTS + local Kokoro-FastAPI)
  • Local STT/TTS: Cost-free local speech recognition and voice generation
  • Real-time Streaming: Low-latency voice interactions
  • MCP Integration: Works seamlessly with Claude and other MCP-compatible clients

Installation Options

Option 1: Python Package (Recommended for Users)

# Install globally
pip install livekit-voice-mcp

# Or use without installation
uvx livekit-voice-mcp

# Or use pipx for isolated installation  
pipx install livekit-voice-mcp

Option 2: Container Image

# Pull and run the container
docker pull ghcr.io/mbailey/voice-mcp:latest

# Run with environment variables
docker run -e OPENAI_API_KEY=your_key_here \
  -e VOICE_MCP_DEBUG=true \
  ghcr.io/mbailey/voice-mcp:latest

See CONTAINER.md for detailed container usage instructions.

Option 3: Local Development Setup

# Clone the repository
git clone https://github.com/mbailey/voice-mcp-public.git
cd voice-mcp-public

# Build container image
make build-container

# Or install development environment
make install

Configuration

Python Package Configuration

Set environment variables before running:

export LIVEKIT_URL="wss://your-app.livekit.cloud"
export LIVEKIT_API_KEY="your-api-key"
export LIVEKIT_API_SECRET="your-api-secret"
export OPENAI_API_KEY="your-openai-key"  # For STT/TTS

Local Development Configuration

Copy the example configuration and customize:

cp .env.example .env.local
# Edit .env.local with your settings

Provider Selection

voice-mcp supports multiple STT/TTS providers and falls back automatically when a preferred provider is unavailable:

TTS Providers

  • TTS_PROVIDER=auto (default): Try Kokoro → OpenAI → LiveKit
  • TTS_PROVIDER=kokoro: Use only local Kokoro TTS
  • TTS_PROVIDER=openai: Use only OpenAI TTS

STT Configuration

  • Local Whisper: Automatically used when available at http://localhost:2022
  • OpenAI Whisper: Fallback when local whisper is not running
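
The fallback order described above can be sketched in a few lines of Python. This is not the project's actual selection code; the function names, the provider labels, and the is_available callback are hypothetical and only illustrate the documented order.

```python
from typing import Callable, List, Optional

# Order documented for each TTS_PROVIDER value.
TTS_CHAINS = {
    "auto": ["kokoro", "openai", "livekit"],
    "kokoro": ["kokoro"],
    "openai": ["openai"],
}


def tts_candidates(provider: str = "auto") -> List[str]:
    """Return the providers to try, in order, for a TTS_PROVIDER value."""
    try:
        return list(TTS_CHAINS[provider.lower()])
    except KeyError:
        raise ValueError(f"unknown TTS_PROVIDER: {provider!r}") from None


def first_available(candidates: List[str],
                    is_available: Callable[[str], bool]) -> Optional[str]:
    """Pick the first candidate that the caller-supplied probe accepts."""
    for name in candidates:
        if is_available(name):
            return name
    return None


def pick_stt(local_is_up: Callable[[], bool]) -> str:
    """Prefer local Whisper when it is reachable, else fall back to OpenAI."""
    return "local-whisper" if local_is_up() else "openai-whisper"
```

Passing the availability probe in as a callback keeps the ordering logic separate from any network check.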

Key Configuration Options

# TTS Provider (auto/kokoro/openai)
TTS_PROVIDER=auto

# Kokoro TTS (local)
KOKORO_URL=http://127.0.0.1:8880
KOKORO_ENABLED=true

# Whisper STT (local)
WHISPER_BASE_URL=http://localhost:2022

# OpenAI (fallback for both STT and TTS)
OPENAI_API_KEY=your_key_here

# LiveKit
LIVEKIT_URL=ws://localhost:7880

Usage

Using the Python Package

Once installed and configured in Claude Desktop, you can use voice commands:

  1. Ask Claude: "Can you help me with voice?"
  2. Claude will use the voice MCP tools to communicate
  3. Speak your questions and hear responses

Available MCP tools:

  • ask_voice_question: Ask a question via voice and get a text response
  • check_room_status: Check active voice rooms and participants
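
Under the hood, an MCP client invokes these tools with JSON-RPC 2.0 tools/call requests. The sketch below only builds such a message so its shape is visible. The helper name tool_call is hypothetical, and the "question" argument name in the commented example is an assumption, since the tool schemas are not documented here.

```python
import json
from itertools import count
from typing import Optional

_request_ids = count(1)  # JSON-RPC requests need a unique id per call


def tool_call(name: str, arguments: Optional[dict] = None) -> str:
    """Serialize an MCP tools/call request for the named tool."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }
    return json.dumps(message)


# Example with no arguments:
# tool_call("check_room_status")
# Example with an assumed (undocumented) argument name:
# tool_call("ask_voice_question", {"question": "What should I work on next?"})
```

In normal use Claude Desktop sends these requests itself, so no manual message construction is needed.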

Local Development Usage

  1. Download external repositories:

    mt sync
    
  2. Install and build all dependencies:

    make install
    
  3. Start the development environment:

    make dev
    

This will start:

  • LiveKit server (port 7880)
  • Kokoro TTS (port 8880)
  • Whisper STT (port 2022)
  • Voice assistant frontend (port 3001)
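
To confirm that each service came up, a short Python script can probe the ports listed above. This is a sketch under the assumption that all four services listen on 127.0.0.1; the helper names are hypothetical.

```python
import socket

# Ports taken from the list above.
DEV_SERVICES = {
    "LiveKit server": 7880,
    "Kokoro TTS": 8880,
    "Whisper STT": 2022,
    "Voice assistant frontend": 3001,
}


def is_listening(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or unreachable
        return False


def service_status(services=DEV_SERVICES) -> dict:
    """Map each service name to whether its port accepts connections."""
    return {name: is_listening(port) for name, port in services.items()}


# Example:
# for name, up in service_status().items():
#     print(f"{name}: {'up' if up else 'down'}")
```

A successful TCP connection only shows that something is listening on the port, not that the service behind it is healthy.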

Individual components:

make livekit-server   # Start LiveKit server
make frontend         # Start voice frontend
make kokoro-start     # Start Kokoro TTS
make whisper-start    # Start Whisper STT

Components

  • livekit-voice-mcp: MCP server for voice interactions
  • livekit-admin-mcp: Administrative tools for LiveKit management
  • livekit-agent: Python agent handling voice processing
  • kokoro-fastapi: Local TTS server providing OpenAI-compatible API
  • whisper.cpp: Local STT server providing OpenAI-compatible API

Kokoro-FastAPI (Local TTS)

voice-mcp includes Kokoro-FastAPI for cost-free local text-to-speech generation:

  • 70+ Voice Options: Multiple languages and voice styles
  • OpenAI Compatible: Drop-in replacement for OpenAI TTS API
  • Web Interface: Interactive voice testing at http://127.0.0.1:8880/web/
  • Browser Support: Chrome/Chromium recommended (Firefox has streaming limitations)

Kokoro Commands

make kokoro-start     # Start Kokoro TTS service
make kokoro-stop      # Stop Kokoro TTS service  
make kokoro-build     # Build Kokoro container
make test-kokoro      # Test Kokoro functionality

Quick Test

# Generate speech using Kokoro API
curl -X POST http://127.0.0.1:8880/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Hello from Kokoro!", "voice": "nova"}' \
  --output test.mp3
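
The same request can be made from Python with only the standard library. The sketch below mirrors the curl command above (same URL, JSON body, and output file); the function names are hypothetical, and it assumes the endpoint returns raw audio bytes as the curl example implies.

```python
import json
import urllib.request


def build_speech_request(text: str, voice: str = "nova",
                         base_url: str = "http://127.0.0.1:8880") -> urllib.request.Request:
    """Build the POST request sent by the curl example above."""
    payload = json.dumps({"input": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, out_path: str = "test.mp3", **kwargs) -> int:
    """Send the request, write the audio to out_path, return the byte count."""
    request = build_speech_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        audio = response.read()
    with open(out_path, "wb") as fh:
        fh.write(audio)
    return len(audio)


# Example (requires Kokoro to be running):
# synthesize("Hello from Kokoro!")
```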

Whisper.cpp (Local STT)

voice-mcp includes whisper.cpp for cost-free local speech-to-text:

  • Hardware Optimization: Automatically selects best model for your hardware
  • OpenAI Compatible: Drop-in replacement for OpenAI Whisper API
  • Multiple Models: From tiny to large-v3-turbo
  • GPU Support: CUDA, Metal, and Vulkan acceleration

Whisper Commands

make whisper-build    # Build Whisper container
make whisper-start    # Start Whisper STT service
make whisper-stop     # Stop Whisper STT service

Quick Test

# Test whisper API (OpenAI-compatible)
# -F already sends multipart/form-data with the required boundary,
# so no explicit Content-Type header is needed
curl -X POST http://localhost:2022/v1/audio/transcriptions \
  -F "file=@audio.wav"
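
For callers that prefer Python, the upload can be reproduced with the standard library by building the multipart body by hand. This is a sketch, not project code: the function names are hypothetical, and only the "file" form field from the curl example is sent. The response is returned as raw text because its exact format is not documented here.

```python
import urllib.request
import uuid


def build_transcription_request(audio_bytes: bytes, filename: str = "audio.wav",
                                base_url: str = "http://localhost:2022") -> urllib.request.Request:
    """Build a multipart/form-data POST carrying one 'file' field."""
    boundary = uuid.uuid4().hex  # random marker separating the form parts
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        audio_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe(path: str = "audio.wav", **kwargs) -> str:
    """Upload the file at path and return the server's response body as text."""
    with open(path, "rb") as fh:
        request = build_transcription_request(fh.read(), filename=path, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as response:
        return response.read().decode("utf-8")


# Example (requires the local Whisper service to be running):
# print(transcribe("audio.wav"))
```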

Requirements

  • Python 3.8+
  • LiveKit server
  • Podman or Docker (for Kokoro TTS only)
  • Build tools (cmake, make, gcc/g++) for Whisper.cpp
  • OpenAI API key (optional, for cloud fallback)
  • mt command for managing external repos

Development

See TASKS.md for development roadmap and technical tasks.

License

MIT
