Voice interaction capabilities for Model Context Protocol (MCP) servers

voice-mcp

MCP servers that enable voice interactions between LLMs and users through LiveKit.

Quick Start with Python Package

The easiest way to use voice-mcp is through our Python package:

# Install with pip
pip install livekit-voice-mcp

# Or use with uvx (no installation needed)
uvx livekit-voice-mcp

Configure Claude Desktop

Add to your Claude Desktop configuration file:

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json

{
  "mcpServers": {
    "livekit-voice": {
      "command": "uvx",
      "args": ["livekit-voice-mcp"],
      "env": {
        "LIVEKIT_URL": "wss://your-app.livekit.cloud",
        "LIVEKIT_API_KEY": "your-api-key",
        "LIVEKIT_API_SECRET": "your-api-secret",
        "OPENAI_API_KEY": "your-openai-key"
      }
    }
  }
}
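
If the configuration file already lists other MCP servers, the entry has to be merged in rather than pasted over the whole file. The following is a minimal sketch of that merge; the function names add_voice_server and update_config_file are illustrative helpers, not part of voice-mcp.

```python
import json
from pathlib import Path


def add_voice_server(config: dict, env: dict) -> dict:
    """Return a copy of config with the livekit-voice entry added or replaced."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["livekit-voice"] = {
        "command": "uvx",
        "args": ["livekit-voice-mcp"],
        "env": dict(env),
    }
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path, env: dict) -> None:
    """Read the JSON config if it exists, merge the entry, and write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_voice_server(config, env), indent=2) + "\n")
```

Existing entries under mcpServers are preserved, and the input dictionary is not modified.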

Restart Claude Desktop so that it loads the new server. Voice commands are then available.

Overview

voice-mcp provides Model Context Protocol (MCP) servers that allow LLMs to communicate via voice, enabling natural spoken conversations with AI assistants.

Architecture

┌─────────────────────┐     ┌──────────────────┐     ┌─────────────────────┐
│   Claude/LLM        │     │  LiveKit Server  │     │  Voice Frontend     │
│   (MCP Client)      │◄───►│  (Port 7880)     │◄───►│  (Port 3001)        │
└─────────────────────┘     └──────────────────┘     └─────────────────────┘
         │                            │
         │                            │
         ▼                            ▼
┌─────────────────────┐     ┌──────────────────┐
│  Voice MCP Server   │     │   Agent.py       │
│  (ask_voice_question│     │  (Voice Logic)   │
│   check_room_status)│     └──────────────────┘
└─────────────────────┘              │
                                     │
                    ┌────────────────┴────────────────┐
                    │                                 │
                    ▼                                 ▼
         ┌──────────────────┐             ┌──────────────────┐
         │  Whisper.cpp     │             │  Kokoro TTS      │
         │  (Port 2022)     │             │  (Port 8880)     │
         │  Local STT       │             │  Local TTS       │
         └──────────────────┘             └──────────────────┘

Features

  • Voice Input/Output: Bidirectional voice communication through LiveKit
  • Speech-to-Text: Local whisper.cpp or OpenAI Whisper API
  • Text-to-Speech: Multiple TTS providers (OpenAI TTS + local Kokoro-FastAPI)
  • Local STT/TTS: Cost-free local speech recognition and voice generation
  • Real-time Streaming: Low-latency voice interactions
  • MCP Integration: Works seamlessly with Claude and other MCP-compatible clients

Installation Options

Option 1: Python Package (Recommended for Users)

# Install globally
pip install livekit-voice-mcp

# Or use without installation
uvx livekit-voice-mcp

# Or use pipx for isolated installation
pipx install livekit-voice-mcp

Option 2: Container Image

# Pull and run the container
docker pull ghcr.io/mbailey/voice-mcp:latest

# Run with environment variables
docker run -e OPENAI_API_KEY=your_key_here \
  -e VOICE_MCP_DEBUG=true \
  ghcr.io/mbailey/voice-mcp:latest

See CONTAINER.md for detailed container usage instructions.

Option 3: Local Development Setup

# Clone the repository
git clone https://github.com/mbailey/voice-mcp-public.git
cd voice-mcp-public

# Build container image
make build-container

# Or install development environment
make install

Configuration

Python Package Configuration

Set environment variables before running:

export LIVEKIT_URL="wss://your-app.livekit.cloud"
export LIVEKIT_API_KEY="your-api-key"
export LIVEKIT_API_SECRET="your-api-secret"
export OPENAI_API_KEY="your-openai-key"  # For STT/TTS
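
A quick pre-flight check can catch a missing variable before the server starts. This is a sketch only: it assumes the three LiveKit variables are mandatory and treats OPENAI_API_KEY as optional (as the Requirements list suggests). The helper name missing_vars is hypothetical.

```python
import os

# Assumption: these three are mandatory; OPENAI_API_KEY is only a cloud fallback.
REQUIRED_VARS = ("LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET")


def missing_vars(env, names=REQUIRED_VARS):
    """Return the names that are unset or blank in the given mapping."""
    return [name for name in names if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_vars(os.environ)
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required LiveKit variables are set.")
```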

Local Development Configuration

Copy the example configuration and customize:

cp .env.example .env.local
# Edit .env.local with your settings

Provider Selection

voice-mcp supports multiple STT/TTS providers and falls back automatically when the preferred one is unavailable:

TTS Providers

  • TTS_PROVIDER=auto (default): Try Kokoro → OpenAI → LiveKit
  • TTS_PROVIDER=kokoro: Use only local Kokoro TTS
  • TTS_PROVIDER=openai: Use only OpenAI TTS
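
The fallback order above can be pictured as an ordered list of candidates that are tried until one succeeds. The sketch below illustrates that idea; it is not voice-mcp's actual implementation, and the names tts_candidates, first_working, and the synthesize callback are assumptions made for the example.

```python
# Candidate order per TTS_PROVIDER value, taken from the list above.
TTS_CHAINS = {
    "auto": ["kokoro", "openai", "livekit"],
    "kokoro": ["kokoro"],
    "openai": ["openai"],
}


def tts_candidates(provider="auto"):
    """Return the providers to try, in order, for a TTS_PROVIDER value."""
    key = provider.strip().lower()
    if key not in TTS_CHAINS:
        raise ValueError(f"unknown TTS_PROVIDER: {provider!r}")
    return list(TTS_CHAINS[key])


def first_working(candidates, synthesize):
    """Call synthesize(name) for each candidate and return the first success.

    synthesize is a caller-supplied function that raises on failure.
    """
    errors = {}
    for name in candidates:
        try:
            return name, synthesize(name)
        except Exception as exc:  # record the failure and try the next provider
            errors[name] = exc
    raise RuntimeError(f"all TTS providers failed: {errors}")
```

With TTS_PROVIDER=kokoro or TTS_PROVIDER=openai the chain has a single entry, so a failure is reported instead of silently switching providers.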

STT Configuration

  • Local Whisper: Automatically used when available at http://localhost:2022
  • OpenAI Whisper: Fallback when local whisper is not running
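
One plausible way to decide between the two is to probe the local port and fall back when nothing answers. The sketch below does this with a plain TCP connection attempt; how voice-mcp itself detects the local server is not documented here, so treat is_listening and choose_stt as illustrative names.

```python
import socket
from urllib.parse import urlparse


def is_listening(url, timeout=0.5):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False


def choose_stt(local_url="http://localhost:2022", probe=is_listening):
    """Prefer the local whisper server when the probe says it is reachable."""
    return "local-whisper" if probe(local_url) else "openai-whisper"
```

The probe is passed in as a parameter so the selection logic can be exercised without a running server.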

Key Configuration Options

# TTS Provider (auto/kokoro/openai)
TTS_PROVIDER=auto

# Kokoro TTS (local)
KOKORO_URL=http://127.0.0.1:8880
KOKORO_ENABLED=true

# Whisper STT (local)
WHISPER_BASE_URL=http://localhost:2022

# OpenAI (fallback for both STT and TTS)
OPENAI_API_KEY=your_key_here

# LiveKit
LIVEKIT_URL=ws://localhost:7880
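
For scripts that need to read the same settings, a small parser for this KEY=VALUE layout is enough. This is a sketch under the assumption that the file contains only comments, blank lines, and simple assignments; parse_env is a hypothetical helper, and voice-mcp may load its configuration differently.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment; ignore
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings
```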

Usage

Using the Python Package

Once installed and configured in Claude Desktop, you can use voice commands:

  1. Ask Claude: "Can you help me with voice?"
  2. Claude will use the voice MCP tools to communicate
  3. Speak your questions and hear responses

Available MCP tools:

  • ask_voice_question: Ask a question via voice and get a text response
  • check_room_status: Check active voice rooms and participants
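
Under the hood, an MCP client invokes these tools with JSON-RPC 2.0 requests using the tools/call method. The sketch below builds such a request as a plain dictionary. The tool names come from the list above, but the argument name "question" is an assumption, since the tools' input schemas are not documented here.

```python
import itertools
import json

_request_ids = itertools.count(1)


def tool_call(name, arguments=None):
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


if __name__ == "__main__":
    # "question" is a hypothetical argument name used for illustration only.
    request = tool_call("ask_voice_question", {"question": "What should I do next?"})
    print(json.dumps(request, indent=2))
```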

Local Development Usage

  1. Download external repositories:

    mt sync
    
  2. Install and build all dependencies:

    make install
    
  3. Start the development environment:

    make dev
    

This will start:

  • LiveKit server (port 7880)
  • Kokoro TTS (port 8880)
  • Whisper STT (port 2022)
  • Voice assistant frontend (port 3001)

Individual components:

make livekit-server   # Start LiveKit server
make frontend         # Start voice frontend
make kokoro-start     # Start Kokoro TTS
make whisper-start    # Start Whisper STT

Components

  • livekit-voice-mcp: MCP server for voice interactions
  • livekit-admin-mcp: Administrative tools for LiveKit management
  • livekit-agent: Python agent handling voice processing
  • kokoro-fastapi: Local TTS server providing OpenAI-compatible API
  • whisper.cpp: Local STT server providing OpenAI-compatible API

Kokoro-FastAPI (Local TTS)

voice-mcp includes Kokoro-FastAPI for cost-free local text-to-speech generation:

  • 70+ Voice Options: Multiple languages and voice styles
  • OpenAI Compatible: Drop-in replacement for OpenAI TTS API
  • Web Interface: Interactive voice testing at http://127.0.0.1:8880/web/
  • Browser Support: Chrome/Chromium recommended (Firefox has streaming limitations)

Kokoro Commands

make kokoro-start     # Start Kokoro TTS service
make kokoro-stop      # Stop Kokoro TTS service
make kokoro-build     # Build Kokoro container
make test-kokoro      # Test Kokoro functionality

Quick Test

# Generate speech using Kokoro API
curl -X POST http://127.0.0.1:8880/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Hello from Kokoro!", "voice": "nova"}' \
  --output test.mp3
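
The same request can be made from Python with only the standard library. The sketch below mirrors the curl command (same URL, JSON body, and output file); the function names build_speech_request and save_speech are illustrative.

```python
import json
import urllib.request


def build_speech_request(text, voice="nova", base_url="http://127.0.0.1:8880"):
    """Build the POST request that the curl command above sends."""
    payload = json.dumps({"input": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_speech(text, out_path="test.mp3"):
    """Send the request and write the returned audio bytes to out_path."""
    with urllib.request.urlopen(build_speech_request(text), timeout=30) as resp:
        audio = resp.read()
    with open(out_path, "wb") as fh:
        fh.write(audio)
    return len(audio)
```

save_speech("Hello from Kokoro!") needs the Kokoro service running; build_speech_request can be inspected without it.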

Whisper.cpp (Local STT)

voice-mcp includes whisper.cpp for cost-free local speech-to-text:

  • Hardware Optimization: Automatically selects best model for your hardware
  • OpenAI Compatible: Drop-in replacement for OpenAI Whisper API
  • Multiple Models: From tiny to large-v3-turbo
  • GPU Support: CUDA, Metal, and Vulkan acceleration

Whisper Commands

make whisper-build    # Build Whisper container
make whisper-start    # Start Whisper STT service
make whisper-stop     # Stop Whisper STT service

Quick Test

# Test whisper API (OpenAI-compatible)
curl -X POST http://localhost:2022/v1/audio/transcriptions \
  -H "Content-Type: multipart/form-data" \
  -F "file=@audio.wav"
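
For callers that prefer Python over curl, the upload can be reproduced with a hand-built multipart/form-data body. This is a sketch using only the standard library; the form field name "file" comes from the curl command above, while build_transcription_request and transcribe are hypothetical helper names.

```python
import urllib.request
import uuid


def build_transcription_request(filename, audio, base_url="http://localhost:2022"):
    """Build a multipart/form-data POST carrying one file field named 'file'."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        audio,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe(path):
    """Upload the audio file at path and return the raw response body as text."""
    with open(path, "rb") as fh:
        request = build_transcription_request(path, fh.read())
    with urllib.request.urlopen(request, timeout=120) as resp:
        return resp.read().decode("utf-8")
```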

Requirements

  • Python 3.8+
  • LiveKit server
  • Podman or Docker (for Kokoro TTS only)
  • Build tools (cmake, make, gcc/g++) for Whisper.cpp
  • OpenAI API key (optional, for cloud fallback)
  • mt command for managing external repos

Development

See TASKS.md for development roadmap and technical tasks.

License

MIT
