Speech MCP Server with command-line interface and Kokoro TTS support
Speech MCP
A Goose MCP extension for voice interaction with modern audio visualization.
Overview
Speech MCP provides a voice interface for Goose, allowing users to interact through speech rather than text. It includes:
- Real-time audio processing for speech recognition
- Local speech-to-text using faster-whisper (a CTranslate2-based reimplementation of OpenAI's Whisper model)
- High-quality text-to-speech with multiple voice options
- Modern PyQt-based UI with audio visualization
- Simple command-line interface for voice interaction
Features
- Modern UI: Sleek PyQt-based interface with audio visualization and dark theme
- Voice Input: Capture and transcribe user speech using faster-whisper
- Voice Output: Convert agent responses to speech with 54+ voice options
- Voice Persistence: Remembers your preferred voice between sessions
- Continuous Conversation: Automatically listen for user input after agent responses
- Silence Detection: Automatically stops recording when the user stops speaking
- Robust Error Handling: Graceful recovery from common failure modes
Installation
Option 1: Quick Install (One-Click)
Click the link below if you have Goose installed:
goose://extension?cmd=uvx&arg=speech-mcp&id=speech_mcp&name=Speech%20Interface&description=Voice%20interaction%20with%20audio%20visualization%20for%20Goose
Option 2: Using Goose CLI (recommended)
Start Goose with your extension enabled:
# If you installed via PyPI
goose session --with-extension "speech-mcp"
# Or if you want to use a local development version
goose session --with-extension "python -m speech_mcp"
Option 3: Manual setup in Goose
- Run goose configure
- Select "Add Extension" from the menu
- Choose "Command-line Extension"
- Enter a name (e.g., "Speech Interface")
- For the command, enter speech-mcp
- Follow the prompts to complete the setup
Option 4: Manual Installation
- Clone this repository
- Install dependencies:
uv pip install -e .
Dependencies
- Python 3.10+
- PyQt5 (for modern UI)
- PyAudio (for audio capture)
- faster-whisper (for speech-to-text)
- NumPy (for audio processing)
- Pydub (for audio processing)
- psutil (for process management)
Optional Dependencies
- Kokoro TTS: For high-quality text-to-speech with multiple voices
  - To install Kokoro, you can use pip with optional dependencies:
    pip install speech-mcp[kokoro]  # Basic Kokoro support with English
    pip install speech-mcp[ja]      # Add Japanese support
    pip install speech-mcp[zh]      # Add Chinese support
    pip install speech-mcp[all]     # All languages and features
  - Alternatively, run the installation script:
    python scripts/install_kokoro.py
  - See Kokoro TTS Guide for more information
Usage
To use this MCP with Goose, you can:
- Start a conversation:
  user_input = start_conversation()
- Reply to the user and get their response:
  user_response = reply("Your response text here")
Typical Workflow
# Start the conversation
user_input = start_conversation()
# Process the input and generate a response
# ...
# Reply to the user and get their response
follow_up = reply("Here's my response to your question.")
# Process the follow-up and reply again
reply("I understand your follow-up question. Here's my answer.")
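The workflow above can be sketched as a loop. This is a minimal illustration, not the extension's own code: in Goose, `start_conversation` and `reply` are MCP tools invoked by the agent, so here they are passed in as plain callables. The `respond` callable, the `max_turns` limit, and the rule of stopping on an empty transcription are all assumptions made for the example.

```python
from typing import Callable


def run_conversation(
    start_conversation: Callable[[], str],
    reply: Callable[[str], str],
    respond: Callable[[str], str],
    max_turns: int = 10,
) -> list[tuple[str, str]]:
    """Alternate between listening and replying until the user goes quiet.

    `respond` is a hypothetical stand-in for whatever produces the agent's
    answer. Returns the (user_input, answer) pairs that were exchanged.
    """
    transcript = []
    user_input = start_conversation()
    for _ in range(max_turns):
        if not user_input.strip():
            break  # assumption: an empty transcription ends the session
        answer = respond(user_input)
        transcript.append((user_input, answer))
        # reply() speaks the answer and returns the user's next utterance
        user_input = reply(answer)
    return transcript
```

Passing the tools in as arguments keeps the loop testable without a microphone or a running Goose session.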
UI Features
The new PyQt-based UI includes:
- Modern Dark Theme: Sleek, professional appearance
- Audio Visualization: Dynamic visualization of audio input
- Voice Selection: Choose from 54+ voice options
- Voice Persistence: Your voice preference is saved between sessions
- Animated Effects: Smooth animations and visual feedback
- Status Indicators: Clear indication of system state (ready, listening, processing)
Configuration
User preferences are stored in ~/.config/speech-mcp/config.json and include:
- Selected TTS voice
- TTS engine preference
- Voice speed
- Language code
- UI theme settings
You can also set preferences via environment variables, such as:
- SPEECH_MCP_TTS_VOICE: Set your preferred voice
- SPEECH_MCP_TTS_ENGINE: Set your preferred TTS engine
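One way to combine the config file with the environment variables is sketched below. The file path and variable names come from this README; the JSON key names (`voice`, `tts_engine`) and the rule that environment variables take precedence over the file are assumptions, since the actual schema is not documented here.

```python
import json
import os
from pathlib import Path

CONFIG_PATH = Path.home() / ".config" / "speech-mcp" / "config.json"

# Environment variable -> config key. The key names are assumptions.
ENV_OVERRIDES = {
    "SPEECH_MCP_TTS_VOICE": "voice",
    "SPEECH_MCP_TTS_ENGINE": "tts_engine",
}


def load_preferences(path=CONFIG_PATH, environ=None):
    """Read preferences from the JSON file, then apply env overrides."""
    environ = os.environ if environ is None else environ
    prefs = {}
    try:
        prefs = json.loads(Path(path).read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        pass  # missing or unreadable config: start from an empty dict
    for env_name, key in ENV_OVERRIDES.items():
        if environ.get(env_name):
            # assumption: a non-empty environment variable wins over the file
            prefs[key] = environ[env_name]
    return prefs
```

Calling `load_preferences()` with no arguments reads the default path and the real process environment.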
Troubleshooting
If you encounter issues with the extension freezing or not responding:
- Check the logs: Look at the log files in src/speech_mcp/ for detailed error messages.
- Reset the state: If the extension seems stuck, try deleting src/speech_mcp/speech_state.json or setting all states to false.
- Use the direct command: Instead of uv run speech-mcp, use the installed package with speech-mcp directly.
- Check audio devices: Ensure your microphone is properly configured and accessible to Python.
- Verify dependencies: Make sure all required dependencies are installed correctly.
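The "setting all states to false" step can be scripted. The sketch below assumes the state file is a flat JSON object whose boolean values are the state flags; the real layout of speech_state.json is not documented here, so inspect the file before relying on this.

```python
import json
from pathlib import Path


def reset_state(path):
    """Set every boolean flag in the state file to False.

    Assumes a flat JSON object; non-boolean values are left untouched.
    Returns the state that was written back.
    """
    state_file = Path(path)
    state = json.loads(state_file.read_text(encoding="utf-8"))
    cleared = {
        key: (False if isinstance(value, bool) else value)
        for key, value in state.items()
    }
    state_file.write_text(json.dumps(cleared, indent=2), encoding="utf-8")
    return cleared
```

For example, `reset_state("src/speech_mcp/speech_state.json")` run from the repository root.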
Recent Improvements
- Complete PyQt UI Migration: Replaced the old Tkinter UI with a modern PyQt implementation
- Code Refactoring: Split UI code into multiple components for better maintainability
- Process Management: Improved process lifecycle management with automatic recovery
- Voice Persistence: Added configuration system for saving user preferences
- Enhanced Visualization: Added dynamic audio visualization with animations
- Expanded Voice Options: Updated to support 54 different voice models
- Improved Error Handling: Better recovery from common failure modes
Technical Details
Speech-to-Text
The MCP uses faster-whisper for speech recognition:
- Uses the "base" model for a good balance of accuracy and speed
- Processes audio locally without sending data to external services
- Automatically detects when the user has finished speaking
- Provides improved performance over the original Whisper implementation
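To illustrate how end-of-speech detection can work in general, here is a minimal energy-based sketch. It is not taken from the project's source: the RMS measure, the `threshold` value, and the `max_quiet_frames` count are all assumptions chosen for the example, and the extension may use a different method.

```python
import math


def rms(frame):
    """Root-mean-square level of one frame of PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def frames_until_silence(frames, threshold=500.0, max_quiet_frames=3):
    """Return how many frames to keep before recording should stop.

    Recording stops once `max_quiet_frames` consecutive frames fall below
    `threshold` after speech has started. If that never happens, all
    frames are kept.
    """
    quiet_run = 0
    heard_speech = False
    for index, frame in enumerate(frames):
        if rms(frame) >= threshold:
            heard_speech = True
            quiet_run = 0
        elif heard_speech:
            quiet_run += 1
            if quiet_run >= max_quiet_frames:
                return index + 1
    return len(frames)
```

Leading silence is ignored so that recording does not stop before the user has said anything.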
Text-to-Speech
The MCP supports multiple text-to-speech engines:
Default: pyttsx3
- Uses system voices available on your computer
- Works out of the box without additional setup
- Limited voice quality and customization
Optional: Kokoro TTS
- High-quality neural text-to-speech with multiple voices
- Lightweight model (82M parameters) that runs efficiently on CPU
- Multiple voice styles and languages
- To install:
python scripts/install_kokoro.py
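Because Kokoro is optional, code that picks an engine needs a fallback. The sketch below shows one possible selection rule, assuming the engine identifiers are the strings "kokoro" and "pyttsx3" and that Kokoro is importable as a top-level `kokoro` package; neither detail is confirmed by this README.

```python
import importlib.util


def kokoro_installed():
    """True if a top-level `kokoro` package can be found (assumed name)."""
    return importlib.util.find_spec("kokoro") is not None


def choose_engine(preferred=None, kokoro_available=None):
    """Pick a TTS engine, falling back to pyttsx3 when Kokoro is missing.

    pyttsx3 stays the default; Kokoro is used only when it is both
    requested and available.
    """
    if kokoro_available is None:
        kokoro_available = kokoro_installed()
    if preferred == "kokoro" and kokoro_available:
        return "kokoro"
    return "pyttsx3"
```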
Note about Voice Models: The voice models are .pt files (PyTorch models) that are loaded by Kokoro. Each voice model is approximately 523 KB in size and is automatically downloaded when needed.
Voice Persistence: The selected voice is automatically saved to a configuration file (~/.config/speech-mcp/config.json) and will be remembered between sessions. This allows users to set their preferred voice once and have it used consistently.
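Saving a voice choice could look like the sketch below. The `voice` key name is an assumption, as is the write-then-rename step, which is included only to avoid leaving a half-written file if the process is interrupted.

```python
import json
from pathlib import Path


def save_voice(voice, path):
    """Store the selected voice in the JSON config, keeping other keys."""
    config_file = Path(path)
    config_file.parent.mkdir(parents=True, exist_ok=True)
    try:
        config = json.loads(config_file.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        config = {}  # no usable config yet: start fresh
    config["voice"] = voice  # hypothetical key name
    # Write to a temporary file first, then swap it into place.
    tmp_file = config_file.with_suffix(".tmp")
    tmp_file.write_text(json.dumps(config, indent=2), encoding="utf-8")
    tmp_file.replace(config_file)
    return config
```

For example, `save_voice("af_heart", Path.home() / ".config" / "speech-mcp" / "config.json")`.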
Available Kokoro Voices
American Female Voices
- af_alloy, af_aoede, af_bella, af_heart, af_jessica, af_kore, af_nicole, af_nova, af_river, af_sarah, af_sky
American Male Voices
- am_adam, am_echo, am_eric, am_fenrir, am_liam, am_michael, am_onyx, am_puck, am_santa
British Female Voices
- bf_alice, bf_emma, bf_isabella, bf_lily
British Male Voices
- bm_daniel, bm_fable, bm_george, bm_lewis
Spanish Voices
- ef_dora, em_alex, em_santa
Other Languages
- French: ff_siwis
- Hindi: hf_alpha, hf_beta, hm_omega, hm_psi
- Italian: if_sara, im_nicola
- Japanese: jf_alpha, jf_gongitsune, jf_nezumi, jf_tebukuro, jm_kumo
- Portuguese: pf_dora, pm_alex, pm_santa
- Chinese: zf_xiaobei, zf_xiaoni, zf_xiaoxiao, zf_xiaoyi, zm_yunjian, zm_yunxi, zm_yunxia, zm_yunyang
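The voice IDs above follow a two-letter prefix pattern: the first letter identifies the language or accent and the second the gender (`f` or `m`). The helper below is a hypothetical sketch for decoding an ID, not part of speech-mcp. The language table is inferred from the list above, except that `e` is mapped to Spanish, following upstream Kokoro's language codes.

```python
# First prefix letter -> language or accent.
LANGUAGES = {
    "a": "American English",
    "b": "British English",
    "e": "Spanish",  # per upstream Kokoro language codes
    "f": "French",
    "h": "Hindi",
    "i": "Italian",
    "j": "Japanese",
    "p": "Portuguese",
    "z": "Chinese",
}

# Second prefix letter -> gender.
GENDERS = {"f": "female", "m": "male"}


def describe_voice(voice_id):
    """Split an ID such as 'af_heart' into language, gender, and name."""
    prefix, _, name = voice_id.partition("_")
    if len(prefix) != 2 or not name:
        raise ValueError(f"unexpected voice ID format: {voice_id!r}")
    try:
        return {
            "language": LANGUAGES[prefix[0]],
            "gender": GENDERS[prefix[1]],
            "name": name,
        }
    except KeyError:
        raise ValueError(f"unknown prefix in voice ID: {voice_id!r}") from None
```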
License
File details
Details for the file speech_mcp-1.0.10.tar.gz.
File metadata
- Download URL: speech_mcp-1.0.10.tar.gz
- Upload date:
- Size: 60.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 95bbfe967d15b8c4d23b12950e16f162b3f17d9a124577b1a5cbc5470a59c4e9 |
| MD5 | b609aee3c1d4f109a01fc3aeb46c4206 |
| BLAKE2b-256 | c6d89f8d27724d7b889437fd50c692ab245546c06c8020ebdfd0f2631894aa1c |
File details
Details for the file speech_mcp-1.0.10-py3-none-any.whl.
File metadata
- Download URL: speech_mcp-1.0.10-py3-none-any.whl
- Upload date:
- Size: 86.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 43658d0798a4549b166f41a87d73a2783c8a2c737d9e2b78badd2f3e12257623 |
| MD5 | 4ab38a994c57383a6ed7dbae4e736dd9 |
| BLAKE2b-256 | 011905f0184bdadfa518865ff34f04ddbd692b72b836892f195ac7b609f3a6d1 |