Speech MCP Server with command-line interface and Kokoro TTS support

Speech MCP

A Goose MCP extension for voice interaction with modern audio visualization.

Overview

Speech MCP provides a voice interface for Goose, allowing users to interact through speech rather than text. It includes:

Real-time audio processing for speech recognition
Local speech-to-text using faster-whisper (a CTranslate2-based reimplementation of OpenAI's Whisper model)
High-quality text-to-speech with multiple voice options
Modern PyQt-based UI with audio visualization
Simple command-line interface for voice interaction

Features

Modern UI: Sleek PyQt-based interface with audio visualization and dark theme
Voice Input: Capture and transcribe user speech using faster-whisper
Voice Output: Convert agent responses to speech with 54+ voice options
Voice Persistence: Remembers your preferred voice between sessions
Continuous Conversation: Automatically listen for user input after agent responses
Silence Detection: Automatically stops recording when the user stops speaking
Robust Error Handling: Graceful recovery from common failure modes

Installation

Option 1: Quick Install (One-Click)

Click the link below if you have Goose installed:

goose://extension?cmd=uvx&arg=speech-mcp&id=speech_mcp&name=Speech%20Interface&description=Voice%20interaction%20with%20audio%20visualization%20for%20Goose

Option 2: Using Goose CLI (recommended)

Start Goose with the extension enabled:

```
# If you installed via PyPI
goose session --with-extension "speech-mcp"

# Or, to use a local development version
goose session --with-extension "python -m speech_mcp"
```

Option 3: Manual setup in Goose

Run goose configure
Select "Add Extension" from the menu
Choose "Command-line Extension"
Enter a name (e.g., "Speech Interface")
For the command, enter: speech-mcp
Follow the prompts to complete the setup

Option 4: Manual Installation

Clone this repository
Install dependencies:
```
uv pip install -e .
```

Dependencies

Python 3.10+
PyQt5 (for modern UI)
PyAudio (for audio capture)
faster-whisper (for speech-to-text)
NumPy (for audio processing)
Pydub (for audio processing)
psutil (for process management)

Optional Dependencies

Kokoro TTS: For high-quality text-to-speech with multiple voices

To install Kokoro, use pip with the optional extras (quoted so that shells such as zsh do not expand the brackets):

```
pip install "speech-mcp[kokoro]"     # Basic Kokoro support with English
pip install "speech-mcp[ja]"         # Add Japanese support
pip install "speech-mcp[zh]"         # Add Chinese support
pip install "speech-mcp[all]"        # All languages and features
```

Alternatively, run the installation script: python scripts/install_kokoro.py
See the Kokoro TTS Guide for more information.

Usage

To use this MCP with Goose, you can:

Start a conversation:
```
user_input = start_conversation()
```

Reply to the user and get their response:
```
user_response = reply("Your response text here")
```

Typical Workflow

```
# Start the conversation
user_input = start_conversation()

# Process the input and generate a response
# ...

# Reply to the user and get their response
follow_up = reply("Here's my response to your question.")

# Process the follow-up and reply again
reply("I understand your follow-up question. Here's my answer.")
```
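
The workflow above amounts to a loop: listen, answer, listen again. A minimal sketch of that loop follows. In Goose the two tools are invoked by the agent, so here `start_conversation` and `reply` are passed in as plain callables and replaced by scripted stand-ins. The names `run_dialogue` and `generate_answer`, the "goodbye" stop phrase, and the turn limit are illustrative assumptions, not part of speech-mcp.

```python
from typing import Callable


def run_dialogue(
    start_conversation: Callable[[], str],
    reply: Callable[[str], str],
    generate_answer: Callable[[str], str],
    stop_phrase: str = "goodbye",
    max_turns: int = 10,
) -> list[str]:
    """Alternate between listening and speaking until the user says stop_phrase.

    Returns everything the user said, in order.
    """
    heard = [start_conversation()]
    for _ in range(max_turns):
        if stop_phrase in heard[-1].lower():
            break
        # reply() speaks the answer, then returns the user's next utterance.
        heard.append(reply(generate_answer(heard[-1])))
    return heard


# Scripted stand-ins so the sketch runs without a microphone or speakers.
scripted_user = iter(["what time is it", "thanks, goodbye"])
spoken: list[str] = []


def fake_start_conversation() -> str:
    return next(scripted_user)


def fake_reply(text: str) -> str:
    spoken.append(text)
    return next(scripted_user)


transcript = run_dialogue(
    fake_start_conversation,
    fake_reply,
    generate_answer=lambda utterance: f"You said: {utterance}",
)
```

Passing the tools in as parameters keeps the loop testable: the same function works with the real tools or with stand-ins like the ones shown.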

UI Features

The new PyQt-based UI includes:

Modern Dark Theme: Sleek, professional appearance
Audio Visualization: Dynamic visualization of audio input
Voice Selection: Choose from 54+ voice options
Voice Persistence: Your voice preference is saved between sessions
Animated Effects: Smooth animations and visual feedback
Status Indicators: Clear indication of system state (ready, listening, processing)

Configuration

User preferences are stored in ~/.config/speech-mcp/config.json and include:

Selected TTS voice
TTS engine preference
Voice speed
Language code
UI theme settings

You can also set preferences via environment variables, such as:

SPEECH_MCP_TTS_VOICE - Set your preferred voice
SPEECH_MCP_TTS_ENGINE - Set your preferred TTS engine
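
One plausible way to combine the config file with the environment variables is sketched below. The README does not document the JSON key names or which source wins, so the keys `tts_voice` and `tts_engine`, the default values, and the rule that environment variables override the file are all assumptions made for illustration.

```python
import json
import os
from pathlib import Path

CONFIG_PATH = Path.home() / ".config" / "speech-mcp" / "config.json"

# Hypothetical key names and defaults, chosen for illustration only.
DEFAULTS = {"tts_voice": "af_heart", "tts_engine": "pyttsx3"}

# Environment variables named in the README, mapped to the assumed keys.
ENV_OVERRIDES = {
    "SPEECH_MCP_TTS_VOICE": "tts_voice",
    "SPEECH_MCP_TTS_ENGINE": "tts_engine",
}


def load_preferences(path=CONFIG_PATH, environ=None):
    """Merge defaults, the JSON file (if readable), and environment overrides."""
    environ = os.environ if environ is None else environ
    prefs = dict(DEFAULTS)
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
        if isinstance(data, dict):
            prefs.update(data)
    except (OSError, ValueError):
        pass  # Missing or corrupt file: keep the defaults.
    for variable, key in ENV_OVERRIDES.items():
        if environ.get(variable):
            prefs[key] = environ[variable]
    return prefs


example = load_preferences(
    path="/nonexistent/config.json",
    environ={"SPEECH_MCP_TTS_VOICE": "bm_george"},
)
```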

Troubleshooting

If you encounter issues with the extension freezing or not responding:

Check the logs: Look at the log files in src/speech_mcp/ for detailed error messages.
Reset the state: If the extension seems stuck, try deleting src/speech_mcp/speech_state.json or setting all states to false.
Use the direct command: Instead of uv run speech-mcp, run the installed speech-mcp command directly.
Check audio devices: Ensure your microphone is properly configured and accessible to Python.
Verify dependencies: Make sure all required dependencies are installed correctly.
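
The "reset the state" step can be scripted. The sketch below assumes speech_state.json is a flat JSON object whose stuck flags are booleans. The README does not describe the file layout, so the `reset_state` helper and the keys in the demonstration are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def reset_state(path):
    """Set every boolean value in the JSON state file to False.

    Returns the rewritten state, or None if the file does not exist.
    """
    state_file = Path(path)
    if not state_file.exists():
        return None
    state = json.loads(state_file.read_text(encoding="utf-8"))
    cleared = {
        key: (False if isinstance(value, bool) else value)
        for key, value in state.items()
    }
    state_file.write_text(json.dumps(cleared, indent=2), encoding="utf-8")
    return cleared


# Demonstration on a throwaway file; the keys are made up for the example.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "speech_state.json"
    demo_file.write_text(
        json.dumps({"listening": True, "speaking": False, "pid": 4242}),
        encoding="utf-8",
    )
    demo_result = reset_state(demo_file)
    missing_result = reset_state(Path(tmp) / "absent.json")
```

Non-boolean values are left untouched, so only the flags are cleared. To use it for real, call `reset_state("src/speech_mcp/speech_state.json")`.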

Recent Improvements

Complete PyQt UI Migration: Replaced the old Tkinter UI with a modern PyQt implementation
Code Refactoring: Split UI code into multiple components for better maintainability
Process Management: Improved process lifecycle management with automatic recovery
Voice Persistence: Added configuration system for saving user preferences
Enhanced Visualization: Added dynamic audio visualization with animations
Expanded Voice Options: Updated to support 54 different voice models
Improved Error Handling: Better recovery from common failure modes

Technical Details

Speech-to-Text

The MCP uses faster-whisper for speech recognition:

Uses the "base" model for a good balance of accuracy and speed
Processes audio locally without sending data to external services
Automatically detects when the user has finished speaking
Runs faster and uses less memory than the original Whisper implementation
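
The README does not say how end-of-speech is detected. A common approach is an energy threshold: stop once the signal has stayed quiet for several consecutive frames after speech was heard. The sketch below illustrates that general technique, not the project's actual code. The threshold, the frame count, and the function names are assumptions.

```python
import math


def rms(frame):
    """Root-mean-square amplitude of one frame of PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(sample * sample for sample in frame) / len(frame))


def find_end_of_speech(frames, threshold=500.0, max_silent_frames=3):
    """Return the index of the frame at which recording should stop.

    Recording stops once speech has been heard and max_silent_frames
    consecutive frames then fall below threshold. Returns None if that
    never happens within the given frames.
    """
    heard_speech = False
    silent_run = 0
    for index, frame in enumerate(frames):
        if rms(frame) >= threshold:
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            silent_run += 1
            if silent_run >= max_silent_frames:
                return index
    return None


quiet = [10, -12, 8, -9]
loud = [4000, -3800, 4100, -3900]
stop_index = find_end_of_speech([quiet, loud, loud, quiet, quiet, quiet, quiet])
```

Leading silence is ignored until speech is first heard, so a pause before the user starts talking does not end the recording early.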

Text-to-Speech

The MCP supports multiple text-to-speech engines:

Default: pyttsx3

Uses system voices available on your computer
Works out of the box without additional setup
Limited voice quality and customization

Optional: Kokoro TTS

High-quality neural text-to-speech with multiple voices
Lightweight model (82M parameters) that runs efficiently on CPU
Multiple voice styles and languages
To install: python scripts/install_kokoro.py
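
Because Kokoro is optional, a program needs some fallback to pyttsx3 when Kokoro is not installed. The sketch below shows one way to make that choice using only the standard library. It assumes Kokoro is importable as a top-level module named `kokoro`. The function names are illustrative and not taken from speech-mcp.

```python
import importlib.util


def module_available(name):
    """True if a top-level module called name can be found for import."""
    return importlib.util.find_spec(name) is not None


def choose_tts_engine(preferred="kokoro", is_available=module_available):
    """Return "kokoro" when it is requested and importable, else "pyttsx3"."""
    if preferred == "kokoro" and is_available("kokoro"):
        return "kokoro"
    return "pyttsx3"


engine = choose_tts_engine()
```

The availability check is injectable so that the fallback path can be exercised on a machine where Kokoro is installed.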

Note about Voice Models: The voice models are .pt files (PyTorch models) that are loaded by Kokoro. Each voice model is approximately 523 KB in size and is automatically downloaded when needed.

Voice Persistence: The selected voice is automatically saved to a configuration file (~/.config/speech-mcp/config.json) and will be remembered between sessions. This allows users to set their preferred voice once and have it used consistently.

Available Kokoro Voices

American Female Voices

af_alloy, af_aoede, af_bella, af_heart, af_jessica, af_kore, af_nicole, af_nova, af_river, af_sarah, af_sky

American Male Voices

am_adam, am_echo, am_eric, am_fenrir, am_liam, am_michael, am_onyx, am_puck, am_santa

British Female Voices

bf_alice, bf_emma, bf_isabella, bf_lily

British Male Voices

bm_daniel, bm_fable, bm_george, bm_lewis

Spanish Voices

ef_dora, em_alex, em_santa

Other Languages

French: ff_siwis
Hindi: hf_alpha, hf_beta, hm_omega, hm_psi
Italian: if_sara, im_nicola
Japanese: jf_alpha, jf_gongitsune, jf_nezumi, jf_tebukuro, jm_kumo
Portuguese: pf_dora, pm_alex, pm_santa
Chinese: zf_xiaobei, zf_xiaoni, zf_xiaoxiao, zf_xiaoyi, zm_yunjian, zm_yunxi, zm_yunxia, zm_yunyang
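
The voice IDs above follow a visible pattern: the first letter encodes the language or accent, the second the gender, and the rest is the voice name. The sketch below decodes that pattern. The mapping is inferred from the groupings in this list, except for the `e` prefix, which is mapped to Spanish following Kokoro's upstream voice list. The `describe_voice` helper is hypothetical.

```python
LANGUAGES = {
    "a": "American English",
    "b": "British English",
    "e": "Spanish",
    "f": "French",
    "h": "Hindi",
    "i": "Italian",
    "j": "Japanese",
    "p": "Portuguese",
    "z": "Chinese",
}
GENDERS = {"f": "female", "m": "male"}


def describe_voice(voice_id):
    """Split an ID such as "af_bella" into (language, gender, name)."""
    prefix, _, name = voice_id.partition("_")
    if len(prefix) != 2 or not name:
        raise ValueError(f"unexpected voice ID: {voice_id!r}")
    try:
        return LANGUAGES[prefix[0]], GENDERS[prefix[1]], name
    except KeyError:
        raise ValueError(f"unknown prefix in voice ID: {voice_id!r}") from None


example_voice = describe_voice("bm_george")
```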

License

MIT License

Release history

1.1.1 (Mar 20, 2025)
1.0.10 (Mar 10, 2025)
1.0.1 (Mar 7, 2025)
1.0.0 (Mar 7, 2025)
0.4.6 (Mar 5, 2025)
0.4.5 (Mar 5, 2025)
0.4.4 (Mar 5, 2025)
0.4.2 (Mar 4, 2025)
0.4.1 (Mar 4, 2025)
0.4.0 (Mar 4, 2025)
0.3.0 (Mar 4, 2025)
0.2.1 (Mar 4, 2025)
0.1.0 (Mar 3, 2025)

