Speech MCP Server with command-line interface
Speech MCP
A Goose MCP extension for voice interaction with audio visualization.
Overview
Speech MCP provides a voice interface for Goose, allowing users to interact through speech rather than text. It includes:
- Real-time audio processing for speech recognition
- Local speech-to-text using OpenAI's Whisper model
- Text-to-speech capabilities
- Simple command-line interface for voice interaction
Features
- Voice Input: Capture and transcribe user speech using Whisper
- Voice Output: Convert agent responses to speech
- Continuous Conversation: Automatically listen for user input after agent responses
- Silence Detection: Automatically stops recording when the user stops speaking
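The silence detection feature above can be illustrated with a simple energy threshold. This is a minimal sketch, not the project's actual implementation: the function names, the RMS threshold of 500, and the three-chunk cutoff are all assumptions chosen for illustration, and it uses only the standard library on plain lists of PCM samples.

```python
import math
from typing import Iterable, List, Sequence


def rms(chunk: Sequence[int]) -> float:
    """Root-mean-square amplitude of one chunk of PCM samples."""
    if not chunk:
        return 0.0
    return math.sqrt(sum(s * s for s in chunk) / len(chunk))


def record_until_silence(
    chunks: Iterable[Sequence[int]],
    threshold: float = 500.0,    # assumed RMS level separating speech from silence
    max_silent_chunks: int = 3,  # assumed number of quiet chunks that ends recording
) -> List[Sequence[int]]:
    """Collect chunks until speech has been heard and is followed by sustained quiet."""
    recorded: List[Sequence[int]] = []
    heard_speech = False
    silent_run = 0
    for chunk in chunks:
        recorded.append(chunk)
        if rms(chunk) >= threshold:
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            # Only count silence after the user has started speaking.
            silent_run += 1
            if silent_run >= max_silent_chunks:
                break
    return recorded
```

Leading silence is kept but never ends the recording, so a user who pauses before speaking is not cut off. In practice the threshold would likely need tuning for the microphone and room.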
Installation
Option 1: Quick Install (One-Click)
Click the link below if you have Goose installed:
goose://extension?cmd=uvx&arg=speech-mcp&id=speech_mcp&name=Speech%20Interface&description=Voice%20interaction%20with%20audio%20visualization%20for%20Goose
Option 2: Using Goose CLI (recommended)
Start Goose with the extension enabled:
# If you installed via PyPI
goose session --with-extension "uvx speech-mcp"
# Or if you want to use a local development version
goose session --with-extension "python -m speech_mcp"
Option 3: Manual setup in Goose
- Run goose configure
- Select "Add Extension" from the menu
- Choose "Command-line Extension"
- Enter a name (e.g., "Speech Interface")
- For the command, enter: uvx speech-mcp
- Follow the prompts to complete the setup
Option 4: Manual Installation
- Clone this repository
- Install dependencies:
pip install -e .
Dependencies
- Python 3.10+
- PyAudio (for audio capture)
- OpenAI Whisper (for speech-to-text)
- NumPy (for audio processing)
- Pydub (for audio processing)
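After a manual install, it can help to confirm that the dependencies listed above are importable. The sketch below is a hypothetical helper, not part of speech-mcp; it assumes the usual import names for these packages (pyaudio, whisper, numpy, pydub).

```python
import importlib.util
import sys
from typing import Dict, Iterable, Tuple

# Import names assumed for PyAudio, OpenAI Whisper, NumPy, and Pydub.
REQUIRED_MODULES = ("pyaudio", "whisper", "numpy", "pydub")


def check_modules(names: Iterable[str] = REQUIRED_MODULES) -> Dict[str, bool]:
    """Map each top-level module name to whether it can be found for import."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def python_is_supported(minimum: Tuple[int, int] = (3, 10)) -> bool:
    """True when the running interpreter meets the minimum version."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    print("Python 3.10+:", python_is_supported())
    for name, found in check_modules().items():
        print(f"{name}: {'ok' if found else 'missing'}")
```

The check only looks the modules up and does not import them, so it stays fast even though loading Whisper itself is slow.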
Usage
To use this MCP with Goose, you can:
- Start the voice mode: start_voice_mode()
- Listen for user input: transcript = listen()
- Respond with speech: speak("Your response text")
- Get the current state: get_speech_state()
Typical Workflow
# Start the voice interface
start_voice_mode()
# Listen for user input
transcript = listen()
# Process the transcript and generate a response
# ...
# Speak the response
speak("Here is my response")
# Automatically listen again
transcript = listen()
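The workflow above repeats in a listen, respond, speak cycle. The sketch below shows one way to structure that cycle as a loop. It is an illustration only: listen and speak are passed in as plain callables standing in for the MCP tools, and respond, stop_word, and max_turns are hypothetical names introduced here, not part of speech-mcp.

```python
from typing import Callable, List


def conversation_loop(
    listen: Callable[[], str],
    speak: Callable[[str], None],
    respond: Callable[[str], str],
    stop_word: str = "goodbye",  # hypothetical exit phrase, not from the project
    max_turns: int = 10,         # safety cap so the loop always terminates
) -> List[str]:
    """Run listen -> respond -> speak turns and return the transcripts heard."""
    transcripts: List[str] = []
    for _ in range(max_turns):
        transcript = listen().strip()
        if not transcript:
            continue  # nothing was heard; listen again
        transcripts.append(transcript)
        if stop_word in transcript.lower():
            break
        speak(respond(transcript))
    return transcripts


if __name__ == "__main__":
    # Stand-ins for the real tools so the loop can be tried without audio hardware.
    heard = iter(["hello", "", "goodbye"])
    spoken: List[str] = []
    print(conversation_loop(lambda: next(heard), spoken.append, lambda t: f"You said: {t}"))
    print(spoken)
```

Passing the tools in as arguments keeps the loop testable with canned input, which is useful when no microphone is available.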
Technical Details
Speech-to-Text
The MCP uses OpenAI's Whisper model for speech recognition:
- Uses the "base" model for a good balance of accuracy and speed
- Processes audio locally without sending data to external services
- Automatically detects when the user has finished speaking
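For reference, local transcription with the "base" model might look like the sketch below. It uses the load_model and transcribe calls from the openai-whisper package; the wrapper function transcribe_file and its load_model parameter are assumptions made here so the example can be exercised without Whisper installed, and this is not necessarily how speech-mcp calls the model.

```python
from typing import Any, Callable, Optional


def transcribe_file(
    path: str,
    model_name: str = "base",
    load_model: Optional[Callable[[str], Any]] = None,
) -> str:
    """Transcribe an audio file locally and return the recognized text."""
    if load_model is None:
        # Imported lazily so this module loads even where Whisper is not installed.
        import whisper  # provided by the openai-whisper package

        load_model = whisper.load_model
    model = load_model(model_name)
    # transcribe() returns a dict; the full transcript is under the "text" key.
    result = model.transcribe(path)
    return result["text"].strip()


if __name__ == "__main__":
    # "recording.wav" is a placeholder path; Whisper relies on ffmpeg to decode audio.
    print(transcribe_file("recording.wav"))
```

Note that Whisper downloads the model weights the first time a given model is loaded; after that, transcription runs on the local machine.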
License