Speech MCP Server with command-line interface

Speech MCP

A Goose MCP extension for voice interaction with audio visualization.

Overview

Speech MCP provides a voice interface for Goose, allowing users to interact through speech rather than text. It includes:

  • Real-time audio processing for speech recognition
  • Local speech-to-text using faster-whisper (a faster implementation of OpenAI's Whisper model)
  • Text-to-speech capabilities
  • Simple command-line interface for voice interaction

Features

  • Voice Input: Capture and transcribe user speech using faster-whisper
  • Voice Output: Convert agent responses to speech
  • Continuous Conversation: Automatically listen for user input after agent responses
  • Silence Detection: Automatically stops recording when the user stops speaking
  • Robust Error Handling: Graceful recovery from common failure modes
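As an illustration of the silence detection idea, here is a minimal sketch that stops once a run of quiet frames follows some speech. It is not the extension's actual implementation: the RMS energy approach, the `threshold` value, and the `max_silent_frames` count are all assumptions chosen for the example, and the code uses plain Python lists rather than PyAudio buffers.

```python
import math
from typing import Iterable, Sequence


def rms(frame: Sequence[int]) -> float:
    """Root-mean-square amplitude of one frame of 16-bit PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def frames_until_silence(
    frames: Iterable[Sequence[int]],
    threshold: float = 500.0,    # assumed RMS level separating speech from silence
    max_silent_frames: int = 3,  # assumed count of consecutive quiet frames that ends recording
) -> int:
    """Return how many frames are consumed before recording would stop."""
    silent_run = 0
    heard_speech = False
    count = 0
    for frame in frames:
        count += 1
        if rms(frame) >= threshold:
            # Loud frame: the user is speaking, so reset the silence counter.
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            # Quiet frame after speech: count it toward the stop condition.
            silent_run += 1
            if silent_run >= max_silent_frames:
                break
    return count
```

Leading silence is ignored on purpose, so recording does not end before the user has said anything.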

Installation

Option 1: Quick Install (One-Click)

Click the link below if you have Goose installed:

goose://extension?cmd=uvx&arg=speech-mcp&id=speech_mcp&name=Speech%20Interface&description=Voice%20interaction%20with%20audio%20visualization%20for%20Goose
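To see what the link asks Goose to install, the query string can be decoded with the Python standard library. This sketch only parses the URL shown above; how Goose itself interprets each parameter is not covered here.

```python
from urllib.parse import parse_qs, urlsplit

LINK = (
    "goose://extension?cmd=uvx&arg=speech-mcp&id=speech_mcp"
    "&name=Speech%20Interface"
    "&description=Voice%20interaction%20with%20audio%20visualization%20for%20Goose"
)


def decode_extension_link(link: str) -> dict[str, str]:
    """Return the link's query parameters with percent-encoding removed."""
    query = urlsplit(link).query
    # parse_qs returns a list per key; each key appears once in this link.
    return {key: values[0] for key, values in parse_qs(query).items()}


params = decode_extension_link(LINK)
for key, value in params.items():
    print(f"{key}: {value}")
```

The `cmd` and `arg` values together give the command `uvx speech-mcp`.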

Option 2: Using Goose CLI (recommended)

Start a Goose session with the extension enabled:

# If you installed via PyPI
goose session --with-extension "speech-mcp"

# Or if you want to use a local development version
goose session --with-extension "python -m speech_mcp"

Option 3: Manual setup in Goose

  1. Run goose configure
  2. Select "Add Extension" from the menu
  3. Choose "Command-line Extension"
  4. Enter a name (e.g., "Speech Interface")
  5. For the command, enter: speech-mcp
  6. Follow the prompts to complete the setup

Option 4: Manual Installation

  1. Clone this repository
  2. From the repository root, install the package and its dependencies in editable mode:
    uv pip install -e .

Dependencies

  • Python 3.10+
  • PyAudio (for audio capture)
  • faster-whisper (for speech-to-text)
  • NumPy (for audio processing)
  • Pydub (for audio processing)
  • pyttsx3 (for text-to-speech)
  • psutil (for process management)
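A quick way to confirm that the listed packages are importable in the current environment is sketched below. The import names are the ones these packages normally install under; the helper `missing_dependencies` is a name made up for this example and is not part of speech-mcp.

```python
from importlib.util import find_spec

# Display name -> top-level import name
REQUIRED_MODULES = {
    "PyAudio": "pyaudio",
    "faster-whisper": "faster_whisper",
    "NumPy": "numpy",
    "Pydub": "pydub",
    "pyttsx3": "pyttsx3",
    "psutil": "psutil",
}


def missing_dependencies(modules: dict[str, str] = REQUIRED_MODULES) -> list[str]:
    """Return the display names of packages that cannot be found."""
    return [name for name, module in modules.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All dependencies found.")
```

This only checks that each module can be located; it does not verify that PyAudio can open a microphone.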

Usage

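To use this MCP with Goose:

  1. Start a conversation. This listens for speech and returns what the user said:

    user_input = start_conversation()

  2. Reply to the user. This speaks the given text, then listens for the user's response and returns it:

    user_response = reply("Your response text here")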

Typical Workflow

# Start the conversation
user_input = start_conversation()

# Process the input and generate a response
# ...

# Reply to the user and get their response
follow_up = reply("Here's my response to your question.")

# Process the follow-up and reply again
reply("I understand your follow-up question. Here's my answer.")
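The same workflow can be written as a loop. The sketch below takes `start_conversation` and `reply` as parameters so it can be exercised without a microphone; `respond`, `stop_words`, and `max_turns` are hypothetical additions for this example and are not part of the extension.

```python
from typing import Callable, Sequence


def run_conversation(
    start_conversation: Callable[[], str],
    reply: Callable[[str], str],
    respond: Callable[[str], str],          # hypothetical: turns user text into a response
    stop_words: Sequence[str] = ("goodbye",),
    max_turns: int = 10,
) -> list[str]:
    """Alternate between listening and replying; return everything the user said."""
    heard = []
    user_text = start_conversation()
    for _ in range(max_turns):
        heard.append(user_text)
        if user_text.strip().lower() in stop_words:
            break
        # reply() speaks the response and returns the user's next utterance.
        user_text = reply(respond(user_text))
    return heard


if __name__ == "__main__":
    # Scripted stand-ins for the real tools, for demonstration only.
    scripted = iter(["hello", "what can you do", "goodbye"])
    print(run_conversation(
        start_conversation=lambda: next(scripted),
        reply=lambda text: next(scripted),
        respond=lambda text: f"You said: {text}",
    ))
```

Capping the loop with `max_turns` keeps a stuck or silent session from running forever.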

Troubleshooting

If you encounter issues with the extension freezing or not responding:

  1. Check the logs: Look at the log files in src/speech_mcp/ for detailed error messages.
  2. Reset the state: If the extension seems stuck, delete src/speech_mcp/speech_state.json or set every state value in it to false.
  3. Use the direct command: Run the installed speech-mcp command directly instead of going through uv run speech-mcp.
  4. Check audio devices: Ensure your microphone is properly configured and accessible to Python.
  5. Verify dependencies: Make sure all required dependencies are installed correctly.
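For step 2, setting all states to false can be scripted. The sketch below assumes the state file is a flat JSON object of flags, which the source does not confirm; the keys `listening` and `speaking` are hypothetical and appear only in the demo, which runs against a temporary file rather than the real one.

```python
import json
import tempfile
from pathlib import Path


def reset_state(path: Path) -> dict:
    """Set every boolean value in the state file to False and write it back."""
    state = json.loads(path.read_text(encoding="utf-8"))
    # Non-boolean values are left untouched in case the file holds other data.
    cleared = {k: (False if isinstance(v, bool) else v) for k, v in state.items()}
    path.write_text(json.dumps(cleared, indent=2), encoding="utf-8")
    return cleared


# Demo on a throwaway file; the keys here are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "speech_state.json"
    demo_path.write_text(json.dumps({"listening": True, "speaking": True}), encoding="utf-8")
    result = reset_state(demo_path)
print(result)
```

To apply it to the real file, call `reset_state(Path("src/speech_mcp/speech_state.json"))` while the extension is not running.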

Recent Fixes

  • Improved error handling: Better recovery from common failure modes
  • Timeout management: Reduced timeouts and added fallback mechanisms
  • Process management: Better handling of UI process startup and termination
  • State consistency: Added state reset mechanisms to avoid getting stuck
  • Fallback transcription: Added emergency transcription when UI process fails
  • Debugging output: Enhanced logging and console output for troubleshooting

Technical Details

Speech-to-Text

The MCP uses faster-whisper for speech recognition:

  • Uses the "base" model for a good balance of accuracy and speed
  • Processes audio locally without sending data to external services
  • Automatically detects when the user has finished speaking
  • Provides improved performance over the original Whisper implementation
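Transcribing a recorded file with faster-whisper and the "base" model looks roughly like the sketch below. It is not taken from the extension's source: the `device` and `compute_type` settings are assumptions suited to a CPU-only machine, and `transcribe_file` and `join_segments` are helper names invented for this example.

```python
from typing import Iterable


def join_segments(texts: Iterable[str]) -> str:
    """Join per-segment text into one transcript, dropping empty pieces."""
    return " ".join(t.strip() for t in texts if t.strip())


def transcribe_file(path: str, model_size: str = "base") -> str:
    """Transcribe an audio file locally and return the text."""
    # Imported lazily so join_segments works without the library installed.
    from faster_whisper import WhisperModel

    # Assumed settings for CPU-only use; a GPU setup would differ.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")

    # transcribe() returns a lazy generator of segments plus metadata;
    # the audio is only processed as the segments are iterated.
    segments, _info = model.transcribe(path)
    return join_segments(segment.text for segment in segments)
```

The first call downloads the model weights if they are not cached yet; after that, transcription runs entirely on the local machine.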

License

MIT License
