Speech MCP Server with command-line interface

Speech MCP

A Goose MCP extension for voice interaction with audio visualization.

Overview

Speech MCP provides a voice interface for Goose, allowing users to interact through speech rather than text. It includes:

  • Real-time audio processing for speech recognition
  • Local speech-to-text using OpenAI's Whisper model
  • Text-to-speech capabilities
  • Simple command-line interface for voice interaction

Features

  • Voice Input: Capture and transcribe user speech using Whisper
  • Voice Output: Convert agent responses to speech
  • Continuous Conversation: Automatically listen for user input after agent responses
  • Silence Detection: Automatically stops recording when the user stops speaking
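
This README does not say how silence detection is implemented. One common approach, shown here purely as an illustration, is an energy threshold: compute the RMS amplitude of each audio frame and stop once several consecutive frames fall below a cutoff. The sketch assumes frames of 16-bit PCM samples; `rms`, `record_until_silence`, `threshold`, and `max_silent_frames` are hypothetical names and values, not part of speech-mcp.

```python
import math
from typing import Iterable, List, Sequence


def rms(frame: Sequence[int]) -> float:
    """Root-mean-square amplitude of one frame of PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def record_until_silence(
    frames: Iterable[Sequence[int]],
    threshold: float = 500.0,     # hypothetical RMS cutoff for 16-bit audio
    max_silent_frames: int = 3,   # hypothetical: quiet frames that end a recording
) -> List[Sequence[int]]:
    """Collect frames until enough consecutive quiet frames follow speech."""
    recorded: List[Sequence[int]] = []
    silent_run = 0
    heard_speech = False
    for frame in frames:
        recorded.append(frame)
        if rms(frame) >= threshold:
            # Loud frame: the user is speaking, so reset the silence counter.
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            # Quiet frame after speech: count it toward the stop condition.
            silent_run += 1
            if silent_run >= max_silent_frames:
                break
    return recorded
```

Quiet frames before any speech are ignored by the stop condition, so the recording does not end while the user is still getting ready to talk.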

Installation

Option 1: Quick Install (One-Click)

Click the link below if you have Goose installed:

goose://extension?cmd=uvx&arg=speech-mcp&id=speech_mcp&name=Speech%20Interface&description=Voice%20interaction%20with%20audio%20visualization%20for%20Goose
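
To see what the link carries, its query string can be decoded with Python's standard library. This is only a sketch for inspecting the URL; how Goose interprets each field is not documented here, though `cmd` and `arg` appear to correspond to the `uvx speech-mcp` command used in Options 2 and 3. The helper name `decode_extension_link` is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

LINK = (
    "goose://extension?cmd=uvx&arg=speech-mcp&id=speech_mcp"
    "&name=Speech%20Interface"
    "&description=Voice%20interaction%20with%20audio%20visualization%20for%20Goose"
)


def decode_extension_link(link: str) -> dict:
    """Return the link's query parameters with percent-encoding removed."""
    query = urlsplit(link).query
    # parse_qs maps each key to a list; every key appears once in this link.
    return {key: values[0] for key, values in parse_qs(query).items()}


params = decode_extension_link(LINK)
for key, value in params.items():
    print(f"{key}: {value}")
```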

Option 2: Using Goose CLI (recommended)

Start a Goose session with the extension enabled:

# Run the published package from PyPI
goose session --with-extension "uvx speech-mcp"

# Or run a local development version
goose session --with-extension "python -m speech_mcp"

Option 3: Manual setup in Goose

  1. Run goose configure
  2. Select "Add Extension" from the menu
  3. Choose "Command-line Extension"
  4. Enter a name (e.g., "Speech Interface")
  5. For the command, enter: uvx speech-mcp
  6. Follow the prompts to complete the setup

Option 4: Manual Installation

  1. Clone this repository
  2. From the repository root, install the package and its dependencies in editable mode:
    pip install -e .
    

Dependencies

  • Python 3.10+
  • PyAudio (for audio capture)
  • OpenAI Whisper (for speech-to-text)
  • NumPy (for audio processing)
  • Pydub (for audio processing)

Usage

Once the extension is enabled in Goose, the following calls are available:

  1. Start the voice mode:

    start_voice_mode()
    
  2. Listen for user input:

    transcript = listen()
    
  3. Respond with speech:

    speak("Your response text")
    
  4. Get the current state:

    get_speech_state()
    

Typical Workflow

# Start the voice interface
start_voice_mode()

# Listen for user input
transcript = listen()

# Process the transcript and generate a response
# ...

# Speak the response
speak("Here is my response")

# Automatically listen again
transcript = listen()
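
The workflow above can be wrapped in a loop to get the continuous conversation described under Features. The sketch below is illustrative only: Goose would normally invoke the tools itself, so here `listen`, `speak`, and `respond` are passed in as plain Python callables, and `stop_phrase` and `max_turns` are hypothetical parameters that do not come from speech-mcp.

```python
from typing import Callable, List


def conversation_loop(
    listen: Callable[[], str],
    speak: Callable[[str], None],
    respond: Callable[[str], str],
    stop_phrase: str = "goodbye",   # hypothetical exit phrase
    max_turns: int = 10,            # hypothetical safety limit
) -> List[str]:
    """Alternate listen -> respond -> speak until the stop phrase or turn limit."""
    transcripts: List[str] = []
    for _ in range(max_turns):
        transcript = listen().strip()
        transcripts.append(transcript)
        if transcript.lower() == stop_phrase:
            break
        speak(respond(transcript))
    return transcripts


# Stand-ins so the sketch runs without a microphone or the extension.
scripted_input = iter(["hello", "what time is it", "goodbye"])
spoken: List[str] = []

heard = conversation_loop(
    listen=lambda: next(scripted_input),
    speak=spoken.append,
    respond=lambda text: f"You said: {text}",
)
```

Passing the tools in as arguments keeps the loop testable with scripted input, as the stand-ins at the bottom show.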

Technical Details

Speech-to-Text

The MCP uses OpenAI's Whisper model for speech recognition:

  • Uses the "base" model for a good balance of accuracy and speed
  • Processes audio locally without sending data to external services
  • Automatically detects when the user has finished speaking
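
For reference, local transcription with the "base" model looks roughly like this when using the `openai-whisper` package directly. This is a minimal sketch, not the extension's actual code: `transcribe_file` is a hypothetical helper, and it assumes `openai-whisper` is installed and `ffmpeg` is on the PATH, since Whisper uses it to decode audio files.

```python
def transcribe_file(path: str, model_name: str = "base") -> str:
    """Transcribe an audio file locally with the openai-whisper package."""
    # Imported lazily so this module loads even where Whisper is not installed.
    import whisper

    model = whisper.load_model(model_name)  # downloads the weights on first use
    result = model.transcribe(path)         # inference runs on this machine
    return result["text"].strip()
```

Calling `transcribe_file("recording.wav")` with a path to a real recording returns the transcript as a string; the file name here is a placeholder.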

License

MIT License
