Speech MCP Server with command-line interface

Speech MCP

A Goose MCP extension for voice interaction with audio visualization.

Overview

Speech MCP provides a voice interface for Goose, allowing users to interact through speech rather than text. It includes:

Real-time audio processing for speech recognition
Local speech-to-text using OpenAI's Whisper model
Text-to-speech capabilities
Simple command-line interface for voice interaction

Features

Voice Input: Capture and transcribe user speech using Whisper
Voice Output: Convert agent responses to speech
Continuous Conversation: Automatically listen for user input after agent responses
Silence Detection: Automatically stops recording when the user stops speaking
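
Silence detection of this kind is commonly done by measuring the energy of each short audio frame and stopping once several consecutive frames fall below a threshold. The sketch below illustrates that general idea with the standard library only. The function names, the threshold of 500, and the frame counts are assumptions made for illustration, not values taken from speech-mcp.

```python
import math
from array import array


def frame_rms(frame: bytes) -> float:
    """Return the root-mean-square amplitude of a frame of 16-bit mono PCM."""
    samples = array("h")
    samples.frombytes(frame)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def should_stop(frames, threshold=500.0, max_silent_frames=3):
    """Return the index of the frame at which recording would stop, or None.

    Recording stops once `max_silent_frames` consecutive frames fall below
    `threshold`, but only after some speech has been heard, so leading
    silence does not end the recording early.
    """
    heard_speech = False
    silent_run = 0
    for index, frame in enumerate(frames):
        if frame_rms(frame) >= threshold:
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            silent_run += 1
            if silent_run >= max_silent_frames:
                return index
    return None
```

In a live recorder the frames would arrive from the microphone one at a time. The same counter logic applies, with the threshold tuned to the microphone and room noise.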

Installation

Option 1: Quick Install (One-Click)

Click the link below if you have Goose installed:

goose://extension?cmd=uvx&arg=speech-mcp&id=speech_mcp&name=Speech%20Interface&description=Voice%20interaction%20with%20audio%20visualization%20for%20Goose
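
The link above packs the extension settings into percent-encoded query parameters. As a rough illustration, the sketch below decodes it with Python's `urllib.parse` and rebuilds it from a dictionary. The helper names are hypothetical, and how Goose interprets each parameter is not described here, so treat the parameter meanings as inferred from their names.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

LINK = (
    "goose://extension?cmd=uvx&arg=speech-mcp&id=speech_mcp"
    "&name=Speech%20Interface"
    "&description=Voice%20interaction%20with%20audio%20visualization%20for%20Goose"
)


def decode_link(link):
    """Split a goose:// link into its scheme, host part, and decoded parameters."""
    parts = urlsplit(link)
    # parse_qs returns a list per key; each key appears once in this link.
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return parts.scheme, parts.netloc, params


def build_link(params):
    """Rebuild the link, encoding spaces as %20 rather than '+'."""
    return "goose://extension?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    _, _, params = decode_link(LINK)
    for key, value in params.items():
        print(f"{key}: {value}")
```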

Option 2: Using Goose CLI (recommended)

Start Goose with the extension enabled:

```
# If you installed via PyPI
goose session --with-extension "uvx speech-mcp"

# Or if you want to use a local development version
goose session --with-extension "python -m speech_mcp"
```

Option 3: Manual setup in Goose

Run goose configure
Select "Add Extension" from the menu
Choose "Command-line Extension"
Enter a name (e.g., "Speech Interface")
For the command, enter: uvx speech-mcp
Follow the prompts to complete the setup

Option 4: Manual Installation

Clone this repository
From the repository root, install the package and its dependencies in editable mode:
```
pip install -e .
```

Dependencies

Python 3.10+
PyAudio (for audio capture)
OpenAI Whisper (for speech-to-text)
NumPy (for audio processing)
Pydub (for audio processing)
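
Before starting a session it can help to confirm that these dependencies are importable. The sketch below is one way to check, assuming the usual import names for each package (`pyaudio`, `whisper`, `numpy`, `pydub`). Those names and the helper functions are assumptions for illustration and are not part of speech-mcp.

```python
import sys
from importlib.util import find_spec

# Display name -> import name. The import names are assumed from each
# package's usual module name, not taken from the speech-mcp source.
REQUIRED_MODULES = {
    "PyAudio": "pyaudio",
    "OpenAI Whisper": "whisper",
    "NumPy": "numpy",
    "Pydub": "pydub",
}


def missing_dependencies(modules=REQUIRED_MODULES):
    """Return the display names of modules that cannot be found."""
    return [name for name, module in modules.items() if find_spec(module) is None]


def python_is_supported(version_info=sys.version_info, minimum=(3, 10)):
    """Check the interpreter against the Python 3.10+ requirement."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if not python_is_supported():
        print("Python 3.10 or newer is required")
    for name in missing_dependencies():
        print(f"Missing dependency: {name}")
```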

Usage

To use this MCP with Goose, you can:

Start the voice mode:
```
start_voice_mode()
```
Listen for user input:
```
transcript = listen()
```
Respond with speech:
```
speak("Your response text")
```
Get the current state:
```
get_speech_state()
```

Typical Workflow

```
# Start the voice interface
start_voice_mode()

# Listen for user input
transcript = listen()

# Process the transcript and generate a response
# ...

# Speak the response
speak("Here is my response")

# Automatically listen again
transcript = listen()
```
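
The listen, respond, speak cycle above can be written as a loop. The sketch below takes the three steps as plain callables so it runs without a microphone. `run_conversation` and `respond` are hypothetical names, and treating an empty transcript as the end of the conversation is an assumption, since this page does not say how the real tools signal that nothing was heard.

```python
def run_conversation(listen, speak, respond, max_turns=3):
    """Alternate between listening and speaking for up to `max_turns` turns.

    `listen` returns a transcript string, `respond` maps a transcript to a
    reply, and `speak` voices the reply. An empty transcript ends the loop.
    """
    history = []
    for _ in range(max_turns):
        transcript = listen()
        if not transcript:
            break
        reply = respond(transcript)
        speak(reply)
        history.append((transcript, reply))
    return history


if __name__ == "__main__":
    # Scripted stand-ins for the voice tools, used only for demonstration.
    scripted_inputs = iter(["hello", "what can you do", ""])
    run_conversation(
        listen=lambda: next(scripted_inputs),
        speak=print,
        respond=lambda text: f"You said: {text}",
    )
```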

Technical Details

Speech-to-Text

The MCP uses OpenAI's Whisper model for speech recognition:

Uses the "base" model for a good balance of accuracy and speed
Processes audio locally without sending data to external services
Automatically detects when the user has finished speaking
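
For readers unfamiliar with Whisper, local transcription with the `openai-whisper` package looks roughly like the sketch below. It is not the code speech-mcp uses. `transcribe_file` and `extract_text` are hypothetical helper names, and the example assumes the package is installed along with ffmpeg, which Whisper needs to read audio files.

```python
def extract_text(result):
    """Return the stripped transcript from a Whisper result dictionary."""
    return result.get("text", "").strip()


def transcribe_file(path, model_name="base"):
    """Transcribe an audio file locally with the openai-whisper package."""
    import whisper  # imported lazily so extract_text works without it installed

    # load_model downloads the model weights on first use, then reuses the
    # cached copy; the audio itself is processed on the local machine.
    model = whisper.load_model(model_name)
    result = model.transcribe(path)
    return extract_text(result)
```

Loading the model once and reusing it across calls avoids paying the load time on every utterance.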

License

MIT License

Release history

1.1.1: Mar 20, 2025
1.0.10: Mar 10, 2025
1.0.1: Mar 7, 2025
1.0.0: Mar 7, 2025
0.4.6: Mar 5, 2025
0.4.5: Mar 5, 2025
0.4.4: Mar 5, 2025
0.4.2: Mar 4, 2025
0.4.1: Mar 4, 2025
0.4.0: Mar 4, 2025
0.3.0: Mar 4, 2025
0.2.1: Mar 4, 2025
0.1.0: Mar 3, 2025 (this version)

Download files

Source Distribution

speech_mcp-0.1.0.tar.gz (135.6 kB), uploaded Mar 3, 2025

Built Distribution

speech_mcp-0.1.0-py3-none-any.whl (15.9 kB), uploaded Mar 3, 2025, Python 3

File details

Details for the file speech_mcp-0.1.0.tar.gz.

File metadata

Download URL: speech_mcp-0.1.0.tar.gz
Upload date: Mar 3, 2025
Size: 135.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for speech_mcp-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`7ccba6e2ab3e2d08dfdd05b0aed8bf556aaad66ad3a3411563991bace46a3ff5`
MD5	`4fb890bc5b0ab18cda040e2e5daf3295`
BLAKE2b-256	`4c417a57eac8a54afb56a0daa6a984beac8799fa1948317e82fcebc56c985ff1`
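
A downloaded file can be checked against the SHA256 digest listed above with Python's `hashlib`. The sketch below is a minimal example. The helper names are hypothetical, and the file is assumed to sit in the current directory under its published name.

```python
import hashlib

# SHA256 digest listed above for speech_mcp-0.1.0.tar.gz.
EXPECTED_SHA256 = "7ccba6e2ab3e2d08dfdd05b0aed8bf556aaad66ad3a3411563991bace46a3ff5"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected=EXPECTED_SHA256):
    """Compare a file's digest with the expected value, ignoring letter case."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    print(matches("speech_mcp-0.1.0.tar.gz"))
```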

File details

Details for the file speech_mcp-0.1.0-py3-none-any.whl.

File metadata

Download URL: speech_mcp-0.1.0-py3-none-any.whl
Upload date: Mar 3, 2025
Size: 15.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for speech_mcp-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9b5e280466ab015d46d1a840c73a7d175731ab239e7e63f62afd306cdb479e8f`
MD5	`9c8609ed041e7a3e7cc14956a8a98767`
BLAKE2b-256	`3bfa0c8d9de39c7807aac34b7d80e945a7e8629582491c64d794ba836b214fc9`
