
An MCP server for transcribing audio files using OpenAI Whisper and GPT-4o


MCP Server Whisper

A Model Context Protocol (MCP) server for advanced audio transcription and processing using OpenAI's Whisper and GPT-4o models.

License: MIT | Python 3.10+ | Built with uv

Overview

MCP Server Whisper provides a standardized way to process audio files through OpenAI's transcription services. Because it implements the Model Context Protocol, AI assistants such as Claude can call its audio processing tools directly.

Key features:

  • 🔍 Advanced file searching with regex patterns, file metadata filtering, and sorting capabilities
  • 🔄 Parallel batch processing for multiple audio files
  • 🔄 Format conversion between supported audio types
  • 📦 Automatic compression for oversized files
  • ✏️ Enhanced transcription with specialized prompts
  • 📊 Comprehensive metadata including duration, file size, and format support
  • 🚀 High-performance caching for repeated operations

Installation

# Clone the repository
git clone https://github.com/arcaputo3/mcp-server-whisper.git
cd mcp-server-whisper

# Using uv (recommended)
uv sync
uv sync --extra dev  # For development dependencies

# Using pip
pip install -e .
pip install -e ".[dev]"  # For development dependencies

# Set up pre-commit hooks
pre-commit install

Environment Setup

Create a .env file with the following variables:

OPENAI_API_KEY=your_openai_api_key
AUDIO_FILES_PATH=/path/to/your/audio/files
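
The server expects both variables. If you want to check a .env file before launching, a minimal stdlib sketch follows. It is not the project's own loader (which may rely on a library such as python-dotenv), and REQUIRED_VARS, load_env_text, and missing_vars are hypothetical names. It only understands plain KEY=VALUE lines, # comments, and surrounding quotes.

```python
from pathlib import Path

# Hypothetical: the two variables this README asks for.
REQUIRED_VARS = ("OPENAI_API_KEY", "AUDIO_FILES_PATH")


def load_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding single or double quotes.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_vars(values: dict[str, str]) -> list[str]:
    """Return required variable names that are absent or empty."""
    return [name for name in REQUIRED_VARS if not values.get(name)]


if __name__ == "__main__":
    env_file = Path(".env")
    if env_file.exists():
        print("missing:", missing_vars(load_env_text(env_file.read_text())))
```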

Usage

Starting the Server

To run the MCP server in development mode:

mcp dev src/mcp_server_whisper/server.py

To install the server for use with Claude Desktop or other MCP clients:

mcp install src/mcp_server_whisper/server.py

Exposed MCP Tools

Audio File Management

  • list_audio_files - Lists audio files with comprehensive filtering and sorting options:
    • Filter by regex pattern matching on filenames
    • Filter by file size, duration, modification time, or format
    • Sort by name, size, duration, modification time, or format
    • All operations support parallelized batch processing
  • get_latest_audio - Gets the most recently modified audio file with model support info

Audio Processing

  • convert_audio - Converts audio files to supported formats (mp3 or wav)
  • compress_audio - Compresses audio files that exceed size limits
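
Since GPT-4o accepts fewer formats than Whisper, conversion is only needed when a file's extension falls outside the target model's list. A minimal stdlib sketch of that check follows; the format sets mirror the Supported Audio Formats table in this README, while the model keys ("whisper", "gpt-4o"), needs_conversion, and conversion_target are hypothetical. The actual audio re-encoding, which the project performs with pydub, is not shown.

```python
from pathlib import Path

# Format sets copied from the README's "Supported Audio Formats" table.
SUPPORTED_FORMATS: dict[str, set[str]] = {
    "whisper": {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"},
    "gpt-4o": {"mp3", "wav"},
}


def needs_conversion(path: str, model: str) -> bool:
    """True if the file's extension is not accepted by the given model."""
    ext = Path(path).suffix.lower().lstrip(".")
    return ext not in SUPPORTED_FORMATS[model]


def conversion_target(path: str) -> str:
    """Hypothetical output path: same name with an .mp3 extension."""
    return str(Path(path).with_suffix(".mp3"))
```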

Transcription

  • transcribe_audio - Basic transcription using OpenAI's Whisper model
  • transcribe_with_llm - Transcription with custom prompts using GPT-4o
  • transcribe_with_enhancement - Enhanced transcription with specialized templates:
    • detailed - Includes tone, emotion, and background details
    • storytelling - Transforms the transcript into a narrative form
    • professional - Creates formal, business-appropriate transcriptions
    • analytical - Adds analysis of speech patterns and key points

Supported Audio Formats

Model     Supported formats
Whisper   mp3, mp4, mpeg, mpga, m4a, wav, webm
GPT-4o    mp3, wav

Note: Files larger than 25MB are automatically compressed to meet API limits.
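
For constant-bitrate audio, file size is roughly bitrate × duration / 8, so one way to choose a compression target is to solve for the highest bitrate that keeps the output under the limit. The sketch below assumes "25MB" means 25 × 1024 × 1024 bytes, a constant-bitrate output, and a 5% safety margin for headers; target_bitrate_kbps and estimated_size_bytes are hypothetical, and the project's actual compression strategy may differ.

```python
MAX_BYTES = 25 * 1024 * 1024  # assumption: "25MB" read as 25 MiB


def target_bitrate_kbps(
    duration_seconds: float, max_bytes: int = MAX_BYTES, margin: float = 0.95
) -> int:
    """Highest whole kbps whose estimated CBR output fits under max_bytes."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    budget_bits = max_bytes * 8 * margin
    return int(budget_bits / duration_seconds / 1000)


def estimated_size_bytes(duration_seconds: float, bitrate_kbps: int) -> float:
    """Rough CBR size: bits per second times seconds, divided by 8."""
    return bitrate_kbps * 1000 * duration_seconds / 8
```

For a one-hour recording, target_bitrate_kbps(3600) returns 55.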

Example Usage with Claude

Basic Audio Transcription
Claude, please transcribe my latest audio file with detailed insights.

Claude will automatically:

  1. Find the latest audio file using get_latest_audio
  2. Determine the appropriate transcription method
  3. Process the file with transcribe_with_enhancement using the "detailed" template
  4. Return the enhanced transcription

Advanced Audio File Search and Filtering
Claude, list all my audio files that are longer than 5 minutes and were modified after January 1st, 2024, sorted by size.

Claude will:

  1. Convert the date to a timestamp
  2. Use list_audio_files with appropriate filters:
    • min_duration_seconds: 300 (5 minutes)
    • min_modified_time: <timestamp for Jan 1, 2024>
    • sort_by: "size"
  3. Return a sorted list of matching audio files with comprehensive metadata
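
Step 1 above is ordinary date arithmetic. A sketch, assuming the date is read as midnight UTC (a client could equally use local time, which shifts the value by the UTC offset); to_unix_timestamp and the filters dict are illustrative, with keys taken from the list above.

```python
from datetime import datetime, timezone


def to_unix_timestamp(year: int, month: int, day: int) -> float:
    """Midnight UTC on the given date, as seconds since the Unix epoch."""
    return datetime(year, month, day, tzinfo=timezone.utc).timestamp()


# Arguments for the example request; key names follow the README.
filters = {
    "min_duration_seconds": 300,  # 5 minutes
    "min_modified_time": to_unix_timestamp(2024, 1, 1),
    "sort_by": "size",
}
```

to_unix_timestamp(2024, 1, 1) evaluates to 1704067200.0.
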

Batch Processing Multiple Files
Claude, find all MP3 files with "interview" in the filename and create professional transcripts for each one.

Claude will:

  1. Search for files using list_audio_files with:
    • pattern: ".*interview.*\\.mp3"
    • format: "mp3"
  2. Process all matching files in parallel using transcribe_with_enhancement
    • enhancement_type: "professional"
  3. Return all transcriptions in a well-formatted output
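
Processing files in parallel maps naturally onto asyncio.gather. In the sketch below, transcribe_stub is a hypothetical stand-in for the real OpenAI call, and the semaphore-based cap (max_concurrency) is an added assumption to stay within API rate limits; the README does not say whether the server limits concurrency.

```python
import asyncio


async def transcribe_stub(path: str) -> str:
    """Hypothetical stand-in for a real transcription request."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"transcript of {path}"


async def transcribe_batch(paths: list[str], max_concurrency: int = 4) -> dict[str, str]:
    """Transcribe all paths concurrently, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def one(path: str) -> tuple[str, str]:
        async with semaphore:
            return path, await transcribe_stub(path)

    # gather returns results in input order; pair each with its path.
    results = await asyncio.gather(*(one(p) for p in paths))
    return dict(results)


if __name__ == "__main__":
    print(asyncio.run(transcribe_batch(["a_interview.mp3", "b_interview.mp3"])))
```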

Development

This project uses modern Python development tools including uv, pytest, ruff, and mypy.

# Run tests
uv run pytest

# Run with coverage
uv run pytest --cov=src

# Format code
uv run ruff format src

# Lint code
uv run ruff check src

# Run type checking (strict mode)
uv run mypy --strict src

# Run the pre-commit hooks
pre-commit run --all-files

CI/CD Workflow

The project uses GitHub Actions for CI/CD:

  1. Lint & Type Check: Ensures code quality with ruff and strict mypy type checking
  2. Tests: Runs tests on multiple Python versions (3.10, 3.11)
  3. Build: Creates distribution packages
  4. Publish: Automatically publishes to PyPI when a new version tag is pushed

To create a new release version:

git checkout main
# Make sure everything is up to date
git pull
# Create a new version tag
git tag v0.1.1
# Push the tag
git push origin v0.1.1

How It Works

For detailed architecture information, see Architecture Documentation.

MCP Server Whisper is built on the Model Context Protocol, which standardizes how AI models interact with external tools and data sources. The server:

  1. Exposes Audio Processing Capabilities: Through standardized MCP tool interfaces
  2. Implements Parallel Processing: Using asyncio and batch operations for performance
  3. Manages File Operations: Handles detection, validation, conversion, and compression
  4. Provides Rich Transcription: Via different OpenAI models and enhancement templates
  5. Optimizes Performance: With caching mechanisms for repeated operations
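
The README does not specify what is cached or how. As one hedged illustration, a per-file result cache can key on the path plus the file's modification time and size, so that editing a file normally invalidates its entry; FileResultCache and file_fingerprint are hypothetical names. A plain functools.lru_cache is a poor fit for async functions because it caches the coroutine object, which cannot be awaited a second time.

```python
import asyncio
import os
from collections.abc import Awaitable, Callable


def file_fingerprint(path: str) -> tuple[float, int]:
    """Modification time and size; both usually change when a file is rewritten."""
    st = os.stat(path)
    return st.st_mtime, st.st_size


class FileResultCache:
    """Memoize an async per-file computation, keyed on path plus fingerprint."""

    def __init__(
        self,
        compute: Callable[[str], Awaitable[str]],
        fingerprint: Callable[[str], tuple[float, int]] = file_fingerprint,
    ) -> None:
        self._compute = compute
        self._fingerprint = fingerprint
        self._results: dict[tuple[str, tuple[float, int]], str] = {}

    async def get(self, path: str) -> str:
        key = (path, self._fingerprint(path))
        # A rewritten file normally gets a new key, so its old result is not reused.
        if key not in self._results:
            self._results[key] = await self._compute(path)
        return self._results[key]


async def _demo_compute(path: str) -> str:
    return f"metadata for {path}"


if __name__ == "__main__":
    cache = FileResultCache(_demo_compute)
    print(asyncio.run(cache.get(__file__)))
```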

Under the hood, it uses:

  • pydub for audio file manipulation
  • asyncio for concurrent processing
  • OpenAI's Whisper API for base transcription
  • GPT-4o for enhanced audio understanding
  • FastMCP for simplified MCP server implementation
  • Type hints and strict mypy validation throughout the codebase

Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the repository
  2. Create a new branch for your feature (git checkout -b feature/amazing-feature)
  3. Make your changes
  4. Run the tests and linting (uv run pytest && uv run ruff check src && uv run mypy --strict src)
  5. Commit your changes (git commit -m 'Add some amazing feature')
  6. Push to the branch (git push origin feature/amazing-feature)
  7. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments


Made with ❤️ by Richie Caputo

Download files

Source Distribution

mcp_server_whisper-0.1.2.tar.gz (91.3 kB)

Built Distribution

mcp_server_whisper-0.1.2-py3-none-any.whl (11.2 kB)

File details

Details for the file mcp_server_whisper-0.1.2.tar.gz.

File metadata

  • Download URL: mcp_server_whisper-0.1.2.tar.gz
  • Size: 91.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.5.13

File hashes

Hashes for mcp_server_whisper-0.1.2.tar.gz
Algorithm Hash digest
SHA256 2968a290a9896167a556fc98692466dd9c84c48e0b6cf2cf0a829414d39d8e7f
MD5 35be34c40c7a7c4dfa5b442131b7514e
BLAKE2b-256 ca77ddf31d3a309ad1ea747167f7de2428327cdd3176e6516b8d75fc9f3d4127

File details

Details for the file mcp_server_whisper-0.1.2-py3-none-any.whl.

File hashes

Hashes for mcp_server_whisper-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 56175d26a4035ed5cd09756fed1f3a883c37f776a8dfe5728973cf3155a34acf
MD5 e876d3937b5db81b3d13abcb504559a4
BLAKE2b-256 f6e3904747d7c741dbcdd03f29afe757ace789e74ad97f32857e22ef9c3ed4f9
