MCP server for transcribing Telegram voice messages using Whisper, with Kokoro/OpenAI TTS voice replies

whisper-telegram-mcp

Transcribe and speak — two-way voice for Claude via Telegram

An MCP server that gives Claude two-way voice capabilities via Telegram: transcribe incoming voice messages with Whisper, and reply with synthesized speech. Works with Claude Desktop, Claude Code, and any MCP-compatible client.

What It Does

  • Transcribe local audio files -- OGG, WAV, MP3, FLAC, and more
  • Transcribe Telegram voice messages -- pass a file_id, get text back
  • Speak text as voice notes -- synthesise speech and send back as OGG (plays as a voice note in Telegram)
  • Two transcription backends -- local faster-whisper (free, private) or OpenAI Whisper API (cloud)
  • Auto mode -- tries local first, falls back to OpenAI if it fails
  • Language detection -- automatic or specify an ISO-639-1 code
  • Word-level timestamps -- optional fine-grained timing

Prerequisites

| Feature | Requirement |
|---|---|
| Transcription (local) | None; faster-whisper is bundled via the [local] extras |
| Transcription (cloud) | OPENAI_API_KEY env var |
| Voice replies: Kokoro (best quality) | Docker; run docker run -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-cpu:latest |
| Voice replies: OpenAI TTS (fallback) | OPENAI_API_KEY env var |
| Voice replies: macOS say (last resort) | Mac only, no setup |

The Kokoro setup shown here uses Docker. If Kokoro isn't reachable (for example, because Docker isn't running), voice replies fall back to OpenAI TTS or macOS say automatically.

Quick Start

One command with uvx

uvx whisper-telegram-mcp

No separate install step is needed -- uvx fetches the package into an isolated environment and runs it.

Or install with pip

pip install whisper-telegram-mcp
whisper-telegram-mcp

Telegram Bot Setup

  1. Open Telegram and message @BotFather
  2. Send /newbot and follow the prompts to create a bot
  3. Copy the token (looks like 1234567890:ABCdef...)
  4. Add TELEGRAM_BOT_TOKEN to your MCP config env (see below)
  5. Message your bot to start — it'll only respond to approved users

The Claude Telegram plugin handles access control. See its docs for pairing/allowlist setup.

Integration

Claude Desktop

Add to your Claude Desktop config (on macOS, ~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "whisper-telegram-mcp": {
      "command": "uvx",
      "args": ["whisper-telegram-mcp"],
      "env": {
        "WHISPER_MODEL": "base",
        "WHISPER_BACKEND": "auto",
        "TELEGRAM_BOT_TOKEN": "your-bot-token-here"
      }
    }
  }
}

Claude Code

Add to your project's .mcp.json:

{
  "mcpServers": {
    "whisper-telegram-mcp": {
      "command": "uvx",
      "args": ["whisper-telegram-mcp"],
      "env": {
        "WHISPER_MODEL": "base",
        "WHISPER_BACKEND": "auto",
        "TELEGRAM_BOT_TOKEN": "your-bot-token-here"
      }
    }
  }
}

Tools

| Tool | Description |
|---|---|
| transcribe_audio | Transcribe a local audio file (OGG, WAV, MP3, etc.) to text |
| transcribe_telegram_voice | Download and transcribe a Telegram voice message by file_id |
| speak_text | Convert text to speech → OGG/Opus file (plays as voice note in Telegram) |
| list_models | List available Whisper model sizes with speed/accuracy info |
| check_backends | Check which backends (local/OpenAI) are available and configured |

transcribe_audio

file_path: str        # Absolute path to audio file
language: str | None  # ISO-639-1 code (e.g. "en"), None = auto-detect
word_timestamps: bool # Include word-level timestamps (default: false)
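
As a rough illustration of how these parameters could map onto faster-whisper, the sketch below calls `WhisperModel.transcribe` and reshapes its output into the response format documented later in this README. The helper names `to_result` and `transcribe_local` are hypothetical, and the CPU/int8 settings are assumptions, not the package's actual internals.

```python
from __future__ import annotations

from typing import Any, Iterable


def to_result(segments: Iterable[Any], info: Any) -> dict:
    """Reshape faster-whisper style output into the documented response dict."""
    segs = [
        {"start": s.start, "end": s.end, "text": s.text.strip()}
        for s in segments
    ]
    return {
        "text": " ".join(s["text"] for s in segs),
        "language": info.language,
        "language_probability": info.language_probability,
        "duration": info.duration,
        "segments": segs,
        "backend": "local",
        "success": True,
        "error": None,
    }


def transcribe_local(
    file_path: str,
    language: str | None = None,
    word_timestamps: bool = False,
    model_size: str = "base",
) -> dict:
    """Transcribe one file with faster-whisper (assumed CPU + int8 settings)."""
    # Imported lazily so the module still loads without the optional dependency.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    # language=None asks the model to auto-detect the spoken language.
    segments, info = model.transcribe(
        file_path, language=language, word_timestamps=word_timestamps
    )
    # Per-word timings (segment.words) are not copied into this sketch's output.
    return to_result(segments, info)
```

The model downloads on first use, so the first call to `transcribe_local` is noticeably slower than later ones.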

transcribe_telegram_voice

file_id: str          # Telegram voice message file_id
bot_token: str | None # Bot token (falls back to TELEGRAM_BOT_TOKEN env var)
language: str | None  # ISO-639-1 code, None = auto-detect
word_timestamps: bool # Include word-level timestamps (default: false)
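
For orientation, resolving a file_id through the Telegram Bot API takes two requests: `getFile` returns a `file_path`, and the bytes are then fetched from the `/file/bot<token>/<file_path>` endpoint. The sketch below uses only the standard library; the function names are illustrative and are not taken from this package.

```python
from __future__ import annotations

import json
import os
import urllib.parse
import urllib.request

API_ROOT = "https://api.telegram.org"


def get_file_url(bot_token: str, file_id: str) -> str:
    """URL of the getFile call that resolves a file_id to a file_path."""
    query = urllib.parse.urlencode({"file_id": file_id})
    return f"{API_ROOT}/bot{bot_token}/getFile?{query}"


def download_url(bot_token: str, file_path: str) -> str:
    """URL the file bytes can be fetched from once file_path is known."""
    return f"{API_ROOT}/file/bot{bot_token}/{file_path}"


def resolve_token(bot_token: str | None = None) -> str:
    """Prefer the explicit argument, else fall back to TELEGRAM_BOT_TOKEN."""
    token = bot_token or os.environ.get("TELEGRAM_BOT_TOKEN")
    if not token:
        raise ValueError("no bot token given and TELEGRAM_BOT_TOKEN is not set")
    return token


def download_voice(file_id: str, bot_token: str | None = None) -> bytes:
    """Fetch the raw audio bytes for a voice message (performs network I/O)."""
    token = resolve_token(bot_token)
    with urllib.request.urlopen(get_file_url(token, file_id), timeout=30) as resp:
        payload = json.load(resp)
    if not payload.get("ok"):
        raise RuntimeError(f"getFile failed: {payload.get('description')}")
    file_path = payload["result"]["file_path"]
    with urllib.request.urlopen(download_url(token, file_path), timeout=30) as resp:
        return resp.read()
```

Because the download URL embeds the bot token, it should be treated as a secret and kept out of logs.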

speak_text

Converts text to an OGG/Opus audio file. Automatically selects the best available TTS backend.

text: str             # Text to synthesise
voice: str            # Voice name (default: "af_sky")
output_path: str|None # Optional path for output .ogg file

TTS Backends (in priority order):

| Backend | Cost | Quality | Setup |
|---|---|---|---|
| Kokoro (local) | Free | Natural, high quality | Start manually (see below) |
| OpenAI TTS (cloud) | ~$0.015/1k chars | High quality | OPENAI_API_KEY env var |
| macOS say (fallback) | Free | Robotic | Mac only, no setup |

In auto mode (default), the server tries Kokoro first, then OpenAI, then macOS say. Configure with TTS_BACKEND env var.
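
One way to picture the TTS_BACKEND setting is as a function from the configured value to the list of backends worth trying. The sketch below assumes that pinning a specific backend disables fallback; the README does not say whether that is how the server behaves, and `tts_backends_to_try` is a hypothetical name.

```python
from __future__ import annotations

import os

# Priority order used in auto mode, as described above.
AUTO_ORDER = ("kokoro", "openai", "macos")


def tts_backends_to_try(setting: str | None = None) -> list[str]:
    """Return the backends to attempt, in order, for a TTS_BACKEND value."""
    raw = setting if setting is not None else os.environ.get("TTS_BACKEND", "auto")
    value = raw.strip().lower()
    if value == "auto":
        return list(AUTO_ORDER)
    if value in AUTO_ORDER:
        # Assumption: an explicitly pinned backend is tried on its own.
        return [value]
    raise ValueError(f"unknown TTS_BACKEND {raw!r}; expected auto, kokoro, openai, or macos")
```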

Starting Kokoro locally:

Kokoro FastAPI is not on PyPI — start it before running the MCP server:

# Docker (simplest, recommended)
docker run -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-cpu:latest

# Apple Silicon (GPU-accelerated)
docker run -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-gpu-mac:latest

# From source
git clone https://github.com/remsky/Kokoro-FastAPI && cd Kokoro-FastAPI && ./start-cpu.sh

Once running, the MCP server auto-detects it at http://127.0.0.1:8880/v1. Override with KOKORO_BASE_URL env var.
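
Auto-detection of this kind can be approximated with a short reachability probe. The sketch below reads KOKORO_BASE_URL with the documented default and treats any HTTP response, even an error status, as proof that a server is listening. How the MCP server really probes Kokoro is not documented here, so both helper names and the probing strategy are assumptions.

```python
from __future__ import annotations

import os
import urllib.error
import urllib.request
from collections.abc import Mapping

DEFAULT_KOKORO_BASE_URL = "http://127.0.0.1:8880/v1"


def kokoro_base_url(env: Mapping[str, str] | None = None) -> str:
    """Resolve the Kokoro base URL, honouring the KOKORO_BASE_URL override."""
    env = os.environ if env is None else env
    return env.get("KOKORO_BASE_URL", DEFAULT_KOKORO_BASE_URL).rstrip("/")


def kokoro_is_reachable(base_url: str, timeout: float = 1.0) -> bool:
    """True if something answers HTTP at base_url within the timeout."""
    try:
        urllib.request.urlopen(base_url, timeout=timeout).close()
        return True
    except urllib.error.HTTPError:
        # The server replied, just not with a 2xx status; it is still up.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout: treat as not running.
        return False
```

A short timeout keeps the fallback to OpenAI TTS or macOS say quick when Kokoro is not running.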

Kokoro voices (primary):

| Voice | Accent | Style |
|---|---|---|
| af_sky | US | Female (default) |
| af_bella | US | Female |
| af_sarah | US | Female |
| af_nicole | US | Female |
| am_adam | US | Male |
| am_michael | US | Male |
| bf_emma | UK | Female |
| bf_isabella | UK | Female |
| bm_george | UK | Male |
| bm_lewis | UK | Male |

OpenAI voices (fallback):

| Voice | Style |
|---|---|
| alloy | Neutral |
| echo | Male |
| fable | Narrative |
| onyx | Deep male |
| nova | Female |
| shimmer | Soft female |

Kokoro voice names are automatically mapped to the closest OpenAI or macOS equivalent when falling back.
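
The actual mapping table is not listed in this README. As a purely hypothetical illustration, the two-letter prefix of a Kokoro voice (accent, then gender, per the table above) is enough to pick a plausible OpenAI voice. The specific choices of nova, onyx, and alloy below are assumptions.

```python
OPENAI_VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}

# Hypothetical choices: one OpenAI voice per Kokoro gender letter.
OPENAI_BY_GENDER = {"f": "nova", "m": "onyx"}
DEFAULT_OPENAI_VOICE = "alloy"


def kokoro_to_openai_voice(voice: str) -> str:
    """Map a Kokoro voice such as "af_sky" to an OpenAI voice name."""
    if voice in OPENAI_VOICES:
        return voice  # already an OpenAI voice, pass it through unchanged
    prefix, sep, _name = voice.partition("_")
    # Kokoro prefixes are two letters: accent (a = US, b = UK) then gender.
    if sep and len(prefix) == 2 and prefix[0] in ("a", "b"):
        return OPENAI_BY_GENDER.get(prefix[1], DEFAULT_OPENAI_VOICE)
    return DEFAULT_OPENAI_VOICE
```

OpenAI's voices carry no accent distinction in the table above, so this sketch drops the US/UK letter when mapping.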

Returns:

{
  "file_path": "/tmp/tmpXXX.ogg",
  "size_bytes": 16555,
  "backend": "kokoro",
  "voice": "af_sky",
  "success": true,
  "error": null
}

Send the returned file_path as a Telegram attachment and it will appear as a native voice note.

Transcription response format

All transcription tools return:

{
  "text": "Hello, this is a voice message.",
  "language": "en",
  "language_probability": 0.98,
  "duration": 3.5,
  "segments": [
    {"start": 0.0, "end": 3.5, "text": "Hello, this is a voice message."}
  ],
  "backend": "local",
  "success": true,
  "error": null
}
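
A client consuming this response might check `success` first and then render the segments. The helper below is a minimal sketch of that pattern; `format_segments` is an illustrative name and not part of the package.

```python
def format_segments(result: dict) -> list[str]:
    """Render each segment of a transcription result as a timestamped line."""
    if not result.get("success"):
        # Surface the server-reported error instead of returning empty output.
        raise RuntimeError(result.get("error") or "transcription failed")
    return [
        f"[{seg['start']:.1f}s-{seg['end']:.1f}s] {seg['text'].strip()}"
        for seg in result.get("segments", [])
    ]
```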

Configuration

All configuration is via environment variables:

| Variable | Default | Description |
|---|---|---|
| WHISPER_BACKEND | auto | auto, local, or openai |
| WHISPER_MODEL | base | Whisper model size (see below) |
| OPENAI_API_KEY | -- | Required for openai transcription and TTS backends |
| TELEGRAM_BOT_TOKEN | -- | Required for transcribe_telegram_voice |
| WHISPER_LANGUAGE | auto-detect | ISO-639-1 language code |
| TTS_BACKEND | auto | auto, kokoro, openai, or macos |
| TTS_VOICE | af_sky | Default voice for speak_text (Kokoro voice name) |
| KOKORO_BASE_URL | http://127.0.0.1:8880/v1 | Kokoro FastAPI base URL |
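
To make the defaults concrete, the variables above can be read into a single settings object. This is a sketch only: the `Settings` class and `load_settings` function are hypothetical, and the package may load its configuration differently.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Defaults mirror the configuration table in this README."""

    whisper_backend: str = "auto"
    whisper_model: str = "base"
    whisper_language: str | None = None  # None means auto-detect
    tts_backend: str = "auto"
    tts_voice: str = "af_sky"
    kokoro_base_url: str = "http://127.0.0.1:8880/v1"
    openai_api_key: str | None = None
    telegram_bot_token: str | None = None


def load_settings(env: Mapping[str, str] | None = None) -> Settings:
    """Build Settings from environment variables, falling back to defaults."""
    env = os.environ if env is None else env
    d = Settings()
    return Settings(
        whisper_backend=env.get("WHISPER_BACKEND", d.whisper_backend),
        whisper_model=env.get("WHISPER_MODEL", d.whisper_model),
        whisper_language=env.get("WHISPER_LANGUAGE") or None,
        tts_backend=env.get("TTS_BACKEND", d.tts_backend),
        tts_voice=env.get("TTS_VOICE", d.tts_voice),
        kokoro_base_url=env.get("KOKORO_BASE_URL", d.kokoro_base_url),
        # Empty strings are treated as unset for the two secrets.
        openai_api_key=env.get("OPENAI_API_KEY") or None,
        telegram_bot_token=env.get("TELEGRAM_BOT_TOKEN") or None,
    )
```

Passing a plain dict as `env` makes the loader easy to exercise in tests without touching the real environment.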

How It Works

                         MCP Client (Claude)
                              |
                         [MCP stdio]
                              |
                    whisper-telegram-mcp
                    /         |         \
                   /          |          \
      transcribe_audio  transcribe_     speak_text
                        telegram_voice      |
              |               |          auto_tts()
              |         [Bot API DL]    /    |    \
              +--------+------+     Kokoro OpenAI macOS
                       |            (local) (cloud) (say)
                 auto_transcribe()      |
                  /           \      .ogg file
           LocalBackend    OpenAIBackend
           (faster-whisper)  (Whisper API)

  1. Claude sends a tool call via MCP (stdio transport)
  2. For Telegram voice messages, the file is downloaded via Bot API
  3. auto_transcribe() picks the best available transcription backend
  4. auto_tts() picks the best available TTS backend (Kokoro -> OpenAI -> macOS)
  5. Results are returned as structured JSON
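
The try-local-then-fall-back behaviour in step 3 can be sketched as a loop over candidate backends that returns the first success in the structured shape used by this server. The function below is an illustration under that assumption; it is not the package's `auto_transcribe()`, and the backend callables are placeholders.

```python
from __future__ import annotations

from typing import Callable

# A backend is any callable that takes a file path and returns a result dict.
Backend = Callable[[str], dict]


def auto_transcribe_sketch(path: str, backends: list[tuple[str, Backend]]) -> dict:
    """Try each (name, backend) pair in order; return the first success."""
    errors: list[str] = []
    for name, transcribe in backends:
        try:
            result = transcribe(path)
        except Exception as exc:  # any failure moves on to the next backend
            errors.append(f"{name}: {exc}")
            continue
        return {**result, "backend": name, "success": True, "error": None}
    # Every backend failed (or none was given): report all collected errors.
    return {
        "text": "",
        "backend": None,
        "success": False,
        "error": "; ".join(errors) or "no backends configured",
    }
```

Collecting the per-backend errors keeps the final failure message useful when both the local model and the cloud API are unavailable.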

Local vs OpenAI

| | Local (faster-whisper) | OpenAI API |
|---|---|---|
| Cost | Free | $0.006/min |
| Privacy | All data stays on device | Audio sent to OpenAI |
| Speed | ~1-10s depending on model | ~1-3s |
| Setup | Automatic (downloads model on first use) | Requires OPENAI_API_KEY |
| Accuracy | Excellent with base or larger | Excellent |
| Offline | Yes | No |
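
For a rough sense of cloud costs, the per-minute transcription price above and the per-1k-character TTS price quoted earlier can be turned into a quick estimate. The rates are the figures stated in this README and may be out of date; rounding and minimum-billing rules are not modelled.

```python
# Rates as quoted in this README; check current OpenAI pricing before relying on them.
WHISPER_API_USD_PER_MINUTE = 0.006
OPENAI_TTS_USD_PER_1K_CHARS = 0.015


def transcription_cost_usd(duration_seconds: float) -> float:
    """Estimated OpenAI transcription cost for audio of the given length."""
    return duration_seconds / 60 * WHISPER_API_USD_PER_MINUTE


def tts_cost_usd(text: str) -> float:
    """Estimated OpenAI TTS cost for synthesising the given text."""
    return len(text) / 1000 * OPENAI_TTS_USD_PER_1K_CHARS
```

At these rates, ten minutes of audio comes to about $0.06 to transcribe, and a 2,000-character reply about $0.03 to synthesise.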

Model Sizes

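| Model | Parameters | Speed | Accuracy | VRAM |
|---|---|---|---|---|
| tiny | 39M | Fastest | Lowest | ~1GB |
| base | 74M | Fast | Good | ~1GB |
| small | 244M | Moderate | Better | ~2GB |
| medium | 769M | Slow | High | ~5GB |
| large-v3 | 1550M | Slowest | Highest | ~10GB |
| turbo | ~800M | Fast | High | ~6GB |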

English-only variants (tiny.en, base.en, small.en, medium.en) are slightly more accurate for English.

Privacy & Data

  • Local backend (faster-whisper): Audio stays on your device. Nothing leaves your machine.
  • OpenAI backend: Audio is sent to the OpenAI API and handled under OpenAI's data retention policy
  • Temporary files: Audio downloaded from Telegram is written to /tmp and deleted immediately after transcription
  • Logs: Go to stderr only — no audio content or credentials are ever logged

Development

git clone https://github.com/abid-mahdi/whisper-telegram-mcp.git
cd whisper-telegram-mcp
python3.12 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

# Run unit tests
pytest tests/ -v -m "not integration"

# Run integration tests (downloads ~150MB model on first run)
pytest tests/ -m integration -v

# Run with coverage
pytest tests/ --cov=src/whisper_telegram_mcp --cov-report=term-missing

MCP Inspector

uvx mcp dev src/whisper_telegram_mcp/server.py

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feat/amazing-feature)
  3. Run tests (pytest tests/ -v -m "not integration")
  4. Commit with conventional commits (feat:, fix:, docs:, etc.)
  5. Open a pull request

License

MIT
