MCP server for transcribing Telegram voice messages using Whisper, with Kokoro/OpenAI TTS voice replies

Maintainers

abid-mahdi

whisper-telegram-mcp

Transcribe and speak — two-way voice for Claude via Telegram

An MCP server that gives Claude two-way voice capabilities via Telegram: transcribe incoming voice messages with Whisper, and reply with synthesized speech. Works with Claude Desktop, Claude Code, and any MCP-compatible client.

What It Does

Transcribe local audio files -- OGG, WAV, MP3, FLAC, and more
Transcribe Telegram voice messages -- pass a file_id, get text back
Speak text as voice notes -- synthesise speech and send back as OGG (plays as a voice note in Telegram)
Two transcription backends -- local faster-whisper (free, private) or OpenAI Whisper API (cloud)
Auto mode -- tries local first, falls back to OpenAI if it fails
Language detection -- automatic or specify an ISO-639-1 code
Word-level timestamps -- optional fine-grained timing

Prerequisites

Feature	Requirement
Transcription (local)	None — faster-whisper bundled via `[local]` extras
Transcription (cloud)	`OPENAI_API_KEY` env var
Voice replies — Kokoro (best quality)	Docker — run `docker run -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-cpu:latest`
Voice replies — OpenAI TTS (fallback)	`OPENAI_API_KEY` env var
Voice replies — macOS say (last resort)	Mac only, no setup

Kokoro is easiest to run with Docker (it can also be started from source; see "Starting Kokoro locally"). If Kokoro isn't reachable, voice replies fall back to OpenAI TTS or macOS say automatically.

Quick Start

One command with `uvx`

uvx whisper-telegram-mcp

No installation needed -- uvx handles everything.

Or install with pip

pip install whisper-telegram-mcp
whisper-telegram-mcp

Telegram Bot Setup

Open Telegram and message @BotFather
Send /newbot and follow the prompts to create a bot
Copy the token (looks like 1234567890:ABCdef...)
Add TELEGRAM_BOT_TOKEN to your MCP config env (see below)
Message your bot to start — it'll only respond to approved users

The Claude Telegram plugin handles access control. See its docs for pairing/allowlist setup.

Integration

Claude Desktop

Add to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "whisper-telegram-mcp": {
      "command": "uvx",
      "args": ["whisper-telegram-mcp"],
      "env": {
        "WHISPER_MODEL": "base",
        "WHISPER_BACKEND": "auto",
        "TELEGRAM_BOT_TOKEN": "your-bot-token-here"
      }
    }
  }
}
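
If the config file already lists other MCP servers, the entry has to be merged in rather than pasted over them. A minimal sketch of that merge, assuming a hypothetical helper named `add_server` (it is not part of this package):

```python
import json
from pathlib import Path


def add_server(config_path: Path, token: str) -> dict:
    """Merge the whisper-telegram-mcp entry into an MCP config file."""
    # Start from the existing file, if any, so other servers are kept.
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["whisper-telegram-mcp"] = {
        "command": "uvx",
        "args": ["whisper-telegram-mcp"],
        "env": {
            "WHISPER_MODEL": "base",
            "WHISPER_BACKEND": "auto",
            "TELEGRAM_BOT_TOKEN": token,
        },
    }
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

The same helper would work for a project-level `.mcp.json`, since both files use the `mcpServers` key shown here.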

Claude Code

Add to your project's .mcp.json:

{
  "mcpServers": {
    "whisper-telegram-mcp": {
      "command": "uvx",
      "args": ["whisper-telegram-mcp"],
      "env": {
        "WHISPER_MODEL": "base",
        "WHISPER_BACKEND": "auto",
        "TELEGRAM_BOT_TOKEN": "your-bot-token-here"
      }
    }
  }
}

Tools

Tool	Description
`transcribe_audio`	Transcribe a local audio file (OGG, WAV, MP3, etc.) to text
`transcribe_telegram_voice`	Download and transcribe a Telegram voice message by `file_id`
`speak_text`	Convert text to speech → OGG/Opus file (plays as voice note in Telegram)
`list_models`	List available Whisper model sizes with speed/accuracy info
`check_backends`	Check which backends (local/OpenAI) are available and configured

`transcribe_audio`

file_path: str        # Absolute path to audio file
language: str | None  # ISO-639-1 code (e.g. "en"), None = auto-detect
word_timestamps: bool # Include word-level timestamps (default: false)
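
An MCP client normally builds the tool call for you, but it can help to see what goes over the wire. The sketch below serialises a `tools/call` request in the JSON-RPC 2.0 shape used by MCP; the helper name `tool_call_request` and the audio path are illustrative assumptions.

```python
import json


def tool_call_request(request_id: int, name: str, arguments: dict) -> str:
    """Serialise an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


message = tool_call_request(1, "transcribe_audio", {
    "file_path": "/tmp/voice.ogg",  # hypothetical path, must be absolute
    "language": None,               # None = auto-detect
    "word_timestamps": False,
})
```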

`transcribe_telegram_voice`

file_id: str          # Telegram voice message file_id
bot_token: str | None # Bot token (falls back to TELEGRAM_BOT_TOKEN env var)
language: str | None  # ISO-639-1 code, None = auto-detect
word_timestamps: bool # Include word-level timestamps (default: false)

`speak_text`

Converts text to an OGG/Opus audio file. Automatically selects the best available TTS backend.

text: str             # Text to synthesise
voice: str            # Voice name (default: "af_sky")
output_path: str|None # Optional path for output .ogg file

TTS Backends (in priority order):

Backend	Cost	Quality	Setup
Kokoro (local)	Free	Natural, high quality	Start manually (see below)
OpenAI TTS (cloud)	~$0.015/1k chars	High quality	`OPENAI_API_KEY` env var
macOS say (fallback)	Free	Robotic	Mac only, no setup

In auto mode (the default), the server tries Kokoro first, then OpenAI, then macOS say. Set the TTS_BACKEND env var to force a specific backend.
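
The priority order can be pictured as a small lookup. This is a sketch, not the server's code: `tts_candidates` is a hypothetical name, and it assumes that forcing a single backend disables fallback, which the README does not state.

```python
TTS_ORDER = ["kokoro", "openai", "macos"]


def tts_candidates(setting: str = "auto") -> list[str]:
    """Return the backends to try, in order, for a TTS_BACKEND value."""
    setting = setting.strip().lower()
    if setting == "auto":
        return list(TTS_ORDER)
    if setting in TTS_ORDER:
        return [setting]  # assumption: no fallback when a backend is forced
    raise ValueError(f"unknown TTS_BACKEND: {setting!r}")
```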

Starting Kokoro locally:

Kokoro FastAPI is not on PyPI — start it before running the MCP server:

# Docker (simplest, recommended)
docker run -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-cpu:latest

# Apple Silicon (GPU-accelerated)
docker run -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-gpu-mac:latest

# From source
git clone https://github.com/remsky/Kokoro-FastAPI && cd Kokoro-FastAPI && ./start-cpu.sh

Once running, the MCP server auto-detects it at http://127.0.0.1:8880/v1. Override with KOKORO_BASE_URL env var.
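
To check by hand whether Kokoro is up before starting the MCP server, a plain TCP probe is enough. The sketch below reads the same `KOKORO_BASE_URL` variable with the same default; `kokoro_reachable` is a hypothetical helper, and how the server itself detects Kokoro is not documented here.

```python
import os
import socket
from urllib.parse import urlsplit

DEFAULT_KOKORO_URL = "http://127.0.0.1:8880/v1"


def kokoro_reachable(environ=os.environ, timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections at the Kokoro URL."""
    url = urlsplit(environ.get("KOKORO_BASE_URL", DEFAULT_KOKORO_URL))
    port = url.port or (443 if url.scheme == "https" else 80)
    try:
        with socket.create_connection((url.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A successful connection only shows that the port is open, not that the process behind it is Kokoro.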

Kokoro voices (primary):

Voice	Accent	Style
`af_sky`	US	Female (default)
`af_bella`	US	Female
`af_sarah`	US	Female
`af_nicole`	US	Female
`am_adam`	US	Male
`am_michael`	US	Male
`bf_emma`	UK	Female
`bf_isabella`	UK	Female
`bm_george`	UK	Male
`bm_lewis`	UK	Male

OpenAI voices (fallback):

Voice	Style
`alloy`	Neutral
`echo`	Male
`fable`	Narrative
`onyx`	Deep male
`nova`	Female
`shimmer`	Soft female

Kokoro voice names are automatically mapped to the closest OpenAI or macOS equivalent when falling back.
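
The actual mapping table is not listed in this README. As an illustration only, the sketch below decodes the two-letter prefix used in the Kokoro table (accent, then gender) and picks an OpenAI voice of the same gender; the `OPENAI_FALLBACK` choices are assumptions, not the server's real mapping.

```python
# Prefix meanings taken from the Kokoro voice table above.
ACCENTS = {"a": "US", "b": "UK"}
GENDERS = {"f": "female", "m": "male"}

# Hypothetical fallback choices; the server's real mapping may differ.
OPENAI_FALLBACK = {"female": "nova", "male": "onyx"}


def describe_voice(name: str) -> tuple[str, str]:
    """Split a Kokoro voice name such as 'bf_emma' into (accent, gender)."""
    prefix, _, _ = name.partition("_")
    if len(prefix) != 2 or prefix[0] not in ACCENTS or prefix[1] not in GENDERS:
        raise ValueError(f"not a Kokoro-style voice name: {name!r}")
    return ACCENTS[prefix[0]], GENDERS[prefix[1]]


def openai_fallback(name: str) -> str:
    """Pick an OpenAI voice with the same gender (illustrative mapping only)."""
    return OPENAI_FALLBACK[describe_voice(name)[1]]
```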

Returns:

{
  "file_path": "/tmp/tmpXXX.ogg",
  "size_bytes": 16555,
  "backend": "kokoro",
  "voice": "af_sky",
  "success": true,
  "error": null
}

Send the returned file_path as a Telegram attachment and it will appear as a native voice note.
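
Before attaching the file, a client may want to confirm that the call succeeded and that the file really is Ogg. A minimal sketch, assuming the result arrives as the JSON shown above; `voice_note_path` is a hypothetical helper.

```python
import json
from pathlib import Path


def voice_note_path(result_json: str) -> Path:
    """Validate a `speak_text` result and return the path of the OGG file."""
    result = json.loads(result_json)
    if not result.get("success"):
        raise RuntimeError(result.get("error") or "speak_text failed")
    path = Path(result["file_path"])
    # Every Ogg stream starts with the capture pattern "OggS".
    with path.open("rb") as handle:
        if handle.read(4) != b"OggS":
            raise ValueError(f"{path} is not an Ogg file")
    return path
```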

Transcription response format

Both transcription tools (transcribe_audio and transcribe_telegram_voice) return the same structure:

{
  "text": "Hello, this is a voice message.",
  "language": "en",
  "language_probability": 0.98,
  "duration": 3.5,
  "segments": [
    {"start": 0.0, "end": 3.5, "text": "Hello, this is a voice message."}
  ],
  "backend": "local",
  "success": true,
  "error": null
}
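
One way to consume this structure is to render the segments as timestamped lines. The sketch below assumes only the fields shown in the example response; `format_transcript` is a hypothetical helper.

```python
import json


def format_transcript(result_json: str) -> str:
    """Render a transcription result as one timestamped line per segment."""
    result = json.loads(result_json)
    if not result["success"]:
        raise RuntimeError(result["error"])
    lines = [
        f"[{seg['start']:.1f}s-{seg['end']:.1f}s] {seg['text'].strip()}"
        for seg in result["segments"]
    ]
    return "\n".join(lines)
```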

Configuration

All configuration is via environment variables:

Variable	Default	Description
`WHISPER_BACKEND`	`auto`	`auto`, `local`, or `openai`
`WHISPER_MODEL`	`base`	Whisper model size (see below)
`OPENAI_API_KEY`	--	Required for `openai` transcription and TTS backends
`TELEGRAM_BOT_TOKEN`	--	Required for `transcribe_telegram_voice`
`WHISPER_LANGUAGE`	auto-detect	ISO-639-1 language code
`TTS_BACKEND`	`auto`	`auto`, `kokoro`, `openai`, or `macos`
`TTS_VOICE`	`af_sky`	Default voice for `speak_text` (Kokoro voice name)
`KOKORO_BASE_URL`	`http://127.0.0.1:8880/v1`	Kokoro FastAPI base URL
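
The table translates directly into a settings loader. This is a sketch under the defaults listed above, not the package's actual configuration code; the `Settings` class and its field names are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Optional

WHISPER_BACKENDS = {"auto", "local", "openai"}
TTS_BACKENDS = {"auto", "kokoro", "openai", "macos"}


@dataclass(frozen=True)
class Settings:
    whisper_backend: str
    whisper_model: str
    whisper_language: Optional[str]  # None = auto-detect
    tts_backend: str
    tts_voice: str
    kokoro_base_url: str

    @classmethod
    def from_env(cls, environ=os.environ) -> "Settings":
        """Read the documented variables, applying the documented defaults."""
        settings = cls(
            whisper_backend=environ.get("WHISPER_BACKEND", "auto"),
            whisper_model=environ.get("WHISPER_MODEL", "base"),
            whisper_language=environ.get("WHISPER_LANGUAGE") or None,
            tts_backend=environ.get("TTS_BACKEND", "auto"),
            tts_voice=environ.get("TTS_VOICE", "af_sky"),
            kokoro_base_url=environ.get("KOKORO_BASE_URL", "http://127.0.0.1:8880/v1"),
        )
        if settings.whisper_backend not in WHISPER_BACKENDS:
            raise ValueError(f"bad WHISPER_BACKEND: {settings.whisper_backend!r}")
        if settings.tts_backend not in TTS_BACKENDS:
            raise ValueError(f"bad TTS_BACKEND: {settings.tts_backend!r}")
        return settings
```

Passing a plain dict as `environ` makes the loader easy to exercise without touching the real environment.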

How It Works

                         MCP Client (Claude)
                              |
                         [MCP stdio]
                              |
                    whisper-telegram-mcp
                    /         |         \
                   /          |          \
      transcribe_audio  transcribe_     speak_text
                        telegram_voice      |
              |               |          auto_tts()
              |         [Bot API DL]    /    |    \
              +--------+------+     Kokoro OpenAI macOS
                       |            (local) (cloud) (say)
                 auto_transcribe()      |
                  /           \      .ogg file
           LocalBackend    OpenAIBackend
           (faster-whisper)  (Whisper API)

Claude sends a tool call via MCP (stdio transport)
For Telegram voice messages, the file is downloaded via Bot API
auto_transcribe() picks the best available transcription backend
auto_tts() picks the best available TTS backend (Kokoro -> OpenAI -> macOS)
Results are returned as structured JSON
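
The fallback behaviour in the steps above can be sketched as a loop over backends that returns the first success. The names `auto_transcribe()` and `auto_tts()` come from the diagram, but their internals are not shown in this README, so `first_successful` below is an illustrative stand-in rather than the real implementation.

```python
from typing import Callable, Sequence


def first_successful(backends: Sequence[tuple[str, Callable[[], dict]]]) -> dict:
    """Try each backend in order and return the first result that succeeds."""
    errors = []
    for name, run in backends:
        try:
            result = run()
        except Exception as exc:  # a failing backend must not stop the chain
            errors.append(f"{name}: {exc}")
            continue
        return {**result, "backend": name, "success": True, "error": None}
    # Every backend failed: report all errors in one structured result.
    return {"backend": None, "success": False, "error": "; ".join(errors)}
```

Returning a structured failure instead of raising keeps the response shape the same in both cases, which matches the `success` and `error` fields in the documented results.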

Local vs OpenAI

Aspect	Local (faster-whisper)	OpenAI API
Cost	Free	$0.006/min
Privacy	All data stays on device	Audio sent to OpenAI
Speed	~1-10s depending on model	~1-3s
Setup	Automatic (downloads model on first use)	Requires `OPENAI_API_KEY`
Accuracy	Excellent with `base` or larger	Excellent
Offline	Yes	No
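
As a worked example of the cost row, a rough estimate for the OpenAI backend is duration in minutes times the per-minute rate. The sketch uses the $0.006/min figure from the table and ignores any rounding applied at billing time.

```python
OPENAI_WHISPER_USD_PER_MIN = 0.006  # rate from the table above


def transcription_cost(duration_seconds: float) -> float:
    """Estimated OpenAI cost in USD for audio of the given length."""
    return duration_seconds / 60 * OPENAI_WHISPER_USD_PER_MIN
```

Ten minutes of audio (600 seconds) comes to about $0.06 under this estimate.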

Model Sizes

Model	Parameters	Speed	Accuracy	VRAM
`tiny`	39M	Fastest	Lowest	~1GB
`base`	74M	Fast	Good	~1GB
`small`	244M	Moderate	Better	~2GB
`medium`	769M	Slow	High	~5GB
`large-v3`	1550M	Slowest	Highest	~10GB
`turbo`	~800M	Fast	High	~6GB

English-only variants (tiny.en, base.en, small.en, medium.en) tend to be more accurate for English audio; the gain is largest for tiny.en and base.en and smaller for small.en and medium.en.
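
To pick a value for `WHISPER_MODEL`, one simple rule is to take the most accurate model whose approximate VRAM figure fits the hardware. The sketch below uses the numbers from the table; the ordering of `medium` before `turbo` (both rated "High") is an assumption, and `largest_model_for` is a hypothetical helper.

```python
from typing import Optional

# (model, approximate VRAM in GB), ordered from least to most accurate
# per the table above; placing "medium" before "turbo" is an assumption.
MODELS = [
    ("tiny", 1), ("base", 1), ("small", 2),
    ("medium", 5), ("turbo", 6), ("large-v3", 10),
]


def largest_model_for(vram_gb: float) -> Optional[str]:
    """Return the last model in MODELS that fits, or None if none does."""
    fitting = [name for name, need in MODELS if need <= vram_gb]
    return fitting[-1] if fitting else None
```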

Privacy & Data

Local backend (faster-whisper): Audio is transcribed on your device and never leaves your machine.
OpenAI backend: Audio is sent to the OpenAI API and is subject to OpenAI's data retention policy.
Temporary files: Audio downloaded from Telegram is written to /tmp and deleted immediately after transcription
Logs: Go to stderr only — no audio content or credentials are ever logged
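
The temporary-file guarantee is typically implemented with a `try`/`finally` around the processing step. This is a sketch of that pattern, not the package's code; `with_temp_audio` is a hypothetical helper and `handle` stands in for the transcription call.

```python
import os
import tempfile
from typing import Callable


def with_temp_audio(data: bytes, handle: Callable[[str], str]) -> str:
    """Write audio to a temp file, process it, and always delete the file."""
    fd, path = tempfile.mkstemp(suffix=".ogg")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        return handle(path)
    finally:
        os.unlink(path)  # runs even if handle() raises
```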

Development

git clone https://github.com/abid-mahdi/whisper-telegram-mcp.git
cd whisper-telegram-mcp
python3.12 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

# Run unit tests
pytest tests/ -v -m "not integration"

# Run integration tests (downloads ~150MB model on first run)
pytest tests/ -m integration -v

# Run with coverage
pytest tests/ --cov=src/whisper_telegram_mcp --cov-report=term-missing

MCP Inspector

uvx mcp dev src/whisper_telegram_mcp/server.py

Contributing

Fork the repository
Create a feature branch (git checkout -b feat/amazing-feature)
Run tests (pytest tests/ -v -m "not integration")
Commit with conventional commits (feat:, fix:, docs:, etc.)
Open a pull request

License

MIT

Release history

This version

0.1.0

Mar 30, 2026

Download files

Source Distribution

whisper_telegram_mcp-0.1.0.tar.gz (34.3 kB)

Uploaded Mar 30, 2026 Source

Built Distribution

whisper_telegram_mcp-0.1.0-py3-none-any.whl (17.4 kB)

Uploaded Mar 30, 2026 Python 3

File details

Details for the file whisper_telegram_mcp-0.1.0.tar.gz.

File metadata

Download URL: whisper_telegram_mcp-0.1.0.tar.gz
Upload date: Mar 30, 2026
Size: 34.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for whisper_telegram_mcp-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`e81bf9189e2ba1abb9fc9122f05df79faf7bf4403b66584c0129bf407057bb8f`
MD5	`beeae5ec2782e3cf1e1590b0d232e28c`
BLAKE2b-256	`804f9458d474b31a71c20d65aeff41b533ea313b87832c96395cc6173737be9f`

Provenance

The following attestation bundles were made for whisper_telegram_mcp-0.1.0.tar.gz:

Publisher: publish.yml on abid-mahdi/whisper-telegram-mcp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: whisper_telegram_mcp-0.1.0.tar.gz
- Subject digest: e81bf9189e2ba1abb9fc9122f05df79faf7bf4403b66584c0129bf407057bb8f
- Sigstore transparency entry: 1200827133
- Sigstore integration time: Mar 30, 2026
Source repository:
- Permalink: abid-mahdi/whisper-telegram-mcp@bf673c3bc1d1660331cdb32404f1b831333cc1b3
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/abid-mahdi
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@bf673c3bc1d1660331cdb32404f1b831333cc1b3
- Trigger Event: push

File details

Details for the file whisper_telegram_mcp-0.1.0-py3-none-any.whl.

File metadata

Download URL: whisper_telegram_mcp-0.1.0-py3-none-any.whl
Upload date: Mar 30, 2026
Size: 17.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for whisper_telegram_mcp-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ff5d906545104468536f1ce85d2527c5edab571bbd50e9174de0f3335bf6107c`
MD5	`15d773445ec2ca820f442ece1fa835f4`
BLAKE2b-256	`6191752f078d34b6a078284f1a79b8ca3b88efb5ac3a0aa6623bbe3362909d45`

Provenance

The following attestation bundles were made for whisper_telegram_mcp-0.1.0-py3-none-any.whl:

Publisher: publish.yml on abid-mahdi/whisper-telegram-mcp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: whisper_telegram_mcp-0.1.0-py3-none-any.whl
- Subject digest: ff5d906545104468536f1ce85d2527c5edab571bbd50e9174de0f3335bf6107c
- Sigstore transparency entry: 1200827217
- Sigstore integration time: Mar 30, 2026
Source repository:
- Permalink: abid-mahdi/whisper-telegram-mcp@bf673c3bc1d1660331cdb32404f1b831333cc1b3
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/abid-mahdi
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@bf673c3bc1d1660331cdb32404f1b831333cc1b3
- Trigger Event: push
