TTS MCP server and CLI for language learning (AWS Polly, OpenAI)

langlearn-tts

Generate audio flashcards and vocabulary drills from text. Ask Claude to synthesize words and phrases in any language, or batch-process entire vocabulary lists from the command line. Audio is slowed to 90% speed by default so learners can hear pronunciation clearly.

The pair mode is the core workflow: give it an English word and its translation, and it produces a single MP3 — [English audio] [pause] [target language audio] — ready for Anki, spaced repetition, or passive listening.

Available as both a Claude Desktop MCP server (ask Claude to generate audio in conversation) and a CLI with identical functionality. Supports AWS Polly (recommended — best pronunciation) and OpenAI TTS (easier setup) today; ElevenLabs (highest quality) is planned.

Features

  • Single synthesis — convert text to MP3 in any supported language
  • Batch synthesis — synthesize multiple texts, optionally merged into one file
  • Pair synthesis — stitch two languages together: [English] [pause] [L2]
  • Pair batch — batch-process vocabulary lists as stitched pairs
  • Auto-play — MCP tools play audio immediately after synthesis
  • Configurable speech rate — default 90% for learner-friendly pacing
  • Two providers — AWS Polly (93 voices, 41 languages) or OpenAI TTS (9 voices, 57 languages)
  • Provider selection — defaults to Polly; set LANGLEARN_TTS_PROVIDER=openai or use --provider openai for OpenAI
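
With Polly, a slower speech rate is typically expressed through the SSML prosody element. The sketch below shows how a percentage such as the default 90 could be turned into SSML. build_ssml is a hypothetical helper, and whether langlearn-tts builds its requests this way is an assumption rather than something the project documents.

```python
from xml.sax.saxutils import escape


def build_ssml(text: str, rate_percent: int = 90) -> str:
    """Wrap text in an SSML prosody tag (hypothetical helper, not the tool's API)."""
    if rate_percent <= 0:
        raise ValueError("rate_percent must be positive")
    # Escape &, < and > so learner text cannot break the SSML markup.
    safe_text = escape(text)
    return f'<speak><prosody rate="{rate_percent}%">{safe_text}</prosody></speak>'


if __name__ == "__main__":
    print(build_ssml("Guten Morgen"))
```

A rate of 100 would leave the voice at its normal speed; lower values slow it down.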

Quick Start

1. Install uv (Python package manager)

If you don't have uv yet:

# macOS / Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

uv manages Python versions automatically — you don't need to install Python separately.

2. Install langlearn-tts

uv tool install langlearn-tts

This installs the langlearn-tts CLI and langlearn-tts-server MCP server globally.

3. Install ffmpeg

Required for audio stitching (pairs, merged batches). Single synthesis works without it.

# macOS (requires Homebrew — install from https://brew.sh if needed)
brew install ffmpeg

# Ubuntu/Debian
sudo apt install ffmpeg

# Windows
winget install ffmpeg

4. Configure a TTS provider

Pick one provider. Polly is the default — its dedicated per-language neural voices produce more native-sounding pronunciation than OpenAI's multilingual model. Use --provider openai or set LANGLEARN_TTS_PROVIDER=openai if you prefer OpenAI.
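
The selection rule can be pictured as a small lookup. This is only a sketch: resolve_provider is a hypothetical name, and the precedence shown (command-line flag first, then LANGLEARN_TTS_PROVIDER, then the polly default) is an assumption about how the two settings interact.

```python
import os
from typing import Mapping, Optional

SUPPORTED_PROVIDERS = ("polly", "openai")


def resolve_provider(cli_value: Optional[str] = None,
                     env: Optional[Mapping[str, str]] = None) -> str:
    """Pick a provider: CLI flag, then env var, then the polly default.

    The precedence order is assumed for illustration.
    """
    env = os.environ if env is None else env
    choice = cli_value or env.get("LANGLEARN_TTS_PROVIDER") or "polly"
    choice = choice.strip().lower()
    if choice not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unknown provider: {choice!r}")
    return choice


if __name__ == "__main__":
    print(resolve_provider())
```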

Option A — AWS Polly (recommended, best pronunciation):

93 dedicated per-language neural voices, 41 languages. Each voice is trained for a specific language, producing more natural pronunciation for language learning. Requires an AWS account with polly:SynthesizeSpeech and polly:DescribeVoices permissions.
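
As a starting point, the two permissions can be granted with a minimal IAM policy. The snippet below prints one as JSON; it is a sketch, so review it against your own account's policies before attaching it to a user or role.

```python
import json

# Minimal IAM policy covering the two Polly actions named above.
# "Resource": "*" is used here for simplicity (an assumption of this sketch).
POLLY_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["polly:SynthesizeSpeech", "polly:DescribeVoices"],
            "Resource": "*",
        }
    ],
}

if __name__ == "__main__":
    print(json.dumps(POLLY_POLICY, indent=2))
```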

Install the AWS CLI, then:

aws configure

Enter your Access Key ID, Secret Access Key, and region (e.g., us-east-1). Alternatively, set AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_DEFAULT_REGION environment variables.

Option B — OpenAI TTS (easier setup):

9 multilingual voices, 57 languages. Simpler to configure (one API key), but uses a single multilingual model for all languages. Pronunciation quality varies by language.

export OPENAI_API_KEY=sk-...

Pricing: $15/1M characters (tts-1) or $30/1M (tts-1-hd).
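
To get a feel for these prices, a rough estimate for a vocabulary list can be computed from its character count. estimate_cost is a hypothetical helper, and counting characters with len() is an assumption about how usage is billed.

```python
from typing import Iterable

# Prices in USD per 1,000,000 characters, as quoted above.
PRICE_PER_MILLION = {"tts-1": 15.0, "tts-1-hd": 30.0}


def estimate_cost(texts: Iterable[str], model: str = "tts-1") -> float:
    """Rough cost in USD for synthesizing every text once."""
    total_chars = sum(len(text) for text in texts)
    return total_chars * PRICE_PER_MILLION[model] / 1_000_000


if __name__ == "__main__":
    vocabulary = ["good morning", "Guten Morgen", "house", "Haus"]
    print(f"${estimate_cost(vocabulary):.6f}")
```

At these rates, a list of a few thousand short words costs well under a dollar with tts-1.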

5. Verify

langlearn-tts doctor

All required checks should pass. Fix any that fail before continuing.

From source (development)

git clone https://github.com/jmf-pobox/langlearn-tts-mcp.git
cd langlearn-tts-mcp
uv sync --all-extras
uv run langlearn-tts --help

Claude Desktop Setup

Automatic (recommended)

langlearn-tts install

This registers the MCP server with Claude Desktop. Defaults to Polly. Use --provider openai for OpenAI (writes your OPENAI_API_KEY into the config).

Options:

  • --provider NAME — provider (polly or openai). Default: polly
  • --output-dir PATH — custom audio output directory (default: ~/Claude-Audio)
  • --uvx-path PATH — override the uvx binary path

Restart Claude Desktop after running install.

Manual

On macOS, add to ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "langlearn-tts": {
      "command": "/absolute/path/to/uvx",
      "args": ["--from", "langlearn-tts", "langlearn-tts-server"],
      "env": {
        "LANGLEARN_TTS_OUTPUT_DIR": "/absolute/path/to/output/directory"
      }
    }
  }
}

Claude Desktop does not inherit your shell environment. All paths must be absolute, and API keys (like OPENAI_API_KEY) must be literal values in this file (env var references are not supported). To find your uvx path, run: which uvx

| Env var | Required | Description |
|---|---|---|
| LANGLEARN_TTS_PROVIDER | No | polly (default) or openai |
| OPENAI_API_KEY | For OpenAI | Your literal API key (Claude Desktop does not support env var references) |
| LANGLEARN_TTS_OUTPUT_DIR | No | Output directory (default: ~/Claude-Audio) |
| LANGLEARN_TTS_MODEL | No | OpenAI model (tts-1, tts-1-hd). Default: tts-1 |

For Polly (default), AWS credentials are read from ~/.aws/credentials. For OpenAI, add LANGLEARN_TTS_PROVIDER: "openai" and OPENAI_API_KEY to the env dict.
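
If you would rather generate the entry than type it, the sketch below builds the same structure and enforces the absolute-path and literal-key rules described above. build_server_entry is a hypothetical helper, not part of langlearn-tts; paste its output into the config file yourself.

```python
import json
import shutil
from pathlib import Path
from typing import Optional


def build_server_entry(uvx_path: str, output_dir: str,
                       provider: str = "polly",
                       openai_api_key: Optional[str] = None) -> dict:
    """Build the "langlearn-tts" entry for claude_desktop_config.json."""
    if not Path(uvx_path).is_absolute() or not Path(output_dir).is_absolute():
        raise ValueError("Claude Desktop needs absolute paths")
    env = {"LANGLEARN_TTS_OUTPUT_DIR": output_dir}
    if provider == "openai":
        if not openai_api_key:
            raise ValueError("OpenAI needs a literal API key in the config")
        env["LANGLEARN_TTS_PROVIDER"] = "openai"
        env["OPENAI_API_KEY"] = openai_api_key
    return {
        "command": uvx_path,
        "args": ["--from", "langlearn-tts", "langlearn-tts-server"],
        "env": env,
    }


if __name__ == "__main__":
    # Falls back to the placeholder path when uvx is not on PATH.
    uvx = shutil.which("uvx") or "/absolute/path/to/uvx"
    entry = build_server_entry(uvx, str(Path.home() / "Claude-Audio"))
    print(json.dumps({"mcpServers": {"langlearn-tts": entry}}, indent=2))
```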

Restart Claude Desktop after editing the config.

AI Tutor Prompts

langlearn-tts ships with 28 ready-made AI tutor prompts — one for each combination of 7 languages and 4 levels. Paste a prompt into a Claude Desktop Project's Instructions field, and Claude becomes a language tutor that generates audio during lessons.

Browse prompts

# List all available prompts
langlearn-tts prompt list

# Print a prompt (pipe to clipboard with pbcopy on macOS)
langlearn-tts prompt show german-high-school | pbcopy

Set up a Claude Desktop Project

  1. In Claude Desktop, click Projects in the sidebar
  2. Click Create Project and name it (e.g., "German with Herr Schmidt")
  3. Open the project, click Set custom instructions
  4. Paste the prompt content into the Instructions field
  5. Start a new conversation within that project

Using a Project keeps the tutor persona scoped to language learning. Other conversations are unaffected.

Available languages and levels

| Language | High School | 1st Year | 2nd Year | Advanced |
|---|---|---|---|---|
| German | Herr Schmidt | Professorin Weber | Professor Hartmann | Professor Becker |
| Spanish | Profesora Elena | Profesor Garcia | Profesora Carmen | Profesora Reyes |
| French | Madame Moreau | Professeur Laurent | Professeur Dubois | Professeur Beaumont |
| Russian | Irina Petrovna | Professor Dmitri | Professor Natasha | Professor Mikhail |
| Korean | Kim-seonsaengnim | Professor Park | Professor Kim | Professor Yoon |
| Japanese | Tanaka-sensei | Yamamoto-sensei | Suzuki-sensei | Mori-sensei |
| Chinese | Laoshi Wang | Professor Chen | Professor Zhang | Professor Wei |

Each prompt creates a tutor persona calibrated to the student's level, based on Mollick & Mollick's "Assigning AI" framework. Customize any prompt by adjusting student background, voice selection, speech rate, or focus areas.

Troubleshooting

langlearn-tts doctor

Checks Python version, active provider, ffmpeg, provider-specific credentials, uvx, Claude Desktop config, and output directory. Required checks must pass: the command exits with code 1 if any of them fail. Optional checks are flagged with a marker in the output.
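
Because a failed required check produces exit code 1, the command can gate a setup script. A minimal sketch, assuming a successful run exits with code 0; doctor_passed is a hypothetical wrapper, and the run parameter exists only so the logic can be exercised without the CLI installed.

```python
import subprocess
import sys
from typing import Callable, Sequence


def doctor_passed(run: Callable[[Sequence[str]], int] = subprocess.call) -> bool:
    """Return True when `langlearn-tts doctor` exits with code 0."""
    return run(["langlearn-tts", "doctor"]) == 0


if __name__ == "__main__":
    if not doctor_passed():
        sys.exit("langlearn-tts doctor reported a failed required check")
```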

Voices

AWS Polly (default)

Any voice from the AWS Polly voice list is supported. Voice names are case-insensitive. Each voice is a dedicated neural model trained for a specific language, producing the most native-sounding pronunciation. The tool queries the Polly API on first use and caches the result.

Common voices for language learning:

| Voice | Language | Engine |
|---|---|---|
| joanna | English (US) | neural |
| matthew | English (US) | neural |
| daniel | German | neural |
| vicki | German (female) | neural |
| lucia | Spanish (European) | neural |
| lupe | Spanish (US) | neural |
| léa | French | neural |
| tatyana | Russian | standard |
| seoyeon | Korean | neural |
| takumi | Japanese | neural |
| zhiyu | Chinese (Mandarin) | neural |

The engine (neural, standard, generative, long-form) is selected automatically — neural preferred when available.
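
The lookup and engine choice described in this section can be sketched as follows. Both helpers and the sample catalog are hypothetical, and the fallback order after neural is an assumption, since only the preference for neural is stated.

```python
from typing import Dict, List

# Only "neural first" comes from the text above; the rest of the order is assumed.
ENGINE_PREFERENCE = ("neural", "generative", "long-form", "standard")

# Illustrative catalog: voice name -> engines it supports.
SAMPLE_CATALOG: Dict[str, List[str]] = {
    "Daniel": ["neural"],
    "Tatyana": ["standard"],
}


def find_voice(name: str, catalog: Dict[str, List[str]]) -> str:
    """Case-insensitive lookup; returns the catalog's spelling of the name."""
    wanted = name.casefold()
    for voice_id in catalog:
        if voice_id.casefold() == wanted:
            return voice_id
    raise KeyError(name)


def pick_engine(supported: List[str]) -> str:
    """Return the most preferred engine that the voice supports."""
    for engine in ENGINE_PREFERENCE:
        if engine in supported:
            return engine
    raise ValueError("voice lists no known engine")


if __name__ == "__main__":
    voice = find_voice("tatyana", SAMPLE_CATALOG)
    print(voice, pick_engine(SAMPLE_CATALOG[voice]))
```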

OpenAI TTS

9 multilingual voices. Any voice works with any language — the model infers the language from the text. Voice names are case-insensitive. Pronunciation quality varies by language; major languages tend to be solid, less-resourced languages can be inconsistent.

| Voice | Description |
|---|---|
| alloy | Neutral, balanced |
| ash | Warm, conversational |
| coral | Clear, expressive |
| echo | Smooth, authoritative |
| fable | Warm, British-accented |
| onyx | Deep, resonant |
| nova | Friendly, upbeat |
| sage | Calm, measured |
| shimmer | Light, gentle |

Select the model with --model tts-1 (faster, cheaper) or --model tts-1-hd (higher quality).

CLI Usage

# Single synthesis
langlearn-tts synthesize "Guten Morgen" --voice daniel -o morning.mp3

# Custom speech rate (percentage, default 90)
langlearn-tts synthesize "Привет" --voice tatyana --rate 70 -o privet.mp3

# Pair: English + German stitched with a pause
langlearn-tts synthesize-pair "good morning" "Guten Morgen" \
  --voice1 joanna --voice2 daniel -o pair.mp3

# Batch from JSON file (["hello", "world", "good morning"])
langlearn-tts synthesize-batch words.json -d output/

# Batch merged into single file
langlearn-tts synthesize-batch words.json -d output/ --merge --pause 800

# Pair batch from JSON file ([["strong", "stark"], ["house", "Haus"]])
langlearn-tts synthesize-pair-batch pairs.json -d output/

# Browse AI tutor prompts
langlearn-tts prompt list
langlearn-tts prompt show german-high-school | pbcopy
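
The pair batch input is a JSON array of two-element arrays, so it is easy to generate from an existing word list. The sketch below writes such a file and returns the matching command line from the examples above; write_pairs_file is a hypothetical helper and does not run the CLI itself.

```python
import json
import tempfile
from pathlib import Path
from typing import Iterable, List, Tuple


def write_pairs_file(pairs: Iterable[Tuple[str, str]], path: Path) -> List[str]:
    """Write [[english, translation], ...] and return the matching CLI argv."""
    rows = [[english, translation] for english, translation in pairs]
    # ensure_ascii=False keeps non-Latin scripts readable in the file.
    path.write_text(json.dumps(rows, ensure_ascii=False, indent=2), encoding="utf-8")
    return ["langlearn-tts", "synthesize-pair-batch", str(path), "-d", "output/"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        argv = write_pairs_file([("strong", "stark"), ("house", "Haus")],
                                Path(tmp) / "pairs.json")
        print(" ".join(argv))
```

Pass the returned list to subprocess.run to start the batch once the file is in place.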

MCP Tools

All four tools are available in Claude Desktop once the server is configured:

| Tool | Description |
|---|---|
| synthesize | Single text to MP3 |
| synthesize_batch | Multiple texts, optionally merged |
| synthesize_pair | Two texts stitched with a pause |
| synthesize_pair_batch | Multiple pairs, optionally merged |

Each tool accepts auto_play (default: true) to play audio immediately after synthesis.
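
For readers curious about what Claude sends, an MCP tool invocation is a JSON-RPC 2.0 request with the method tools/call. The sketch below builds one for the synthesize tool with auto-play turned off. Only auto_play is documented above; the text and voice argument names are assumptions for illustration.

```python
import json
from typing import Any, Dict


def tool_call_request(tool: str, arguments: Dict[str, Any], request_id: int = 1) -> str:
    """Serialize an MCP tools/call request as JSON-RPC 2.0."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "tools/call",
            "params": {"name": tool, "arguments": arguments},
        },
        ensure_ascii=False,
    )


if __name__ == "__main__":
    # "text" and "voice" are assumed argument names; "auto_play" is documented.
    print(tool_call_request(
        "synthesize",
        {"text": "Guten Morgen", "voice": "daniel", "auto_play": False},
    ))
```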

Roadmap

ElevenLabs Backend

Highest voice quality. 29+ languages, 5,000+ voices, voice cloning. Setup: ELEVENLABS_API_KEY env var. Free tier: 10K chars/month.

Development

# Install with dev dependencies
uv sync --all-extras

# Run tests
uv run pytest tests/ -v

# Linting and formatting
uv run ruff check src/ tests/
uv run ruff format src/ tests/

# Type checking
uv run mypy src/ tests/
uv run pyright src/ tests/

License

MIT
