md-tts

Listen to technical Markdown out loud, with interactive pauses on code blocks.

md-tts reads a Markdown file aloud and stops on every code block, table and flashcard so you can actually look at the screen and study. It recognises <details><summary>Q</summary>A</details> blocks as flashcards (question → wait → answer) and detects the dominant language of the document (Spanish or English) to pick a single TTS voice for the whole session.

A --no-pause "podcast mode" is included for when you just want continuous playback in the background (commute, gym): instead of waiting on code blocks, it announces them and moves on.

Why this exists

Existing TTS tools for Markdown either:

  • treat code blocks as silence and skip them, leaving the listener confused about what just happened;
  • read code character-by-character as if it were prose (open-paren-self-comma-x), which is unusable; or
  • support SSML pauses but not interactive pauses where playback waits for the listener.

After testing 8+ tools (Speechify, NaturalReader, Study MD Desk, VoxTrack and several SSML-based pipelines), none offered the full combination: parse Markdown structure → speak prose → stop on code → wait for me. md-tts is a small Python CLI that does exactly that.

It is intentionally minimal. It targets developers who want to revise their own technical notes while away from the keyboard.

Features

  • 🛑 Interactive pauses on code blocks and tables.
  • 🎴 Flashcard mode for <details><summary>Q</summary>A</details> (speak Q, wait, speak A).
  • 🌍 ES/EN dominant-language detection: the parser picks a single session voice based on the document’s dominant language. Per-paragraph voice switching was tried on the local backend but proved unstable on SAPI5, so for that backend it remains on the roadmap.
  • 🎧 Podcast mode (--no-pause) that announces skipped blocks in the chosen language instead of waiting.
  • 🔊 Cross-platform TTS via pyttsx4 (SAPI5 on Windows, NSSpeechSynthesizer on macOS, eSpeak on Linux). No cloud account, no API key.
  • 🌐 Optional Edge neural voices (--backend edge): natural-sounding Microsoft voices, with a voice picked per paragraph based on the detected language. Requires internet.
  • 🧪 Unit tested on Python 3.11 / 3.12 / 3.13 (see CI).
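
The README does not say how the dominant language is detected. As a rough illustration only, the sketch below counts common Spanish and English function words and picks whichever set matches more often. The word lists and the `detect_language` name are assumptions made for this example, not md-tts's actual implementation.

```python
import re

# Hypothetical stopword lists chosen for this sketch; words that are common
# in both languages (such as "a" or "no") are deliberately left out.
ES_WORDS = {"el", "la", "los", "las", "de", "que", "y", "en", "un", "una", "es", "para", "con", "por"}
EN_WORDS = {"the", "of", "and", "to", "in", "is", "that", "for", "with", "it", "on", "this", "not", "are"}


def detect_language(text: str, default: str = "en") -> str:
    """Return "es" or "en" depending on which stopword set matches more often."""
    words = re.findall(r"[a-záéíóúüñ]+", text.lower())
    es_hits = sum(word in ES_WORDS for word in words)
    en_hits = sum(word in EN_WORDS for word in words)
    if es_hits == en_hits:
        return default  # no clear winner (or no words at all)
    return "es" if es_hits > en_hits else "en"


if __name__ == "__main__":
    print(detect_language("El parser lee el documento y elige una voz para la sesión."))
    print(detect_language("The parser reads the document and picks a voice for the session."))
```

Calling such a function once on the whole document gives a single session language, while calling it on each paragraph gives the per-paragraph behaviour described for the Edge backend.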

Installation

pip install md-tts

This gives you the offline pyttsx4 backend (SAPI5 on Windows, NSSpeechSynthesizer on macOS, eSpeak on Linux) plus the Markdown parser and interactive CLI — no API keys, no internet.

On Linux you also need espeak installed at the system level: sudo apt-get install espeak libespeak1.

Optional extras

| Install | Adds | When to use |
| --- | --- | --- |
| pip install md-tts | base only | Local TTS playback, no neural voices. |
| pip install "md-tts[edge]" | edge-tts + pygame | Microsoft Edge neural voices (--backend edge) and real pause/resume during playback. |
| pip install "md-tts[export]" | edge-tts only | MP3 export (--export out.mp3) on headless or mobile environments where pygame/SDL2 is hard to install. |

Termux / Android

pygame requires SDL2 native libraries and is painful to build on Termux. If you only want to generate MP3 files and listen to them with a regular Android audio player, use the [export] extra:

pkg install python
pip install "md-tts[export]"
md-tts notes.md --backend edge --export notes.mp3

That avoids pygame entirely. For local playback on Termux with pyttsx4 (no Edge), also run pkg install espeak. Interactive controls (SPACE pause, +/- rate) work only with the [edge] extra, which is not recommended on Termux.

From source (development)

git clone https://github.com/jmponcebe/md-tts.git
cd md-tts
uv sync --extra dev      # or: pip install -e ".[dev]"

Usage

# Default: interactive. Playback pauses on each code block / table / flashcard until you press ENTER.
md-tts notes.md

# Podcast mode: never wait, just announce skipped blocks.
md-tts notes.md --no-pause

# Force a language (no auto-detect):
md-tts notes.md --lang es

# Force a specific voice by id (use --list-voices to discover them):
md-tts notes.md --voice "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Voices\Tokens\TTS_MS_ES-ES_HELENA_11.0"

# Tune speed:
md-tts notes.md --rate 220

# Inspect voices available on this system (path is optional with this flag):
md-tts --list-voices

# Switch to Microsoft Edge neural voices (requires internet, sounds much better).
# Edge mode auto-picks an es-ES voice for Spanish paragraphs and en-US for
# English ones, so a bilingual document is read with the right voice per
# paragraph automatically.
md-tts notes.md --backend edge

# Inspect the Edge voice catalogue:
md-tts --backend edge --list-voices

# Export to MP3 for offline listening (commute, gym, etc.):
md-tts notes.md --backend edge --export notes.mp3
# Code blocks become short "skipping code block" announcements; <details>
# cards keep their question + 3 s silence + answer pattern.

You can also run the module directly:

python -m md_tts notes.md

Markdown features supported

| Markdown construct | Behaviour |
| --- | --- |
| Headings | Spoken with Chapter: / Section: prefix (or Capítulo: in Spanish). |
| Paragraphs | Spoken as prose. |
| Inline code ` ` | Quoted in the spoken output (e.g. 'git status') so it’s audibly distinct from prose. |
| Fenced code blocks | Pause + print to terminal. |
| Tables | Pause + print rows. |
| Inline images | Announced inline as [imagen: alt]. |
| Lists | Spoken as Punto 1: ..., Punto 2: ... (Spanish prefix used in both languages currently). |
| Block quotes | Prefixed with Cita:. |
| HR (---) | Spoken as Separador. |
| <details><summary>Q</summary>A</details> | Flashcard: speak Q, wait for ENTER, speak A. |

Math blocks ($$ ... $$) and standalone image blocks are not yet detected as pause points — they fall through as text. Adding them is on the roadmap.
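
Two of the rows above (inline code and lists) are simple text transforms. The sketch below shows one way they could be written; the function names `quote_inline_code` and `speak_list` are hypothetical and not taken from the md-tts source.

```python
import re


def quote_inline_code(text: str) -> str:
    """Replace `code` spans with 'code' so they stand out when spoken."""
    return re.sub(r"`([^`]+)`", r"'\1'", text)


def speak_list(items: list[str]) -> list[str]:
    """Prefix each list item as described in the table: Punto 1, Punto 2, ..."""
    return [
        f"Punto {number}: {quote_inline_code(item)}"
        for number, item in enumerate(items, start=1)
    ]


if __name__ == "__main__":
    for line in speak_list(["Install `uv`", "Run the tests"]):
        print(line)
```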

Architecture

.md file
   │
   ▼
parser.parse_markdown(text)         → Iterator[Block]
   │                                  kind ∈ {text, code, table, card}
   ▼
cli.run()                           ← argparse + interactive loop
   │
   ▼
reader.build_reader(backend)        → LocalReader (pyttsx4) or EdgeReader (edge-tts)

Three modules plus per-backend implementations, ~700 lines total. The parser builds on top of markdown-it-py and pre-processes <details> HTML blocks with a regex/placeholder trick before parsing, because markdown-it treats raw HTML as opaque tokens.
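
The regex/placeholder trick is not spelled out in this README, so the following is only a minimal sketch of the general idea, using the standard library alone: each `<details>` block is swapped for a unique token before parsing, and the question/answer pair is kept in a side table. The `extract_cards` name and the `MDTTSCARD` token format are assumptions made for this example.

```python
import re

# Matches <details><summary>Q</summary>A</details>, including multi-line bodies.
DETAILS_RE = re.compile(
    r"<details>\s*<summary>(?P<q>.*?)</summary>(?P<a>.*?)</details>",
    re.DOTALL | re.IGNORECASE,
)


def extract_cards(markdown: str) -> tuple[str, dict[str, tuple[str, str]]]:
    """Swap each <details> block for a placeholder paragraph and return
    the rewritten text plus a mapping placeholder -> (question, answer)."""
    cards: dict[str, tuple[str, str]] = {}

    def stash(match: re.Match) -> str:
        key = f"MDTTSCARD{len(cards)}"  # hypothetical token format
        cards[key] = (match.group("q").strip(), match.group("a").strip())
        # Blank lines make the placeholder its own paragraph for the parser.
        return f"\n\n{key}\n\n"

    return DETAILS_RE.sub(stash, markdown), cards


if __name__ == "__main__":
    text = "Intro\n<details><summary>What is TTS?</summary>Text to speech.</details>\nOutro"
    rewritten, cards = extract_cards(text)
    print(rewritten)
    print(cards)
```

After the Markdown parser has run, any paragraph whose text equals a stored placeholder can be turned back into a card block, so the flashcard keeps its position in the document.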

The local backend uses pyttsx4 (a maintained fork of pyttsx3) because pyttsx3 2.99 exhibits a SAPI5 bug on Windows where only the first runAndWait() call produces audio. The edge backend uses edge-tts to call Microsoft Edge's neural voices over HTTPS (no Microsoft account, no API key) and plays the resulting MP3 with pygame.mixer.music, which exposes real pause/unpause cross-platform (SDL_mixer under the hood) — that's what enables the SPACE control during a paragraph. Per-paragraph voice switching works on edge because each utterance is an independent HTTP request, with no shared engine state to corrupt.
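
To make the flow in the diagram concrete, here is a small, self-contained sketch of an interactive loop over parsed blocks. Only the four block kinds come from the README; the `Block` fields, the `play` function and its parameters are hypothetical, and the card behaviour under `no_pause` is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Block:
    kind: str           # "text", "code", "table" or "card"
    text: str = ""      # prose, code, or table rows
    question: str = ""  # used by cards only
    answer: str = ""    # used by cards only


def play(
    blocks: Iterable[Block],
    speak: Callable[[str], None],
    wait: Callable[[], None],
    show: Callable[[str], None] = print,
    no_pause: bool = False,
) -> None:
    """Speak prose, stop on code and tables, and run cards as Q -> wait -> A."""
    for block in blocks:
        if block.kind == "text":
            speak(block.text)
        elif block.kind in ("code", "table"):
            if no_pause:
                # Podcast mode: announce the block and move on.
                speak(f"Skipping {block.kind} block.")
            else:
                show(block.text)
                wait()
        elif block.kind == "card":
            speak(block.question)
            if not no_pause:  # assumption: podcast mode does not wait on cards
                wait()
            speak(block.answer)


if __name__ == "__main__":
    demo = [Block("text", "Hello"), Block("code", "print(1)")]
    play(demo, speak=print, wait=lambda: input("ENTER to continue "))
```

Passing `speak` and `wait` as callables keeps the loop independent of the backend, which is one way a local reader and an Edge reader could share the same control flow.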

Roadmap

  • Interactive controls during playback (SPACE / s / n / b / +/- / q) (shipped in v0.3)
  • Optional cloud-quality TTS backend (Microsoft Edge neural voices) (shipped in v0.2)
  • Rewind / skip-back during interactive mode (shipped in v0.3)
  • MP3 export of an entire document for offline mobile listening (shipped in v0.4)
  • PyPI release (pip install md-tts) (shipped)
  • Math blocks ($$ ... $$) detected as pause points instead of being read as prose
  • Standalone image blocks announced as [image: alt-text] instead of being silently flattened
  • Bookmarks: persist a per-document position so --resume picks up where you left off
  • --chapter flag to start playback from a specific heading
  • Real-time rate change (requires a streaming pitch-preserving resampler — non-trivial)
  • More backends: Piper (local neural, fast, free), Azure TTS (premium voices via API key)

Development

uv sync --extra dev          # install dev extras (pytest, pytest-cov, ruff)
uv run pytest                # 48 tests
uv run ruff check .
uv run ruff format .

Conventional commits, feature branches off main, squash-merge by default. See .github/copilot-instructions.md for the full contributor guide.

License

MIT — see LICENSE.

Author

Jose María Ponce Bernabé. Built as a side-project while studying for AI / Data Engineering interviews — needed a way to revise PharmaGraphRAG and DengueMLOps notes during commutes.
