
Text-to-speech for technical Markdown, with interactive pauses on code blocks.


md-tts

Listen to technical Markdown out loud, with interactive pauses on code blocks.


md-tts reads a Markdown file aloud and stops on every code block, table and flashcard so you can actually look at the screen and study. It recognises <details><summary>Q</summary>A</details> blocks as flashcards (question → wait → answer). With the default local backend it detects the dominant language of the document (Spanish or English) and picks a single TTS voice for the whole session; the optional Edge backend picks a voice per paragraph instead.
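As a rough illustration of how a dominant-language vote could work, the sketch below counts common Spanish and English function words across the whole text and returns one language code for the session. The word lists, the tie-breaking rule and the name `detect_dominant_language` are assumptions for illustration, not md-tts's actual detector (which would presumably also ignore code blocks, since code tends to vote English).

```python
import re
from collections import Counter

# Hypothetical stopword lists (illustrative only; md-tts's real heuristic may differ).
ES_WORDS = {"el", "la", "los", "las", "de", "que", "y", "en", "un", "una",
            "es", "por", "para", "con", "del"}
EN_WORDS = {"the", "of", "and", "to", "in", "is", "that", "for", "it", "with",
            "as", "on", "this", "be", "are"}

# Lowercase letters, including Spanish accented vowels and ñ.
WORD_RE = re.compile(r"[a-záéíóúüñ]+")


def detect_dominant_language(text: str, default: str = "en") -> str:
    """Return "es" or "en" depending on which stopword set appears more often."""
    votes = Counter()
    for word in WORD_RE.findall(text.lower()):
        if word in ES_WORDS:
            votes["es"] += 1
        elif word in EN_WORDS:
            votes["en"] += 1
    if votes["es"] == votes["en"]:
        return default  # tie, including a document with no votes at all
    return "es" if votes["es"] > votes["en"] else "en"
```

A single call over the whole file, e.g. `detect_dominant_language(Path("notes.md").read_text(encoding="utf-8"))`, would then choose the session voice once, which matches the "one voice per session" behaviour of the local backend.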

A --no-pause "podcast mode" is included for when you just want continuous playback in the background (commute, gym): instead of waiting on code blocks, it announces them and moves on.

Why this exists

Existing TTS tools for Markdown either:

  • treat code blocks as silence and skip them, leaving the listener confused about what just happened;
  • read code character-by-character as if it were prose (open-paren-self-comma-x), which is unusable; or
  • support SSML pauses but not interactive pauses where playback waits for the listener.

After testing 8+ tools (Speechify, NaturalReader, Study MD Desk, VoxTrack and several SSML-based pipelines), none combined the four steps I needed: parse Markdown structure → speak prose → stop on code → wait for me. md-tts is a small Python CLI that does exactly that.

It is intentionally minimal. It targets developers who want to revise their own technical notes while away from the keyboard.

Features

  • 🛑 Interactive pauses on code blocks and tables.
  • 🎴 Flashcard mode for <details><summary>Q</summary>A</details> (speak Q, wait, speak A).
  • 🌍 ES/EN dominant-language detection: with the local backend, the parser picks a single session voice based on the document’s dominant language. Per-paragraph voice switching was tried on SAPI5 and proved unstable there, so it is only offered by the Edge backend.
  • 🎧 Podcast mode (--no-pause) that announces skipped blocks in the chosen language instead of waiting.
  • 🔊 Cross-platform TTS via pyttsx4 (SAPI5 on Windows, NSSpeechSynthesizer on macOS, eSpeak on Linux). No cloud account, no API key.
  • 🌐 Optional Edge neural voices (--backend edge): natural-sounding Microsoft voices, picks a voice per paragraph based on the detected language. Requires internet.
  • 🧪 Unit tested on Python 3.11 / 3.12 / 3.13 (see CI).

Installation

Install the released package from PyPI with pip install md-tts, or install from source:

```bash
git clone https://github.com/jmponcebe/md-tts.git
cd md-tts
uv sync --extra dev      # installs runtime + pytest/ruff (or: pip install -e ".[dev]")
```

On Linux you also need espeak: sudo apt-get install espeak libespeak1.

The optional Edge neural-voice backend has its own extra so the default install stays fully offline:

```bash
uv sync --extra edge     # or: pip install -e ".[edge]"
```

Usage

```bash
# Default: interactive; press ENTER to move past each code block / table / flashcard.
md-tts notes.md

# Podcast mode: never wait, just announce skipped blocks.
md-tts notes.md --no-pause

# Force a language (no auto-detect):
md-tts notes.md --lang es

# Force a specific voice by id (use --list-voices to discover them):
md-tts notes.md --voice "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Voices\Tokens\TTS_MS_ES-ES_HELENA_11.0"

# Tune speed:
md-tts notes.md --rate 220

# Inspect voices available on this system (path is optional with this flag):
md-tts --list-voices

# Switch to Microsoft Edge neural voices (requires internet, sounds much better).
# Edge mode auto-picks an es-ES voice for Spanish paragraphs and an en-US voice
# for English ones, so a bilingual document is read with the right voice per
# paragraph automatically.
md-tts notes.md --backend edge

# Inspect the Edge voice catalogue:
md-tts --backend edge --list-voices

# Export to MP3 for offline listening (commute, gym, etc.):
md-tts notes.md --backend edge --export notes.mp3
# Code blocks become short "skipping code block" announcements; <details>
# cards keep their question + 3 s silence + answer pattern.
```
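The export behaviour described in the comments above can be thought of as flattening the block stream into a timeline of speech and silence segments before any audio is synthesised. This is a minimal sketch under stated assumptions: the `(kind, text, answer)` tuple shape, the `Segment` class and the `build_export_timeline` name are hypothetical, and announcing tables the same way as code is an assumption (the source only describes code blocks and cards).

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One piece of the exported audio: either text to synthesise or a silence."""
    kind: str             # "speech" or "silence"
    text: str = ""        # used by speech segments only
    seconds: float = 0.0  # used by silence segments only


def build_export_timeline(blocks, card_pause: float = 3.0) -> list[Segment]:
    """Flatten hypothetical (kind, text, answer) tuples into speech/silence segments.

    Assumed block kinds: "text", "code", "table", "card".
    """
    timeline: list[Segment] = []
    for kind, text, answer in blocks:
        if kind == "text":
            timeline.append(Segment("speech", text=text))
        elif kind == "card":
            # Question, then a fixed silence to think, then the answer.
            timeline.append(Segment("speech", text=text))
            timeline.append(Segment("silence", seconds=card_pause))
            timeline.append(Segment("speech", text=answer))
        else:
            # Code blocks (and, by assumption, tables) become a short announcement.
            label = "code block" if kind == "code" else kind
            timeline.append(Segment("speech", text=f"skipping {label}"))
    return timeline
```

Keeping the timeline separate from synthesis means the same plan could be rendered by any backend; only the final "speech segment → audio" step needs network access.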

You can also run the module directly:

```bash
python -m md_tts notes.md
```

Markdown features supported

| Markdown construct | Behaviour |
|---|---|
| Headings | Spoken with a Chapter: / Section: prefix (or Capítulo: in Spanish). |
| Paragraphs | Spoken as prose. |
| Inline code (`...`) | Quoted in the spoken output (e.g. 'git status') so it is audibly distinct from prose. |
| Fenced code blocks | Pause + print to terminal. |
| Tables | Pause + print rows. |
| Inline images | Announced inline as [imagen: alt]. |
| Lists | Spoken as Punto 1: ..., Punto 2: ... (the Spanish prefix is currently used in both languages). |
| Block quotes | Prefixed with Cita:. |
| Horizontal rules (---) | Spoken as "Separador.". |
| <details><summary>Q</summary>A</details> | Flashcard: speak Q, wait for ENTER, speak A. |

Math blocks ($$ ... $$) and standalone image blocks are not detected as pause points in v0.1.0 — they fall through as text. Adding them is on the roadmap.
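To make the table concrete, here is a sketch of how a few of those spoken renderings could be produced from already-parsed pieces. The function names are hypothetical, and mapping heading level 1 to Chapter: and deeper levels to Section: is an assumption. Only the English heading prefixes are modelled, because the table gives the Spanish form only for Chapter:.

```python
import re

# Matches a single-backtick inline code span, capturing its contents.
INLINE_CODE_RE = re.compile(r"`([^`]+)`")

HR_TEXT = "Separador."  # spoken for a horizontal rule


def speak_heading(text: str, level: int) -> str:
    # Assumption: level-1 headings are chapters, deeper levels are sections.
    prefix = "Chapter" if level == 1 else "Section"
    return f"{prefix}: {text}"


def speak_list(items: list[str]) -> str:
    # The Spanish "Punto N:" prefix is used for both languages, as the table notes.
    return " ".join(f"Punto {i}: {item}" for i, item in enumerate(items, start=1))


def speak_quote(text: str) -> str:
    return f"Cita: {text}"


def speak_inline_code(text: str) -> str:
    # Wrap each `span` in single quotes so it stands apart from the prose.
    return INLINE_CODE_RE.sub(lambda m: f"'{m.group(1)}'", text)
```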

Architecture

```
.md file
   │
   ▼
parser.parse_markdown(text)         → Iterator[Block]
   │                                  kind ∈ {text, code, table, card}
   ▼
cli.run()                           ← argparse + interactive loop
   │
   ▼
reader.build_reader(backend)        → LocalReader (pyttsx4) or EdgeReader (edge-tts)
```
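The contract between parser and CLI can be sketched as a small record plus a loop that decides, per block kind, whether to speak, print and wait, or just announce. This is a simplified stand-in, not md-tts's code: the `Block` fields, the `run_blocks` name, the announcement wording and the injected `speak`/`show`/`wait` callables are hypothetical, chosen so the logic can be exercised without audio. Reading a card's answer straight after the question in podcast mode is also an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Literal

Kind = Literal["text", "code", "table", "card"]

# Hypothetical spoken labels for skipped blocks in podcast mode.
LABELS = {"code": "code block", "table": "table"}


@dataclass
class Block:
    kind: Kind
    text: str
    answer: str = ""  # only meaningful for "card" blocks


def run_blocks(
    blocks: Iterable[Block],
    speak: Callable[[str], None],
    show: Callable[[str], None] = print,
    wait: Callable[[str], str] = input,
    pause: bool = True,
) -> None:
    for block in blocks:
        if block.kind == "text":
            speak(block.text)
        elif block.kind == "card":
            speak(block.text)  # the question
            if pause:
                wait("Press ENTER for the answer...")
            speak(block.answer)  # assumption: no wait in podcast mode
        elif pause:
            # Code/table: print it and wait until the listener presses ENTER.
            show(block.text)
            wait("Press ENTER to continue...")
        else:
            # Podcast mode (--no-pause): announce the skipped block and move on.
            speak(f"Skipping {LABELS.get(block.kind, block.kind)}.")
```

Injecting `speak` is what lets the same loop drive either reader returned by a backend factory; in tests it can simply be `list.append`.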

Three modules plus per-backend implementations, ~700 lines total. The parser builds on top of markdown-it-py and pre-processes <details> HTML blocks with a regex/placeholder trick before parsing, because markdown-it treats raw HTML as opaque tokens.
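The regex/placeholder idea can be illustrated with the standard library alone: pull each <details> block out of the source, leave a unique placeholder paragraph that a Markdown parser will pass through as ordinary text, and keep a map from placeholders back to (question, answer) pairs. The placeholder format, the regex and the name `extract_cards` are assumptions rather than md-tts's actual code, and the pattern assumes <details> blocks are not nested.

```python
import re

# Non-greedy match of one <details> block with a <summary>; DOTALL lets it span lines.
DETAILS_RE = re.compile(
    r"<details>\s*<summary>(?P<q>.*?)</summary>(?P<a>.*?)</details>",
    re.DOTALL | re.IGNORECASE,
)


def extract_cards(markdown: str) -> tuple[str, dict[str, tuple[str, str]]]:
    """Replace each <details> block with a placeholder and return the cards.

    Returns the rewritten Markdown plus a mapping placeholder -> (question, answer).
    """
    cards: dict[str, tuple[str, str]] = {}

    def _swap(match: re.Match) -> str:
        # Alphanumeric key: no underscores or asterisks that Markdown could treat as emphasis.
        key = f"MDTTSCARD{len(cards)}"
        cards[key] = (match.group("q").strip(), match.group("a").strip())
        # Blank lines around the key make the parser see it as its own paragraph.
        return f"\n\n{key}\n\n"

    return DETAILS_RE.sub(_swap, markdown), cards
```

After parsing the rewritten text, any paragraph whose content is exactly one of the keys can be turned back into a card block, so the parser never has to interpret the raw HTML itself.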

The local backend uses pyttsx4 (a maintained fork of pyttsx3) because pyttsx3 2.99 has a SAPI5 bug on Windows where only the first runAndWait() call produces audio.

The edge backend uses edge-tts to call Microsoft Edge's neural voices over HTTPS (no Microsoft account, no API key) and plays the resulting MP3 with pygame.mixer.music, which exposes real pause/unpause cross-platform (SDL_mixer under the hood). That is what enables the SPACE control during a paragraph. Per-paragraph voice switching works on the edge backend because each utterance is an independent HTTP request, with no shared engine state to corrupt.
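Because each Edge utterance is independent, choosing a voice per paragraph reduces to a pure function from paragraphs to (voice, text) pairs that can then be synthesised one at a time. A minimal sketch, assuming a crude character-based hint for Spanish and two example Edge voice short names; the voices md-tts actually picks, and its language detector, may differ (md-tts --backend edge --list-voices shows the real catalogue). `plan_utterances` is a hypothetical name.

```python
import re

# Example Edge voice short names; placeholders, not necessarily md-tts's defaults.
VOICES = {"es": "es-ES-ElviraNeural", "en": "en-US-AriaNeural"}

# Crude hint: characters that are common in Spanish text and rare in English prose.
SPANISH_HINT_RE = re.compile(r"[ñáéíóú¿¡]", re.IGNORECASE)


def plan_utterances(paragraphs: list[str]) -> list[tuple[str, str]]:
    """Pair each paragraph with a voice chosen from its own guessed language."""
    plan = []
    for text in paragraphs:
        lang = "es" if SPANISH_HINT_RE.search(text) else "en"
        plan.append((VOICES[lang], text))
    return plan
```

Each pair could then be handed to edge-tts (for example via `edge_tts.Communicate(text, voice)`) as its own request, so switching voices between paragraphs never touches shared engine state.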

Roadmap

  • Interactive controls during playback (SPACE / s / n / b / +/- / q) — v0.3
  • Optional cloud-quality TTS backend (Microsoft Edge neural voices) — v0.2
  • Rewind / skip-back during interactive mode — v0.3
  • MP3 export of an entire document for offline mobile listening — v0.4
  • PyPI release (pip install md-tts)
  • Math blocks ($$ ... $$) detected as pause points instead of being read as prose
  • Standalone image blocks announced as [image: alt-text] instead of being silently flattened
  • Bookmarks: persist a per-document position so --resume picks up where you left off
  • --chapter flag to start playback from a specific heading
  • Real-time rate change (requires a streaming pitch-preserving resampler — non-trivial)
  • More backends: Piper (local neural, fast, free), Azure TTS (premium voices via API key)

Development

```bash
uv sync --extra dev          # install dev extras (pytest, pytest-cov, ruff)
uv run pytest                # 48 tests
uv run ruff check .
uv run ruff format .
```

Conventional commits, feature branches off main, squash-merge by default. See .github/copilot-instructions.md for the full contributor guide.

License

MIT — see LICENSE.

Author

Jose María Ponce Bernabé. Built as a side project while studying for AI / Data Engineering interviews, out of the need to revise PharmaGraphRAG and DengueMLOps notes during commutes.



Download files


Source Distribution

md_tts-0.4.1.tar.gz (193.5 kB)

Uploaded Source

Built Distribution


md_tts-0.4.1-py3-none-any.whl (27.8 kB)

Uploaded Python 3

File details

Details for the file md_tts-0.4.1.tar.gz.

File metadata

  • Download URL: md_tts-0.4.1.tar.gz
  • Size: 193.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for md_tts-0.4.1.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6943fd033badd0e0415cc255f86e9950a3cb151eaa0939b1795318df368515ea |
| MD5 | 23f83f2fa248a19d60255e916da8f89f |
| BLAKE2b-256 | e5f4a231a89e43da6a62409abfc4e4f8f35f19c8d57c5b2b23450db490d59e94 |


Provenance

The following attestation bundles were made for md_tts-0.4.1.tar.gz:

Publisher: publish.yml on jmponcebe/md-tts

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file md_tts-0.4.1-py3-none-any.whl.

File metadata

  • Download URL: md_tts-0.4.1-py3-none-any.whl
  • Size: 27.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for md_tts-0.4.1-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0d454634415c82d851392f782e6ada5bc4a8555d97465c5cf4c102b4c95e5e89 |
| MD5 | 759eb4c59ddec8e54e169f17e37c83e2 |
| BLAKE2b-256 | 9a31cc43ea301f507c673947512776df47f2b35cfdfaa7b85ebb6c19922b232b |


Provenance

The following attestation bundles were made for md_tts-0.4.1-py3-none-any.whl:

Publisher: publish.yml on jmponcebe/md-tts

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
