Text-to-speech for technical Markdown, with interactive pauses on code blocks.
md-tts
Listen to technical Markdown out loud, with interactive pauses on code blocks.
md-tts reads a Markdown file aloud and stops on every code block, table and flashcard so you can actually look at the screen and study. It recognises <details><summary>Q</summary>A</details> blocks as flashcards (question → wait → answer) and detects the dominant language of the document (Spanish or English) to pick a single TTS voice for the whole session.
A --no-pause "podcast mode" is included for when you just want continuous playback in the background (commute, gym): instead of waiting on code blocks, it announces them and moves on.
Why this exists
Existing TTS tools for Markdown either:
- treat code blocks as silence and skip them, leaving the listener confused about what just happened;
- read code character-by-character as if it were prose (open-paren-self-comma-x), which is unusable; or
- support SSML pauses but not interactive pauses where playback waits for the listener.
After testing 8+ tools (Speechify, NaturalReader, Study MD Desk, VoxTrack and several SSML-based pipelines), nothing offered the combination of parse Markdown structure → speak prose → stop on code → wait for me. md-tts is a small Python CLI that does exactly that.
It is intentionally minimal. It targets developers who want to revise their own technical notes while away from the keyboard.
Features
- 🛑 Interactive pauses on code blocks and tables.
- 🎴 Flashcard mode for <details><summary>Q</summary>A</details> (speak Q, wait, speak A).
- 🌍 ES/EN dominant-language detection: the parser picks a single session voice based on the document’s dominant language. Per-paragraph voice switching was tried and proved unstable on SAPI5; it lives in the roadmap.
- 🎧 Podcast mode (--no-pause) that announces skipped blocks in the chosen language instead of waiting.
- 🔊 Cross-platform TTS via pyttsx4 (SAPI5 on Windows, NSSpeechSynthesizer on macOS, eSpeak on Linux). No cloud account, no API key.
- 🌐 Optional Edge neural voices (--backend edge): natural-sounding Microsoft voices, picks a voice per paragraph based on the detected language. Requires internet.
- 🧪 Unit tested on Python 3.11 / 3.12 / 3.13 (see CI).
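As a rough illustration of dominant-language detection, the sketch below counts common Spanish and English stopwords and picks whichever language scores higher. The stopword lists, the function name detect_dominant_language and the tie-breaking rule are assumptions made for this example; the detector md-tts actually ships may work differently.

```python
import re

# Small hand-picked stopword sets (an assumption for this sketch, not md-tts's lists).
SPANISH = {"el", "la", "los", "las", "de", "que", "y", "en", "un", "una", "es", "para", "con", "por"}
ENGLISH = {"the", "of", "and", "to", "in", "is", "that", "for", "with", "this", "it", "on"}


def detect_dominant_language(text: str, default: str = "en") -> str:
    """Return "es" or "en" depending on which stopword set matches more words."""
    words = re.findall(r"[a-záéíóúñü]+", text.lower())
    es_hits = sum(word in SPANISH for word in words)
    en_hits = sum(word in ENGLISH for word in words)
    if es_hits == en_hits:
        return default  # no clear winner: fall back to the default language
    return "es" if es_hits > en_hits else "en"
```

Scoring the whole document once, rather than each paragraph, matches the single-session-voice behaviour described above.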
Installation
pip install md-tts
This gives you the offline pyttsx4 backend (SAPI5 on Windows, NSSpeechSynthesizer on macOS, eSpeak on Linux) plus the Markdown parser and interactive CLI — no API keys, no internet.
On Linux you also need espeak installed at the system level: sudo apt-get install espeak libespeak1.
Optional extras
| Install | Adds | When to use |
|---|---|---|
| pip install md-tts | base only | Local TTS playback, no neural voices. |
| pip install "md-tts[edge]" | edge-tts + pygame | Microsoft Edge neural voices (--backend edge) and real pause/resume during playback. |
| pip install "md-tts[export]" | edge-tts only | MP3 export (--export out.mp3) on headless or mobile environments where pygame/SDL2 is hard to install. |
Termux / Android
pygame requires SDL2 native libraries and is painful to build on Termux. If you only want to generate MP3 files and listen to them with a regular Android audio player, use the [export] extra:
pkg install python
pip install "md-tts[export]"
md-tts notes.md --backend edge --export notes.mp3
That avoids pygame entirely. For local playback on Termux with pyttsx4 (no Edge), also run pkg install espeak. Interactive controls (SPACE to pause, +/- to change rate) work only with the [edge] extra, which is not recommended on Termux.
From source (development)
git clone https://github.com/jmponcebe/md-tts.git
cd md-tts
uv sync --extra dev # or: pip install -e ".[dev]"
Usage
# Default: interactive — ENTER skips each code block / table / flashcard.
md-tts notes.md
# Podcast mode: never wait, just announce skipped blocks.
md-tts notes.md --no-pause
# Force a language (no auto-detect):
md-tts notes.md --lang es
# Force a specific voice by id (use --list-voices to discover them):
md-tts notes.md --voice "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Voices\Tokens\TTS_MS_ES-ES_HELENA_11.0"
# Tune speed:
md-tts notes.md --rate 220
# Inspect voices available on this system (path is optional with this flag):
md-tts --list-voices
# Switch to Microsoft Edge neural voices (requires internet, sounds much better):
md-tts notes.md --backend edge
# Edge mode auto-picks an es-ES voice for Spanish paragraphs and en-US for
# English ones, so a bilingual document is read with the right voice per
# paragraph automatically.
md-tts notes.md --backend edge
# Inspect the Edge voice catalogue:
md-tts --backend edge --list-voices
# Export to MP3 for offline listening (commute, gym, etc.):
md-tts notes.md --backend edge --export notes.mp3
# Code blocks become short "skipping code block" announcements; <details>
# cards keep their question + 3 s silence + answer pattern.
You can also run the module directly:
python -m md_tts notes.md
Markdown features supported
| Markdown construct | Behaviour |
|---|---|
| Headings | Spoken with Chapter: / Section: prefix (or Capítulo: in Spanish). |
| Paragraphs | Spoken as prose. |
| Inline code ` ` | Quoted in the spoken output (e.g. 'git status') so it’s audibly distinct from prose. |
| Fenced code blocks | Pause + print to terminal. |
| Tables | Pause + print rows. |
| Inline images | Announced inline as [imagen: alt]. |
| Lists | Spoken as Punto 1: ..., Punto 2: ... (Spanish prefix used in both languages currently). |
| Block quotes | Prefixed with Cita:. |
| HR (---) | Spoken as Separador.. |
| <details><summary>Q</summary>A</details> | Flashcard: speak Q, wait for ENTER, speak A. |
Math blocks ($$ ... $$) and standalone image blocks are not yet detected as pause points; they fall through as text. Adding them is on the roadmap.
Architecture
.md file
   │
   ▼
parser.parse_markdown(text)   → Iterator[Block]
   │                            kind ∈ {text, code, table, card}
   ▼
cli.run()                     ← argparse + interactive loop
   │
   ▼
reader.build_reader(backend)  → LocalReader (pyttsx4) or EdgeReader (edge-tts)
Three modules plus per-backend implementations, ~700 lines total. The parser builds on top of markdown-it-py and pre-processes <details> HTML blocks with a regex/placeholder trick before parsing, because markdown-it treats raw HTML as opaque tokens.
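The regex/placeholder idea can be sketched as follows: each <details> block is swapped for a unique plain-text token before the Markdown parser runs, and the question/answer pair is kept in a side table for later. The pattern, the placeholder format and the names extract_cards and Card are illustrative assumptions, not the project's actual code.

```python
import re
from dataclasses import dataclass

# Matches <details><summary>Q</summary>A</details>, across newlines, case-insensitively.
DETAILS_RE = re.compile(
    r"<details>\s*<summary>(?P<question>.*?)</summary>(?P<answer>.*?)</details>",
    re.DOTALL | re.IGNORECASE,
)


@dataclass
class Card:
    question: str
    answer: str


def extract_cards(markdown: str) -> tuple[str, dict[str, Card]]:
    """Swap each <details> block for a placeholder token and collect the cards."""
    cards: dict[str, Card] = {}

    def replace(match: re.Match[str]) -> str:
        token = f"MDTTSCARD{len(cards)}"  # hypothetical placeholder format
        cards[token] = Card(match["question"].strip(), match["answer"].strip())
        # Surrounding blank lines make the token parse as its own paragraph.
        return f"\n\n{token}\n\n"

    return DETAILS_RE.sub(replace, markdown), cards
```

After parsing, a paragraph whose text is exactly one of the tokens can be turned back into a card block by looking it up in the returned dictionary.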
The local backend uses pyttsx4 (a maintained fork of pyttsx3) because pyttsx3 2.99 exhibits a SAPI5 bug on Windows where only the first runAndWait() call produces audio. The edge backend uses edge-tts to call Microsoft Edge's neural voices over HTTPS (no Microsoft account, no API key) and plays the resulting MP3 with pygame.mixer.music, which exposes real pause/unpause cross-platform (SDL_mixer under the hood) — that's what enables the SPACE control during a paragraph. Per-paragraph voice switching works on edge because each utterance is an independent HTTP request, with no shared engine state to corrupt.
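To make the flow above concrete, here is a minimal sketch of how a loop over parsed blocks might behave in interactive versus --no-pause mode. Block, play, the injected speak/show/wait callables and the announcement wording are all hypothetical stand-ins; the real cli.run() and reader classes are not reproduced here, and the handling of cards in podcast mode is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Block:
    kind: str          # one of "text", "code", "table", "card"
    text: str          # prose, code/table source, or the card question
    answer: str = ""   # only used by "card" blocks


def wait_for_enter() -> None:
    input("Press ENTER to continue... ")


def play(
    blocks: Iterable[Block],
    speak: Callable[[str], None],
    show: Callable[[str], None] = print,
    wait: Callable[[], None] = wait_for_enter,
    pause: bool = True,
) -> None:
    """Speak prose, and either pause on or announce everything else."""
    for block in blocks:
        if block.kind == "text":
            speak(block.text)
        elif block.kind == "card":
            speak(block.text)        # the question
            if pause:
                wait()               # give the listener time to answer
            speak(block.answer)
        elif pause:                  # code or table, interactive mode
            show(block.text)
            wait()
        else:                        # code or table, podcast mode
            speak(f"Skipping {block.kind} block.")
```

Passing speak, show and wait in as callables keeps the loop independent of the TTS backend, so the same logic could drive either reader and is easy to exercise in tests.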
Roadmap
- Interactive controls during playback (SPACE / s / n / b / +/- / q) — v0.3
- Optional cloud-quality TTS backend (Microsoft Edge neural voices) — v0.2
- Rewind / skip-back during interactive mode — v0.3
- MP3 export of an entire document for offline mobile listening — v0.4
- PyPI release (pip install md-tts)
- Math blocks ($$ ... $$) detected as pause points instead of being read as prose
- Standalone image blocks announced as [image: alt-text] instead of being silently flattened
- Bookmarks: persist a per-document position so --resume picks up where you left off
- --chapter flag to start playback from a specific heading
- Real-time rate change (requires a streaming pitch-preserving resampler, which is non-trivial)
- More backends: Piper (local neural, fast, free), Azure TTS (premium voices via API key)
Development
uv sync --extra dev # install dev extras (pytest, pytest-cov, ruff)
uv run pytest # 48 tests
uv run ruff check .
uv run ruff format .
Conventional commits, feature branches off main, squash-merge by default. See .github/copilot-instructions.md for the full contributor guide.
License
MIT — see LICENSE.
Author
Jose María Ponce Bernabé. Built as a side-project while studying for AI / Data Engineering interviews — needed a way to revise PharmaGraphRAG and DengueMLOps notes during commutes.