Text-to-speech for technical Markdown, with interactive pauses on code blocks.

Maintainers

jmponcebe

md-tts

Listen to technical Markdown out loud, with interactive pauses on code blocks.

md-tts reads a Markdown file aloud and stops on every code block, table and flashcard so you can actually look at the screen and study. It recognises <details><summary>Q</summary>A</details> blocks as flashcards (question → wait → answer) and detects the dominant language of the document (Spanish or English) to pick a single TTS voice for the whole session.

A --no-pause "podcast mode" is included for when you just want continuous playback in the background (commute, gym): instead of waiting on code blocks, it announces them and moves on.

Why this exists

Existing TTS tools for Markdown either:

- treat code blocks as silence and skip them, leaving the listener confused about what just happened;
- read code character-by-character as if it were prose (open-paren-self-comma-x), which is unusable; or
- support SSML pauses but not interactive pauses where playback waits for the listener.

After testing 8+ tools (Speechify, NaturalReader, Study MD Desk, VoxTrack and several SSML-based pipelines), none offered the full combination: parse Markdown structure → speak prose → stop on code → wait for me. md-tts is a small Python CLI that does exactly that.

It is intentionally minimal. It targets developers who want to revise their own technical notes while away from the keyboard.

Features

- 🛑 Interactive pauses on code blocks and tables.
- 🎴 Flashcard mode for <details><summary>Q</summary>A</details> (speak Q, wait, speak A).
- 🌍 ES/EN dominant-language detection: the parser picks a single session voice based on the document’s dominant language. Per-paragraph voice switching was tried and proved unstable on SAPI5, so it remains on the roadmap for the local backend.
- 🎧 Podcast mode (--no-pause) that announces skipped blocks in the chosen language instead of waiting.
- 🔊 Cross-platform TTS via pyttsx4 (SAPI5 on Windows, NSSpeechSynthesizer on macOS, eSpeak on Linux). No cloud account, no API key.
- 🌐 Optional Edge neural voices (--backend edge): natural-sounding Microsoft voices, with a voice picked per paragraph based on the detected language. Requires internet.
- 🧪 Unit tested on Python 3.11 / 3.12 / 3.13 (see CI).
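
The README does not say how the dominant language is detected. As a rough illustration only, one common approach is to count language-specific stopwords and pick the larger total. The word lists, the function name `detect_dominant_language` and the tie-breaking default below are all assumptions, not md-tts internals.

```python
import re

# Hypothetical stopword hints; the real detector may use different signals.
SPANISH_HINTS = {"el", "la", "los", "las", "de", "que", "y", "en", "un",
                 "una", "es", "por", "para", "con", "se"}
ENGLISH_HINTS = {"the", "of", "and", "to", "in", "is", "that", "for",
                 "with", "it", "this", "on", "are", "as"}


def detect_dominant_language(text: str, default: str = "en") -> str:
    """Return "es" or "en" depending on which stopword set matches more often."""
    # Lowercase and keep only alphabetic words, including Spanish accented letters.
    words = re.findall(r"[a-záéíóúüñ]+", text.lower())
    spanish_hits = sum(word in SPANISH_HINTS for word in words)
    english_hits = sum(word in ENGLISH_HINTS for word in words)
    if spanish_hits == english_hits:
        return default  # no clear winner: fall back to the default language
    return "es" if spanish_hits > english_hits else "en"
```

A heuristic like this yields one label per document, which matches the single-session-voice behaviour described above. Forcing the language with --lang bypasses detection entirely.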

Installation

md-tts is published on PyPI (pip install md-tts). To install from source instead:

git clone https://github.com/jmponcebe/md-tts.git
cd md-tts
uv sync --extra dev      # installs runtime + pytest/ruff (or: pip install -e ".[dev]")

On Linux you also need espeak: sudo apt-get install espeak libespeak1.

The optional Edge neural-voice backend has its own extra so the default install stays fully offline:

uv sync --extra edge     # or: pip install -e ".[edge]"

Usage

# Default: interactive — ENTER skips each code block / table / flashcard.
md-tts notes.md

# Podcast mode: never wait, just announce skipped blocks.
md-tts notes.md --no-pause

# Force a language (no auto-detect):
md-tts notes.md --lang es

# Force a specific voice by id (use --list-voices to discover them):
md-tts notes.md --voice "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Voices\Tokens\TTS_MS_ES-ES_HELENA_11.0"

# Tune speed:
md-tts notes.md --rate 220

# Inspect voices available on this system (path is optional with this flag):
md-tts --list-voices

# Switch to Microsoft Edge neural voices (requires internet, sounds much better).
# Edge mode auto-picks an es-ES voice for Spanish paragraphs and an en-US voice
# for English ones, so a bilingual document is read with the right voice per
# paragraph automatically.
md-tts notes.md --backend edge

# Inspect the Edge voice catalogue:
md-tts --backend edge --list-voices

# Export to MP3 for offline listening (commute, gym, etc.):
md-tts notes.md --backend edge --export notes.mp3
# Code blocks become short "skipping code block" announcements; <details>
# cards keep their question + 3 s silence + answer pattern.

You can also run the module directly:

python -m md_tts notes.md

Markdown features supported

| Markdown construct | Behaviour |
| --- | --- |
| Headings | Spoken with `Chapter:` / `Section:` prefix (or `Capítulo:` in Spanish). |
| Paragraphs | Spoken as prose. |
| Inline code (`code`) | Quoted in the spoken output (e.g. `'git status'`) so it’s audibly distinct from prose. |
| Fenced code blocks | Pause + print to terminal. |
| Tables | Pause + print rows. |
| Inline images | Announced inline as `[imagen: alt]`. |
| Lists | Spoken as `Punto 1: ..., Punto 2: ...` (Spanish prefix used in both languages currently). |
| Block quotes | Prefixed with `Cita:`. |
| HR (`---`) | Spoken as `Separador.`. |
| `<details><summary>Q</summary>A</details>` | Flashcard: speak Q, wait for ENTER, speak A. |

Math blocks ($$ ... $$) and standalone image blocks are not detected as pause points in v0.1.0 — they fall through as text. Adding them is on the roadmap.

Architecture

.md file
   │
   ▼
parser.parse_markdown(text)         → Iterator[Block]
   │                                  kind ∈ {text, code, table, card}
   ▼
cli.run()                           ← argparse + interactive loop
   │
   ▼
reader.build_reader(backend)        → LocalReader (pyttsx4) or EdgeReader (edge-tts)
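
The flow in the diagram can be sketched as a small loop. This is a minimal illustration, not the project's actual code: the `Block` fields, the `play` function and the injected `speak` / `wait` callables are hypothetical names, and skipping the flashcard wait in --no-pause mode is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Block:
    kind: str          # one of: "text", "code", "table", "card"
    content: str       # prose, code, table rows, or the flashcard question
    answer: str = ""   # hypothetical field, only used by "card" blocks


def play(
    blocks: Iterable[Block],
    speak: Callable[[str], None],
    wait: Callable[[], None],
    no_pause: bool = False,
) -> None:
    """Speak prose, and stop (or announce) on code, tables and flashcards."""
    for block in blocks:
        if block.kind == "text":
            speak(block.content)
        elif block.kind == "card":
            speak(block.content)        # question
            if not no_pause:
                wait()                  # listener thinks, then presses ENTER
            speak(block.answer)         # answer
        elif no_pause:
            # Podcast mode: announce the block instead of waiting on it.
            speak(f"Skipping {block.kind} block.")
        else:
            print(block.content)        # show the code or table on screen
            wait()
```

Passing `speak` and `wait` in as callables keeps the loop independent of the TTS backend and makes it easy to exercise in tests with plain functions, for example `wait=input` in an interactive session.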

Three modules plus per-backend implementations, ~700 lines total. The parser builds on top of markdown-it-py and pre-processes <details> HTML blocks with a regex/placeholder trick before parsing, because markdown-it treats raw HTML as opaque tokens.
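
The regex/placeholder idea can be illustrated as follows. This is a sketch under assumptions, not the parser's real implementation: the pattern, the `MDTTSCARD` token format and the `extract_cards` name are invented for the example. Each `<details>` block is swapped for a plain alphanumeric token before the Markdown parser runs, and the question/answer pair is kept in a side table for later lookup.

```python
import re

# Matches <details><summary>Q</summary>A</details>, allowing newlines inside.
DETAILS_RE = re.compile(
    r"<details>\s*<summary>(?P<q>.*?)</summary>(?P<a>.*?)</details>",
    re.DOTALL | re.IGNORECASE,
)


def extract_cards(markdown: str) -> tuple[str, dict[str, tuple[str, str]]]:
    """Replace each <details> block with a token and return (text, cards)."""
    cards: dict[str, tuple[str, str]] = {}

    def _swap(match: re.Match) -> str:
        # Alphanumeric-only token, so Markdown does not read it as emphasis.
        token = f"MDTTSCARD{len(cards)}"
        cards[token] = (match["q"].strip(), match["a"].strip())
        # Blank lines around the token make it parse as its own paragraph.
        return f"\n\n{token}\n\n"

    return DETAILS_RE.sub(_swap, markdown), cards
```

After parsing, any paragraph whose text is exactly one of the tokens can be turned back into a card block using the side table.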

The local backend uses pyttsx4 (a maintained fork of pyttsx3) because pyttsx3 2.99 exhibits a SAPI5 bug on Windows where only the first runAndWait() call produces audio. The edge backend uses edge-tts to call Microsoft Edge's neural voices over HTTPS (no Microsoft account, no API key) and plays the resulting MP3 with pygame.mixer.music, which exposes real pause/unpause cross-platform (SDL_mixer under the hood) — that's what enables the SPACE control during a paragraph. Per-paragraph voice switching works on edge because each utterance is an independent HTTP request, with no shared engine state to corrupt.

Roadmap

- Interactive controls during playback (SPACE / s / n / b / +/- / q): v0.3
- Optional cloud-quality TTS backend (Microsoft Edge neural voices): v0.2
- Rewind / skip-back during interactive mode: v0.3
- MP3 export of an entire document for offline mobile listening: v0.4
- PyPI release (pip install md-tts)
- Math blocks ($$ ... $$) detected as pause points instead of being read as prose
- Standalone image blocks announced as [image: alt-text] instead of being silently flattened
- Bookmarks: persist a per-document position so --resume picks up where you left off
- --chapter flag to start playback from a specific heading
- Real-time rate change (requires a streaming pitch-preserving resampler, which is non-trivial)
- More backends: Piper (local neural, fast, free), Azure TTS (premium voices via API key)

Development

uv sync --extra dev          # install dev extras (pytest, pytest-cov, ruff)
uv run pytest                # 48 tests
uv run ruff check .
uv run ruff format .

Conventional commits, feature branches off main, squash-merge by default. See .github/copilot-instructions.md for the full contributor guide.

License

MIT — see LICENSE.

Author

Jose María Ponce Bernabé. Built as a side-project while studying for AI / Data Engineering interviews — needed a way to revise PharmaGraphRAG and DengueMLOps notes during commutes.

Release history

- 0.4.2 (May 14, 2026)
- 0.4.1 (May 13, 2026), this version

Download files

Source Distribution

md_tts-0.4.1.tar.gz (193.5 kB)

Uploaded May 13, 2026 Source

Built Distribution

md_tts-0.4.1-py3-none-any.whl (27.8 kB)

Uploaded May 13, 2026 Python 3

File details

Details for the file md_tts-0.4.1.tar.gz.

File metadata

Download URL: md_tts-0.4.1.tar.gz
Upload date: May 13, 2026
Size: 193.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for md_tts-0.4.1.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `6943fd033badd0e0415cc255f86e9950a3cb151eaa0939b1795318df368515ea` |
| MD5 | `23f83f2fa248a19d60255e916da8f89f` |
| BLAKE2b-256 | `e5f4a231a89e43da6a62409abfc4e4f8f35f19c8d57c5b2b23450db490d59e94` |

Provenance

The following attestation bundles were made for md_tts-0.4.1.tar.gz:

Publisher: publish.yml on jmponcebe/md-tts

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: md_tts-0.4.1.tar.gz
- Subject digest: 6943fd033badd0e0415cc255f86e9950a3cb151eaa0939b1795318df368515ea
- Sigstore transparency entry: 1527626195
- Sigstore integration time: May 13, 2026
Source repository:
- Permalink: jmponcebe/md-tts@01d35ebe610ced7f35418200e45701c3d93b0a3e
- Branch / Tag: refs/tags/v0.4.1
- Owner: https://github.com/jmponcebe
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@01d35ebe610ced7f35418200e45701c3d93b0a3e
- Trigger Event: push

File details

Details for the file md_tts-0.4.1-py3-none-any.whl.

File metadata

Download URL: md_tts-0.4.1-py3-none-any.whl
Upload date: May 13, 2026
Size: 27.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for md_tts-0.4.1-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `0d454634415c82d851392f782e6ada5bc4a8555d97465c5cf4c102b4c95e5e89` |
| MD5 | `759eb4c59ddec8e54e169f17e37c83e2` |
| BLAKE2b-256 | `9a31cc43ea301f507c673947512776df47f2b35cfdfaa7b85ebb6c19922b232b` |

Provenance

The following attestation bundles were made for md_tts-0.4.1-py3-none-any.whl:

Publisher: publish.yml on jmponcebe/md-tts

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: md_tts-0.4.1-py3-none-any.whl
- Subject digest: 0d454634415c82d851392f782e6ada5bc4a8555d97465c5cf4c102b4c95e5e89
- Sigstore transparency entry: 1527626363
- Sigstore integration time: May 13, 2026
Source repository:
- Permalink: jmponcebe/md-tts@01d35ebe610ced7f35418200e45701c3d93b0a3e
- Branch / Tag: refs/tags/v0.4.1
- Owner: https://github.com/jmponcebe
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@01d35ebe610ced7f35418200e45701c3d93b0a3e
- Trigger Event: push
