Generate language-learning HTML readers (with sentence-level LLM translations and TTS) from Markdown.
md-llm-lang-reader
Generate language-learning HTML readers from Markdown using an LLM:
- sentence-by-sentence splitting + translation
- one-click TTS playback for the source text (browser Web Speech API)
- fenced code blocks are preserved as code (not sent to the LLM)
This package is published on PyPI as md-llm-lang-reader, and installs the CLI command langreader.
Features
- Markdown → HTML (simple headings + paragraphs)
- LLM-assisted sentence splitting (natural sentence boundaries)
- Sentence-level translations (each source sentence paired with its translation)
- TTS button per source sentence
- Fenced code blocks (``` or ~~~) are emitted as <pre><code> and are not sent to the LLM
- Bullet lists are translated (no special handling; they are passed to the LLM as plain text)
Installation
pip install md-llm-lang-reader
Quick start
Create input.md:
# Example
Bonjour ! Ceci est un court paragraphe.
```python
# Code blocks are not translated.
print("Hello")
```
- Premier point
- Deuxième point
Generate `output.html`:
```bash
langreader \
  -i input.md \
  -o output.html \
  --src fr \
  --tgt en \
  --provider YOUR_PROVIDER \
  --model YOUR_MODEL
```
Open the generated HTML in your browser and click the speaker buttons.
CLI usage
langreader -i INPUT.md -o OUTPUT.html --src SRC --tgt TGT --provider PROVIDER --model MODEL [-v 0|1|2|3]
Options
- -i, --input (required)
  Input Markdown file path.
- -o, --output (required)
  Output HTML file path.
- --src (default: fr)
  Source language code (e.g. fr, de, es, ja).
- --tgt (default: en)
  Target language code.
- --provider (required)
  Provider name passed to multiai (depends on your multiai configuration).
- --model (required)
  Model name passed to multiai.
- -v, --verbose (default: 1)
  Controls terminal output:
  - 0: silent
  - 1: headings only
  - 2: paragraph preview (first ~5 words)
  - 3: full original paragraph text
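For readers who want to script around the CLI, the documented options can be mirrored with a small argparse sketch. This is an illustration only and not the package's actual source: build_parser and preview are hypothetical names, and the exact preview format for -v 2 is an assumption based on the "first ~5 words" description.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented langreader options."""
    p = argparse.ArgumentParser(prog="langreader")
    p.add_argument("-i", "--input", required=True, help="Input Markdown file path")
    p.add_argument("-o", "--output", required=True, help="Output HTML file path")
    p.add_argument("--src", default="fr", help="Source language code")
    p.add_argument("--tgt", default="en", help="Target language code")
    p.add_argument("--provider", required=True, help="Provider name passed to multiai")
    p.add_argument("--model", required=True, help="Model name passed to multiai")
    p.add_argument("-v", "--verbose", type=int, choices=[0, 1, 2, 3], default=1)
    return p


def preview(paragraph: str, level: int, words: int = 5) -> str:
    """Return the paragraph text a given verbosity level would show (assumed format)."""
    if level <= 1:
        return ""  # levels 0 and 1 print no paragraph text
    if level == 2:
        tokens = paragraph.split()
        if len(tokens) <= words:
            return " ".join(tokens)
        return " ".join(tokens[:words]) + " ..."
    return paragraph  # level 3: full original paragraph
```

Calling build_parser().parse_args(["-i", "a.md", "-o", "a.html", "--provider", "p", "--model", "m"]) leaves --src, --tgt and --verbose at their documented defaults.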
Examples
French → English:
langreader -i alsace.md -o alsace.html --src fr --tgt en --provider ... --model ...
German → English:
langreader -i berlin.md -o berlin.html --src de --tgt en --provider ... --model ...
Japanese → English:
langreader -i news.md -o news.html --src ja --tgt en --provider ... --model ...
How it works
For each paragraph, the tool asks the LLM to:
- Split the paragraph into natural sentences (avoid splitting on abbreviations).
- Translate each sentence into the target language.
- Return only valid JSON in this schema:
[
{ "src": "…", "tgt": "…" }
]
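The three instructions above could be assembled into a prompt roughly as follows. The real prompt wording used by the tool is not documented here, so build_prompt and its text are assumptions made for illustration.

```python
import json


def build_prompt(paragraph: str, src: str, tgt: str) -> str:
    """Assemble an illustrative prompt; the tool's real wording may differ."""
    # Show the model the exact shape of the expected reply.
    schema = json.dumps([{"src": "...", "tgt": "..."}], ensure_ascii=False)
    return (
        f"Split the following paragraph (language: {src}) into natural sentences. "
        "Do not split on abbreviations.\n"
        f"Translate each sentence into the target language ({tgt}).\n"
        f"Return only valid JSON in this schema: {schema}\n\n"
        f"Paragraph:\n{paragraph}"
    )
```

Placing the paragraph last and stating the schema explicitly is one common way to keep the reply machine-parseable; other layouts would work as well.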
The tool validates and parses the JSON and then generates HTML like:
- source sentence + TTS button
- translated sentence below it
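The validation step can be sketched as below. It is a minimal example of checking a reply against the schema shown above, not the package's own code; parse_pairs is a hypothetical name.

```python
import json


def parse_pairs(raw: str) -> list[dict[str, str]]:
    """Parse an LLM reply and check that it is a list of {"src", "tgt"} objects."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, list):
        raise ValueError("expected a JSON array")
    pairs = []
    for item in data:
        if not isinstance(item, dict):
            raise ValueError("expected an object for each sentence")
        src, tgt = item.get("src"), item.get("tgt")
        if not isinstance(src, str) or not isinstance(tgt, str):
            raise ValueError("each item needs string 'src' and 'tgt' fields")
        pairs.append({"src": src, "tgt": tgt})
    return pairs
```

Because every failure surfaces as ValueError, a caller can catch one exception type and decide whether to retry the request or skip the paragraph.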
Notes on Text-to-Speech (TTS)
- TTS uses the browser’s Web Speech API (speechSynthesis).
- Voice availability depends on the OS/browser. Some environments may have limited voices for certain languages.
- The tool sets the utterance language to --src (e.g. fr). If you need a specific locale (e.g. fr-FR), you can currently edit the generated HTML (a future CLI option could expose this).
Markdown support (current)
Supported:
- Headings: #, ##, ###, ####
- Paragraphs: consecutive non-empty lines are joined with spaces
- Fenced code blocks: ``` or ~~~ (any info string is allowed)
Not yet supported (treated as plain text or not specially parsed):
- Blockquotes, tables, images
- Inline formatting (links/emphasis) is not rendered; it is passed as plain text
If you need richer Markdown rendering, consider adding a Markdown parser and preserving a mapping between original text and rendered HTML.
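To make the supported subset concrete, here is a minimal block splitter that follows the rules listed above (headings up to four levels, paragraphs joined with spaces, fenced code kept verbatim). It is a sketch written for this README, not the tool's parser; parse_blocks and the block labels are hypothetical.

```python
import re

FENCE = re.compile(r"^(`{3,}|~{3,})(.*)$")   # opening fence plus optional info string
HEADING = re.compile(r"^(#{1,4})\s+(.*)$")   # only # through #### count as headings


def parse_blocks(text: str) -> list[tuple[str, str]]:
    """Split Markdown into ("h1".."h4" | "p" | "code", content) blocks."""
    blocks = []
    para = []     # lines of the paragraph being collected
    code = None   # lines of the open code block, or None when outside a fence
    fence = ""    # marker that opened the current code block

    def flush():
        if para:
            blocks.append(("p", " ".join(para)))
            para.clear()

    for line in text.splitlines():
        stripped = line.strip()
        if code is not None:
            # Inside a fence: keep lines verbatim until a matching closing marker.
            if stripped and set(stripped) == {fence[0]} and len(stripped) >= len(fence):
                blocks.append(("code", "\n".join(code)))
                code = None
            else:
                code.append(line)
            continue
        m = FENCE.match(stripped)
        if m:
            flush()
            fence, code = m.group(1), []
            continue
        m = HEADING.match(stripped)
        if m:
            flush()
            blocks.append((f"h{len(m.group(1))}", m.group(2)))
        elif stripped:
            para.append(stripped)  # bullets and inline markup stay plain text
        else:
            flush()
    flush()
    if code is not None:  # unterminated fence: still emit what was collected as code
        blocks.append(("code", "\n".join(code)))
    return blocks
```

Only the "p" blocks would be sent to the LLM in this sketch; "code" blocks bypass it, and bullet lines end up inside ordinary paragraphs.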
Security
This tool escapes text embedded into HTML and does not inline arbitrary text into onclick handlers.
TTS buttons store text in data-speak="..." attributes and use JS event listeners, which is safer and avoids quoting issues.
Still, treat generated HTML as untrusted if your input Markdown is untrusted.
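The escaping approach described above might look like this in outline. The markup, class names, and the data-lang attribute are assumptions for illustration; only the data-speak attribute and the use of event listeners come from the description above.

```python
import html


def sentence_html(src: str, tgt: str, lang: str = "fr") -> str:
    """Render one sentence pair; class names and layout are illustrative."""
    # html.escape with quote=True also escapes quotes, so the value is safe
    # inside a double-quoted attribute.
    speak = html.escape(src, quote=True)
    lang_attr = html.escape(lang, quote=True)
    return (
        '<div class="sentence">'
        f'<button class="tts" data-speak="{speak}" data-lang="{lang_attr}">&#128266;</button>'
        f'<span class="src">{html.escape(src)}</span>'
        f'<div class="tgt">{html.escape(tgt)}</div>'
        "</div>"
    )


# One listener per button; sentence text is read from the data attribute
# instead of being spliced into JavaScript source.
TTS_SCRIPT = """
<script>
document.querySelectorAll("button.tts").forEach(function (btn) {
  btn.addEventListener("click", function () {
    var u = new SpeechSynthesisUtterance(btn.dataset.speak);
    u.lang = btn.dataset.lang;
    window.speechSynthesis.speak(u);
  });
});
</script>
"""
```

Since the text only ever appears as escaped HTML or as an attribute value, a sentence containing quotes or angle brackets cannot break out into markup or script.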
Development
Clone and install in editable mode:
pip install -e .
Run tests:
pytest
Build the package:
python -m build
License
MIT
File details
Details for the file md_llm_lang_reader-0.1.0.tar.gz.
File metadata
- Download URL: md_llm_lang_reader-0.1.0.tar.gz
- Upload date:
- Size: 8.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fdffe43775a03de898eaa5b0c29eba862feb4c3946053c3e8708f37b44c2cd00 |
| MD5 | fe35267e2fc1c063d5d6acc69c5100d3 |
| BLAKE2b-256 | ee57a754839f92892e6b7ccea2cc582740449184ae623a0b5dedba514696752a |
File details
Details for the file md_llm_lang_reader-0.1.0-py3-none-any.whl.
File metadata
- Download URL: md_llm_lang_reader-0.1.0-py3-none-any.whl
- Upload date:
- Size: 8.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0f18c1fc9f773a5d6185c621ded0bd60e3a285f5fd2222975863617c3e370835 |
| MD5 | 9a9564e4c1106ac0ad91c4e0980d9c71 |
| BLAKE2b-256 | 1b9b8b5c4534e6799ceabc32b0188c80a1e62755983df8bac03170cfcdd375e9 |