
Generate language-learning HTML readers (with sentence-level LLM translations and TTS) from Markdown.

md-llm-lang-reader

Generate language-learning HTML readers from Markdown using an LLM:

  • sentence-by-sentence splitting + translation
  • one-click TTS playback for the source text (browser Web Speech API)
  • fenced code blocks are preserved as code (not sent to the LLM)

This package is published on PyPI as md-llm-lang-reader, and installs the CLI command langreader.

Features

  • Markdown → HTML (simple headings + paragraphs)
  • LLM-assisted sentence splitting (natural sentence boundaries)
  • Sentence-level translations (each source sentence paired with its translation)
  • TTS button per source sentence
  • Fenced code blocks (``` or ~~~) are emitted as <pre><code> and are not sent to the LLM
  • Bullet lists are translated (no special handling; they are passed to the LLM as plain text)

Installation

pip install md-llm-lang-reader

Quick start

Create input.md:

# Example

Bonjour ! Ceci est un court paragraphe.

```python
# Code blocks are not translated.
print("Hello")
```

- Premier point
- Deuxième point

Generate output.html:

```bash
langreader \
  -i input.md \
  -o output.html \
  --src fr \
  --tgt en \
  --provider YOUR_PROVIDER \
  --model YOUR_MODEL
```

Open the generated HTML in your browser and click the speaker buttons.

CLI usage

langreader -i INPUT.md -o OUTPUT.html --src SRC --tgt TGT --provider PROVIDER --model MODEL [-v 0|1|2|3]

Options

  • -i, --input (required)
    Input Markdown file path.

  • -o, --output (required)
    Output HTML file path.

  • --src (default: fr)
    Source language code (e.g. fr, de, es, ja).

  • --tgt (default: en)
    Target language code.

  • --provider (required)
    Provider name passed to multiai (depends on your multiai configuration).

  • --model (required)
    Model name passed to multiai.

  • -v, --verbose (default: 1)
    Controls terminal output:

    • 0: silent
    • 1: headings only
    • 2: paragraph preview (first ~5 words)
    • 3: full original paragraph text
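
To make the option table concrete, here is a minimal sketch of an argparse parser with the same flags and defaults, plus a helper that imitates the verbosity levels. It is not the package's actual source: `build_parser` and `preview` are hypothetical names, and the exact preview format is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented langreader options."""
    parser = argparse.ArgumentParser(prog="langreader")
    parser.add_argument("-i", "--input", required=True, help="Input Markdown file path")
    parser.add_argument("-o", "--output", required=True, help="Output HTML file path")
    parser.add_argument("--src", default="fr", help="Source language code")
    parser.add_argument("--tgt", default="en", help="Target language code")
    parser.add_argument("--provider", required=True, help="Provider name passed to multiai")
    parser.add_argument("--model", required=True, help="Model name passed to multiai")
    parser.add_argument("-v", "--verbose", type=int, choices=[0, 1, 2, 3], default=1)
    return parser


def preview(paragraph: str, verbose: int) -> str:
    """Return the paragraph text shown at a given verbosity (assumed behavior)."""
    if verbose >= 3:
        return paragraph  # level 3: full original paragraph text
    if verbose == 2:
        # level 2: first five whitespace-separated words
        return " ".join(paragraph.split()[:5]) + " ..."
    return ""  # levels 0 and 1 print no paragraph text


args = build_parser().parse_args(
    ["-i", "input.md", "-o", "output.html", "--provider", "p", "--model", "m"]
)
print(args.src, args.tgt, args.verbose)
```

Note that splitting on whitespace only approximates "words"; for languages written without spaces, such as Japanese, a real implementation would need a different preview rule.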

Examples

French → English:

langreader -i alsace.md -o alsace.html --src fr --tgt en --provider ... --model ...

German → English:

langreader -i berlin.md -o berlin.html --src de --tgt en --provider ... --model ...

Japanese → English:

langreader -i news.md -o news.html --src ja --tgt en --provider ... --model ...

How it works

For each paragraph, the tool asks the LLM to:

  1. Split the paragraph into natural sentences (avoid splitting on abbreviations).
  2. Translate each sentence into the target language.
  3. Return only valid JSON in this schema:
[
  { "src": "…", "tgt": "…" }
]

The tool validates and parses the JSON and then generates HTML like:

  • source sentence + TTS button
  • translated sentence below it
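
The validation step described above could look roughly like the following. This is a sketch under assumptions, not the tool's actual code: `parse_pairs` is a hypothetical name, and the choice to raise `ValueError` on any schema mismatch is illustrative.

```python
import json


def parse_pairs(raw: str) -> list[dict[str, str]]:
    """Parse an LLM reply and check it matches [{"src": ..., "tgt": ...}, ...]."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on invalid JSON
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of sentence pairs")
    pairs = []
    for index, item in enumerate(data):
        if not isinstance(item, dict):
            raise ValueError(f"item {index} is not an object")
        src, tgt = item.get("src"), item.get("tgt")
        if not isinstance(src, str) or not isinstance(tgt, str):
            raise ValueError(f"item {index} needs string 'src' and 'tgt' fields")
        pairs.append({"src": src.strip(), "tgt": tgt.strip()})
    return pairs


reply = (
    '[{"src": "Bonjour !", "tgt": "Hello!"}, '
    '{"src": "Ceci est un court paragraphe.", "tgt": "This is a short paragraph."}]'
)
pairs = parse_pairs(reply)
print(len(pairs), pairs[0]["tgt"])
```

Rejecting malformed replies up front keeps the HTML generation step simple, since it can then assume every item has both fields as strings.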

Notes on Text-to-Speech (TTS)

  • TTS uses the browser’s Web Speech API (speechSynthesis).
  • Voice availability depends on the OS/browser. Some environments may have limited voices for certain languages.
  • The tool sets the utterance language to --src (e.g. fr). If you need a specific locale (e.g. fr-FR), edit the generated HTML by hand; no CLI option exposes this yet.

Markdown support (current)

Supported:

  • Headings: #, ##, ###, ####
  • Paragraphs: consecutive non-empty lines are joined with spaces
  • Fenced code blocks: ``` or ~~~ (any info string is allowed)
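
As an illustration of the three supported constructs, here is a minimal block splitter. It is only a sketch of the rules listed above, not the package's parser: `split_blocks` and the block kinds (`h1` to `h4`, `paragraph`, `code`) are hypothetical, and keeping the code of an unterminated fence is an assumption.

```python
import re

FENCE_MARKERS = ("`" * 3, "~" * 3)  # the two fence styles named above
HEADING = re.compile(r"^(#{1,4})\s+(.*)$")


def opening_fence(line: str):
    """Return the fence marker a line starts with, or None."""
    stripped = line.strip()
    for marker in FENCE_MARKERS:
        if stripped.startswith(marker):
            return marker  # any info string after the marker is ignored
    return None


def split_blocks(markdown: str) -> list[tuple[str, str]]:
    """Split Markdown into (kind, text) blocks: h1..h4, paragraph, or code."""
    blocks: list[tuple[str, str]] = []
    paragraph: list[str] = []
    code: list[str] = []
    fence = None  # marker that opened the current code block, if any

    def flush_paragraph() -> None:
        if paragraph:
            # consecutive non-empty lines are joined with spaces
            blocks.append(("paragraph", " ".join(paragraph)))
            paragraph.clear()

    for line in markdown.splitlines():
        if fence is not None:
            if line.strip().startswith(fence):
                blocks.append(("code", "\n".join(code)))
                code.clear()
                fence = None
            else:
                code.append(line)
            continue
        marker = opening_fence(line)
        heading = HEADING.match(line)
        if marker:
            flush_paragraph()
            fence = marker
        elif heading:
            flush_paragraph()
            blocks.append((f"h{len(heading.group(1))}", heading.group(2).strip()))
        elif line.strip():
            paragraph.append(line.strip())
        else:
            flush_paragraph()
    flush_paragraph()
    if fence is not None:
        blocks.append(("code", "\n".join(code)))  # unterminated fence: keep the code
    return blocks
```

Only `paragraph` blocks would be sent to the LLM in this scheme; `code` blocks go straight to the output.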

Not yet supported (treated as plain text or not specially parsed):

  • Blockquotes, tables, images
  • Inline formatting (links/emphasis) is not rendered; it is passed as plain text

If you need richer Markdown rendering, consider adding a Markdown parser and preserving a mapping between original text and rendered HTML.

Security

This tool escapes text embedded into HTML and does not inline arbitrary text into onclick handlers. TTS buttons store text in data-speak="..." attributes and use JS event listeners, which is safer than inline handlers and avoids quoting issues.

Still, treat generated HTML as untrusted if your input Markdown is untrusted.
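
The escaping approach described in this section can be sketched with the standard library's `html.escape`. The markup below is illustrative only: `render_pair`, the class names, and the `data-lang` attribute are assumptions, and only `data-speak` comes from the description above.

```python
import html


def render_pair(src: str, tgt: str, lang: str = "fr") -> str:
    """Render one sentence pair; class names and layout are illustrative only."""
    # quote=True also escapes " and ', so the text is safe inside attributes
    safe_src = html.escape(src, quote=True)
    safe_tgt = html.escape(tgt, quote=True)
    safe_lang = html.escape(lang, quote=True)
    return (
        '<div class="pair">'
        f'<p class="src" lang="{safe_lang}">{safe_src} '
        f'<button class="tts" data-speak="{safe_src}" data-lang="{safe_lang}">Play</button></p>'
        f'<p class="tgt">{safe_tgt}</p>'
        "</div>"
    )


print(render_pair('Il dit "oui" <b>', 'He says "yes" <b>'))
```

Because the sentence only ever appears as escaped text or as an escaped attribute value, a script on the page can read it back from the button's `data-speak` attribute without it ever being parsed as markup or JavaScript.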

Development

Clone and install in editable mode:

pip install -e .

Run tests:

pytest

Build the package:

python -m build

License

MIT
