
Incremental Markdown translation with LLMs — track blocks in PO files, only retranslate what changed


mdpo-llm


Translate Markdown with LLMs — and only pay for what changed.

mdpo-llm splits your Markdown into blocks, tracks each one in a PO file, and sends only new or changed blocks to your LLM. Edit one paragraph in a 50-block document? One API call, not fifty.

How It Works

flowchart LR
    A["Markdown\nSource"] --> B["Parse\ninto blocks"]
    B --> C["Track\nin PO file"]
    C --> D{"Changed?"}
    D -- Yes --> E["Send to\nLLM"]
    D -- No --> F["Reuse existing\ntranslation"]
    E --> G["Reconstruct\nMarkdown"]
    F --> G

Each block (heading, paragraph, code block, list, table) is tracked independently. On subsequent runs, only blocks whose source text changed get sent to the LLM — the rest are served from the PO cache.
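
For example, a short hypothetical source like the one below would be tracked as four separate blocks, so editing only the paragraph later flags just that one entry for retranslation:

# Getting Started                → block 1 (heading)

Install the package before use.  → block 2 (paragraph)

    pip install mdpo-llm         → block 3 (code block)

- step one                       → block 4 (list; the whole list is one block)
- step two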

Incremental processing in practice

First run:    8 blocks parsed → 8 API calls → full document translated
Edit source:  change 1 paragraph
Second run:   8 blocks parsed → 1 API call  → only the changed block retranslated

Translation Context

Blocks aren't translated in isolation. As each block is translated, it's added to a reference pool. Subsequent blocks receive the most similar previous translations as few-shot examples, so the LLM maintains consistent tone, terminology, and style across the entire document.

Block 1: "Introduction"     → translated (no context yet)
Block 2: "Getting Started"  → translated with Block 1 as reference
Block 3: "Installation"     → translated with Blocks 1–2 as reference
...

On re-runs, the pool is seeded from all existing translations in the PO file, so even a single changed paragraph benefits from the full document's context.
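
The way reference pairs are chosen can be pictured roughly as follows. This is a conceptual sketch only, assuming a simple word-overlap similarity; it is not mdpo-llm's internal code, and the actual selection strategy may differ.

def pick_references(source_block, pool, max_pairs=5):
    """Return up to max_pairs (source, translation) pairs most similar to source_block."""
    def overlap(a, b):
        words_a, words_b = set(a.lower().split()), set(b.lower().split())
        return len(words_a & words_b) / max(len(words_a | words_b), 1)
    ranked = sorted(pool, key=lambda pair: overlap(source_block, pair[0]), reverse=True)
    return ranked[:max_pairs]

# pool holds previously translated (source, translation) pairs
pool = [("Introduction", "소개"), ("Getting Started", "시작하기")]
print(pick_references("Getting started with the CLI", pool, max_pairs=1))
# [('Getting Started', '시작하기')]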

Installation

pip install mdpo-llm

Quick Start

1. Translate a document

No subclassing, no boilerplate. Pass a model string and go.

from pathlib import Path
from mdpo_llm import MdpoLLM

processor = MdpoLLM(
    model="gpt-4",            # any LiteLLM model string
    target_lang="ko",         # baked into the system prompt
    temperature=0.3,          # forwarded to litellm.completion()
)

result = processor.process_document(
    source_path=Path("docs/README.md"),
    target_path=Path("docs/README_ko.md"),
    # po_path defaults to docs/README_ko.po
)

print(f"Processed {result['translation_stats']['processed']} blocks")
print(f"Coverage: {result['coverage']['coverage_percentage']}%")

Run it again after editing the source — only the changed paragraphs get reprocessed.

2. Process a directory

result = processor.process_directory(
    source_dir=Path("docs/"),
    target_dir=Path("docs_ko/"),
    glob="**/*.md",
    max_workers=4,  # files processed concurrently
    # po_dir defaults to target_dir (PO files next to translated files)
)

print(f"{result['files_processed']} files processed")
print(f"{result['files_skipped']} files unchanged")

The directory structure is mirrored into target_dir. Each file gets its own PO file and its own reference pool. By default, PO files are placed next to the target files; pass po_dir to store them separately.
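
With those defaults, a hypothetical docs/ tree would end up looking roughly like this (file names are illustrative):

docs/
├── index.md
└── guide/
    └── setup.md

docs_ko/
├── index.md        (translated)
├── index.po        (block tracking for index.md)
└── guide/
    ├── setup.md
    └── setup.po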

3. Use any provider

LiteLLM supports 100+ providers. Just change the model string:

# OpenAI
MdpoLLM(model="gpt-4", target_lang="ko")

# Anthropic
MdpoLLM(model="anthropic/claude-sonnet-4-5-20250929", target_lang="ko")

# Google
MdpoLLM(model="gemini/gemini-pro", target_lang="ko")

# Azure OpenAI
MdpoLLM(model="azure/my-deployment", target_lang="ko", api_base="https://...")

Language Handling

target_lang — tell the LLM which language to produce

A BCP 47 locale string (e.g. "ko", "ja", "zh-CN") baked into the system prompt. The source language is auto-detected by the LLM — you only specify the target.

processor = MdpoLLM(model="gpt-4", target_lang="ja")

When target_lang is set, new PO files will include a Language header (e.g. Language: ja).

Glossary

Protect brand names, trademarks, and proper nouns from translation — or force specific translations for them.

Inline glossary

processor = MdpoLLM(
    model="gpt-4",
    target_lang="ko",
    glossary={
        "GitHub": None,                # None = do not translate
        "Markdown": None,
        "pull request": "풀 리퀘스트",  # force specific translation
        "API": "API",
    },
)

JSON glossary file

For multi-locale projects, keep a single glossary.json:

{
  "GitHub": null,
  "Markdown": null,
  "pull request": {
    "ko": "풀 리퀘스트",
    "ja": "プルリクエスト"
  },
  "API": "API"
}
  • null — do not translate (any locale)
  • "string" — use this translation for all locales
  • {"ko": "...", "ja": "..."} — per-locale; if the current locale isn't listed, the term is kept as-is
processor = MdpoLLM(
    model="gpt-4",
    target_lang="ko",
    glossary_path="glossary.json",
)

If both glossary and glossary_path are provided, inline entries override the file.

Only glossary terms that actually appear in each block are injected into the prompt, so a large glossary doesn't waste tokens on irrelevant blocks.
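
Conceptually, that per-block filtering amounts to something like the snippet below. It only illustrates the behaviour described above and is not the library's actual code; details such as case sensitivity are assumptions here.

glossary = {"GitHub": None, "Markdown": None, "pull request": "풀 리퀘스트", "API": "API"}
block = "Open a pull request on GitHub when the docs build passes."

relevant = {term: tr for term, tr in glossary.items() if term.lower() in block.lower()}
print(relevant)  # {'GitHub': None, 'pull request': '풀 리퀘스트'}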

See examples/glossary.json for a full example with brand names, technical terms, and per-locale translations.

Comparison

|                                | mdpo-llm | mdpo | md-translator  | llm-translator |
| ------------------------------ | -------- | ---- | -------------- | -------------- |
| LLM-powered                    | Yes      | No   | Yes            | Yes            |
| Incremental (block-level)      | Yes      | Yes  | No             | No             |
| PO file tracking               | Yes      | Yes  | No             | No             |
| Translation context (few-shot) | Yes      | No   | Partial        | No             |
| LLM-agnostic                   | Yes      | N/A  | Multi-provider | OpenAI only    |
| Batch directory processing     | Yes      | Yes  | No             | No             |

mdpo pioneered PO-based Markdown translation but targets manual/MT workflows, not LLMs. md-translator and llm-translator use LLMs but reprocess entire files on every run. mdpo-llm combines both: PO-tracked incremental processing with LLM-powered translation and cross-block context.

API Reference

MdpoLLM

Constructor:

MdpoLLM(
    model,                     # any LiteLLM model string (required)
    target_lang,               # BCP 47 string, baked into system prompt (required)
    max_reference_pairs=5,     # max similar pairs passed as few-shot context
    extra_instructions=None,   # appended to the built-in translation prompt
    post_process=None,         # Callable[[str], str] applied to every LLM response
    glossary=None,             # dict[str, str | None] — inline glossary
    glossary_path=None,        # path to JSON glossary file (multi-locale)
    **litellm_kwargs,          # temperature, api_key, api_base, etc.
)
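
For example, a sketch that combines several of these options (the instruction text, glossary entry, and post-processing function are illustrative values, not defaults):

processor = MdpoLLM(
    model="gpt-4",
    target_lang="ko",
    max_reference_pairs=3,                       # fewer few-shot pairs per request
    extra_instructions="Keep inline code and CLI flags untranslated.",
    post_process=lambda text: text.strip(),      # run on every LLM response
    glossary={"GitHub": None},
    temperature=0.2,                             # forwarded to litellm.completion()
)
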
| Method | Description |
| --- | --- |
| process_document(source_path, target_path, po_path=None, inplace=False) | Process a single Markdown file. po_path defaults to target_path with .po extension. |
| process_directory(source_dir, target_dir, po_dir=None, glob, inplace, max_workers) | Process a directory tree concurrently. po_dir defaults to target_dir. |
| get_translation_stats(source_path, po_path) | Return coverage and block statistics. |
| export_report(source_path, po_path) | Generate a detailed text report. |
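
A quick usage sketch for the two reporting helpers (paths are illustrative; the return values are printed as-is since their exact shape isn't documented here):

from pathlib import Path

stats = processor.get_translation_stats(
    source_path=Path("docs/README.md"),
    po_path=Path("docs/README_ko.po"),
)
print(stats)

print(processor.export_report(
    source_path=Path("docs/README.md"),
    po_path=Path("docs/README_ko.po"),
))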

Prompts

The Prompts class exposes all built-in prompt templates used by the processor:

from mdpo_llm import Prompts

# See the default translation instruction
print(Prompts.TRANSLATE_INSTRUCTION)

Working with PO Files

PO files (GNU gettext) track the state of each content block:

  • Untranslated — new content, will be sent to the LLM
  • Translated — completed, reused on subsequent runs
  • Fuzzy — source changed since last run, will be retranslated
  • Obsolete — source block was removed, cleaned up automatically

You can inspect and edit PO files with any standard gettext tool (Poedit, Lokalize, etc.).

When target_lang is set, new PO files include a Language metadata header so tools can identify the target language.
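
For illustration, a minimal PO file for a two-block document might look like the generic gettext example below; the exact comments and layout depend on what mdpo-llm writes:

msgid ""
msgstr ""
"Language: ko\n"

msgid "Getting Started"
msgstr "시작하기"

#, fuzzy
msgid "Install the package before running the examples."
msgstr "먼저 패키지를 설치하세요."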

Development

# Install with dev dependencies
uv pip install -e ".[dev]"

# Run tests
pytest tests/

License

MIT
