DocTranslater — PDF translation with layout preservation and multi-provider LLM routing
Translate PDFs while keeping layout, figures, and structure as intact as possible. DocTranslater turns pages into an intermediate representation, sends text to your chosen LLM backend, then typesets the result back into a new PDF.
This repo: miguelenes/doctranslate — a maintained fork of funstory-ai/DocTranslate. Fork lineage and license notes live under Attribution at the end of this file so they do not slow you down.
Where to go next
Pick what you need — you can always come back here.
| I want to… | Start here |
|---|---|
| Install and run my first translation | Start here (~5 minutes) |
| See every CLI flag and config option | Configuration |
| Use several providers (failover, cost-aware routing) | Multi-translator setup |
| Run without a hosted API (Ollama, vLLM, …) | Local translation |
| Browse the full docs site | Getting started |
| Contribute code or report issues | Contributing |
| Dig into pipeline stages | Implementation details |
Start here (~5 minutes)
You will: clone the project, install dependencies, and produce one translated PDF (using OpenAI as the simplest hosted path).
Requirements: Python 3.10+ and uv (recommended).
git clone https://github.com/miguelenes/doctranslate.git
cd doctranslate
uv sync --locked --group dev
uv run doctranslate --version
uv run doctranslate --help
# Same CLI, alternate entry point:
uv run doc-translate --help
Set your API key and translate a file (replace paths and languages as needed). The CLI uses subcommands (for example translate, assets). See docs/migration.md if you are upgrading from 0.5.x.
export OPENAI_API_KEY="sk-..."
uv run doctranslate translate input.pdf \
--provider openai \
--source-lang en --target-lang zh \
-o ./out
When it works: you should see new PDFs under the output directory (-o / --output-dir). If something fails, check Troubleshooting below or run uv run doctranslate --help / uv run doctranslate translate --help.
Scanned or messy PDFs? Try OCR before layout (still PDF → IL → LLM → PDF):
uv run doctranslate translate scan.pdf --provider openai \
--source-lang en --target-lang zh --ocr-mode auto
Details: Configuration (--ocr-mode, --ocr-pages, --ocr-debug).
What you get
DocTranslater is aimed at technical and layout-heavy PDFs: papers, manuals, specs, and reports where you care about paragraphs, tables, and figures staying readable.
Highlights
- Several backends: route across OpenAI, Anthropic, local models, and more (router mode).
- Layout-aware processing: YOLO-based regions for figures, tables, formulas, and body text.
- Strong PDF output: reflow into page geometry, font handling, optional watermarking, single- or dual-language PDFs.
- Glossaries: term extraction and custom glossary workflows.
- Scale: split large jobs and process pages in parallel when it helps.
- Cost and reliability: per-provider metrics and strategies like failover or cost-aware routing.
- Translation memory (optional): reuse prior segments — docs/translation-memory.md.
Typical uses: research PDFs, compliance packs, datasheets, internal docs, anything where “plain text dump” is not enough.
Usage (pick your path)
The sections below assume you already ran uv sync --locked --group dev and use uv run doctranslate …. If you installed the package into an active environment, you can call doctranslate directly instead.
OpenAI (quick path)
export OPENAI_API_KEY="sk-..."
uv run doctranslate translate input.pdf \
--provider openai \
--source-lang en --target-lang zh \
-o ./out
Warm assets / offline bundle:
uv run doctranslate assets warmup
uv run doctranslate assets pack-offline /path/to/bundle_dir
uv run doctranslate assets restore-offline /path/to/bundle.tar.zst
Use --openai-model, --openai-base-url, and optional --openai-term-extraction-* (see doctranslate translate --help).
API behavior note: on the default OpenAI host, simple translate() calls may use the Responses API, while JSON-heavy llm_translate() flows (term extraction, batched IL translation) may use structured parse. If you set a custom --openai-base-url gateway, chat completions are used throughout.
Multi-provider router (TOML)
Best when you want profiles, failover, or mixing providers. Point the CLI at a config file:
uv run doctranslate translate input.pdf \
--provider router \
-c doctranslate.toml \
--source-lang en --target-lang es \
-o ./out
Example doctranslate.toml (nested providers + profiles; secrets via environment variables):
[doctranslate]
translator = "router"
routing_profile = "translate"
term_extraction_profile = "terms"
routing_strategy = "failover"
metrics_output = "log"
[doctranslate.profiles.translate]
providers = ["openai_fast", "anthropic_backup"]
strategy = "failover"
max_attempts = 4
require_json_mode = false
[doctranslate.profiles.terms]
providers = ["openai_fast"]
strategy = "failover"
require_json_mode = true
[doctranslate.providers.openai_fast]
provider = "openai"
model = "gpt-4o-mini"
api_key_env = "OPENAI_API_KEY"
[doctranslate.providers.anthropic_backup]
provider = "anthropic"
model = "claude-3-5-sonnet-latest"
api_key_env = "ANTHROPIC_API_KEY"
Validate configuration without running a full job:
uv run doctranslate config validate --translator router -c doctranslate.toml
More examples and JSON metrics export: docs/multi-translator.md.
Local translation (no hosted API key)
Example with Ollama:
uv run doctranslate translate input.pdf \
--provider local \
--local-backend ollama \
--local-model qwen2.5:7b \
--source-lang en --target-lang zh \
-o ./out
vLLM, OpenAI-compatible URLs, batch tuning, and troubleshooting: Local translation.
Using DocTranslater from Python
For router mode from code, use doctranslate.translator.factory.build_translators with translator_mode="router" and a config path, or build a TranslatorRouter with LiteLLMProviderExecutor instances for advanced or test scenarios — see tests/test_translator_router.py.
Architecture (short version)
DocTranslater is a PDF → intermediate language (IL) → LLM → PDF pipeline. In plain terms: it understands page structure, translates text in context, then lays translated text back onto the page instead of pasting a single blob of text.
PDF Input
↓
[Frontend] ILCreater → Parse PDF structure
↓
[Midend] LayoutParser → Detect layout regions (YOLO)
ParagraphFinder → Group characters into paragraphs
ILTranslator → Translate via LLM (incl. multi-translator router)
Typesetting → Reflow text into page geometry
↓
[Backend] PDFCreater → Render IL to PDF
↓
PDF Output (single/dual-language, watermarked)
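The diagram above can be read as a chain of small transformations. This sketch is illustrative only: the function names mirror the diagram's stages, but the real classes live inside the package and carry far more state (fonts, geometry, region types):

```python
from dataclasses import dataclass, field

@dataclass
class Page:
    text: str
    regions: list[str] = field(default_factory=list)
    paragraphs: list[str] = field(default_factory=list)

def parse_pdf(raw: str) -> Page:
    """[Frontend] ILCreater: parse input into the IL."""
    return Page(text=raw)

def detect_layout(page: Page) -> Page:
    """[Midend] LayoutParser: tag regions (the real code runs a YOLO model)."""
    page.regions = ["body"]
    return page

def find_paragraphs(page: Page) -> Page:
    """ParagraphFinder: group characters into paragraphs."""
    page.paragraphs = page.text.split("\n\n")
    return page

def translate_page(page: Page, translate) -> Page:
    """ILTranslator: translate each paragraph via the chosen backend."""
    page.paragraphs = [translate(p) for p in page.paragraphs]
    return page

def typeset(page: Page) -> str:
    """Typesetting + [Backend] PDFCreater: lay text back onto the page."""
    return "\n\n".join(page.paragraphs)
```

The key design point the sketch preserves: translation happens per paragraph inside a structured page, never on one undifferentiated text blob.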
Multi-provider routing
TranslatorRouter (doctranslate/translator/router.py) — synchronous and BaseTranslator-compatible:
- LiteLLM-backed providers: OpenAI, Anthropic, OpenRouter, OpenAI-compatible gateways, Ollama
- Strategies: failover, round_robin, least_loaded, cost_aware
- Per-provider metrics (requests, latency, tokens, estimated cost) and optional JSON export
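The failover strategy reduces to a simple idea: try providers in configured order and move on when one fails. This is a sketch of that idea only; the real TranslatorRouter in doctranslate/translator/router.py also handles retries, metrics, and the other strategies:

```python
class ProviderError(Exception):
    """Stand-in for any provider failure (rate limit, timeout, bad JSON)."""

def failover(providers, text, max_attempts=4):
    """Call providers in order until one succeeds, up to max_attempts."""
    last_error = None
    for attempt, provider in enumerate(providers):
        if attempt >= max_attempts:
            break
        try:
            return provider(text)
        except ProviderError as exc:
            last_error = exc
    raise last_error or ProviderError("no providers available")
```

With providers = ["openai_fast", "anthropic_backup"] as in the TOML above, a rate-limited primary simply hands the request to the backup.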
Metrics and monitoring
After a run with --provider router, the CLI logs per-provider metrics when metrics_output includes log. In application code, a TranslatorRouter exposes metrics you can record with logging (avoid print() in libraries and tools — see Contributing):
import logging

log = logging.getLogger(__name__)
for pid, stats in router.get_metrics().items():
    log.debug(
        "%s success=%.3f cost_usd=%.4f avg_latency_ms=%.1f",
        pid,
        stats.success_rate,
        stats.total_cost_usd,
        stats.avg_latency_ms,
    )
log.debug("%s", router.print_metrics())
JSON export and router options: docs/multi-translator.md.
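An export along those lines can be sketched in a few lines. This assumes only the stat fields shown in the logging example above (success_rate, total_cost_usd, avg_latency_ms); the ProviderStats dataclass here is a stand-in, not the package's actual type:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class ProviderStats:
    success_rate: float
    total_cost_usd: float
    avg_latency_ms: float

def metrics_to_json(metrics: dict[str, ProviderStats]) -> str:
    """Serialize per-provider stats to a stable, diff-friendly JSON document."""
    return json.dumps(
        {pid: asdict(stats) for pid, stats in metrics.items()},
        indent=2,
        sort_keys=True,
    )
```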
Development
git clone https://github.com/miguelenes/doctranslate.git
cd doctranslate
# Optional: classic venv (uv still manages deps below)
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
uv sync --locked --group dev
uv run pytest tests/ -v
# Docs: live preview
uv run mkdocs serve # http://127.0.0.1:8000
# Same static output as CI
uv run zensical build --clean
GitHub Pages publishing on push to main is described in docs/github-pages.md.
Focused tests
uv run pytest tests/ -q
uv run pytest tests/test_translator_router.py -v
uv run pytest --cov=doctranslate tests/
Performance (indicative)
Rough benchmarks on typical PDFs (GPT-4-era models; your mileage will vary):
| Document type | Pages | Time (minutes) | Cost (USD) |
|---|---|---|---|
| Technical whitepaper | 15 | 3.5 | 0.45 |
| Research paper | 25 | 6.2 | 0.78 |
| Regulatory doc | 50 | 12.1 | 1.52 |
Times include layout detection, translation, and PDF rendering. Actual cost depends on backend, model, and token usage.
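For budgeting, a back-of-envelope estimate is often enough. The numbers below (tokens per page, price per million tokens) are placeholders for illustration, not current rates for any specific model:

```python
def estimate_cost_usd(pages: int,
                      tokens_per_page: int = 1500,
                      usd_per_1m_tokens: float = 1.0) -> float:
    """Rough cost: source tokens in plus roughly as many translated tokens out."""
    total_tokens = pages * tokens_per_page * 2
    return total_tokens / 1_000_000 * usd_per_1m_tokens
```

Plug in your model's real per-token pricing and a tokens-per-page figure measured on your own documents before trusting the output.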
Troubleshooting
No module named 'doctranslate'
uv sync --locked --group dev
uv run python -c "import doctranslate; print(doctranslate.__version__)"
If you installed with pip instead, use an editable install: pip install -e .
Translation is slow
- Router: try least_loaded or cost_aware where appropriate.
- Enable split translation with doctranslate translate … --split-pages N (alias --max-pages-per-part).
- Use a faster (sometimes lower-quality) model for drafts.
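Conceptually, split translation just partitions the page range so parts can be processed in parallel. A sketch of that partitioning (the helper name is hypothetical, not the package's API):

```python
def split_pages(total_pages: int, pages_per_part: int) -> list[range]:
    """Partition total_pages into chunks of at most pages_per_part pages,
    mirroring what --split-pages N does conceptually."""
    return [
        range(start, min(start + pages_per_part, total_pages))
        for start in range(0, total_pages, pages_per_part)
    ]
```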
Layout looks wrong after translation
- Tune fonts with --primary-font-family (see doctranslate translate --help).
- Try --watermark-mode no_watermark (alias --watermark-output-mode).
- Confirm the source is not an image-only scan without OCR (see --ocr-mode above).
Getting help
- Guidelines: docs/CONTRIBUTING.md
- Open an issue or search existing ones
Documentation index
- Getting started — install and first steps
- Configuration — CLI and config file
- Multi-translator setup — router and providers
- Supported languages
- Implementation details — pipeline deep dives
- Contributing
Attribution
DocTranslater (this fork) builds on DocTranslate by funstory-ai Limited, under AGPL-3.0.
Shared with upstream
- Core IL pipeline
- YOLO-based layout detection
- PDF parsing and rendering utilities
- Glossary system and translation caching
Notable additions in this fork
- Multi-translator router and richer configuration
- Rebranded CLI, package layout, and documentation refresh
- General architecture and extensibility improvements
License compliance: this fork and upstream are GNU Affero General Public License v3.0 (AGPL-3.0). If you run DocTranslater as a service, you must offer corresponding source to users (AGPL §13). Full text: LICENSE and LICENSE.ADDITIONS.
License
DocTranslater is licensed under GNU Affero General Public License v3.0 (AGPL-3.0).
- You may use, modify, and distribute this software under the license terms.
- Modifications must remain under AGPL-3.0.
- Network use as a service triggers source-offer obligations; read LICENSE.
- Preserve upstream copyright notices as required.
Credits
- Original project: DocTranslate — funstory-ai Limited
- This fork: Miguel Enes (2025)