
DocTranslater — PDF translation with layout preservation and multi-provider LLM routing


DocTranslater

Translate PDFs while keeping layout, figures, and structure as intact as possible. DocTranslater turns pages into an intermediate representation, sends text to your chosen LLM backend, then typesets the result back into a new PDF.

This repo: miguelenes/doctranslate — a maintained fork of funstory-ai/DocTranslate. Fork lineage and license notes live under Attribution at the end of this file so they do not slow you down.


Where to go next

Pick what you need — you can always come back here.

I want to…                                             Start here
Install and run my first translation                   Start here (~5 minutes)
See every CLI flag and config option                   Configuration
Use several providers (failover, cost-aware routing)   Multi-translator setup
Run without a hosted API (Ollama, vLLM, …)             Local translation
Browse the full docs site                              Getting started
Contribute code or report issues                       Contributing
Dig into pipeline stages                               Implementation details

Start here (~5 minutes)

You will: clone the project, install dependencies, and produce one translated PDF (using OpenAI as the simplest hosted path).

Requirements: Python 3.10+ and uv (recommended).

git clone https://github.com/miguelenes/doctranslate.git
cd doctranslate
uv sync --locked --group dev

uv run doctranslate --version
uv run doctranslate --help
# Same CLI, alternate entry point:
uv run doc-translate --help

Set your API key and translate a file (replace paths and languages as needed). The CLI uses subcommands (for example translate, assets). See docs/migration.md if you are upgrading from 0.5.x.

export OPENAI_API_KEY="sk-..."

uv run doctranslate translate input.pdf \
  --provider openai \
  --source-lang en --target-lang zh \
  -o ./out

When it works: you should see new PDFs under the output directory (-o / --output-dir). If something fails, check Troubleshooting below or run uv run doctranslate --help / uv run doctranslate translate --help.

Scanned or messy PDFs? Try OCR before layout (still PDF → IL → LLM → PDF):

uv run doctranslate translate scan.pdf --provider openai \
  --source-lang en --target-lang zh --ocr-mode auto

Details: Configuration (--ocr-mode, --ocr-pages, --ocr-debug).


What you get

DocTranslater is aimed at technical and layout-heavy PDFs: papers, manuals, specs, and reports where you care about paragraphs, tables, and figures staying readable.

Highlights

  • Several backends: route across OpenAI, Anthropic, local models, and more (router mode).
  • Layout-aware processing: YOLO-based regions for figures, tables, formulas, and body text.
  • Strong PDF output: reflow into page geometry, font handling, optional watermarking, single- or dual-language PDFs.
  • Glossaries: term extraction and custom glossary workflows.
  • Scale: split large jobs and process pages in parallel when it helps.
  • Cost and reliability: per-provider metrics and strategies like failover or cost-aware routing.
  • Translation memory (optional): reuse prior segments — docs/translation-memory.md.

Typical uses: research PDFs, compliance packs, datasheets, internal docs, anything where “plain text dump” is not enough.


Usage (pick your path)

The sections below assume you already ran uv sync --locked --group dev and use uv run doctranslate …. If you installed the package into an active environment, you can call doctranslate directly instead.

OpenAI (quick path)

export OPENAI_API_KEY="sk-..."

uv run doctranslate translate input.pdf \
  --provider openai \
  --source-lang en --target-lang zh \
  -o ./out

Warm assets / offline bundle:

uv run doctranslate assets warmup
uv run doctranslate assets pack-offline /path/to/bundle_dir
uv run doctranslate assets restore-offline /path/to/bundle.tar.zst

Use --openai-model, --openai-base-url, and optional --openai-term-extraction-* (see doctranslate translate --help).

API behavior note: on the default OpenAI host, simple translate() calls may use the Responses API, while JSON-heavy llm_translate() flows (term extraction, batched IL translation) may use structured parse. If you set a custom --openai-base-url gateway, chat completions are used throughout.

Multi-provider router (TOML)

Best when you want profiles, failover, or mixing providers. Point the CLI at a config file:

uv run doctranslate translate input.pdf \
  --provider router \
  -c doctranslate.toml \
  --source-lang en --target-lang es \
  -o ./out

Example doctranslate.toml (nested providers + profiles; secrets via environment variables):

[doctranslate]
translator = "router"
routing_profile = "translate"
term_extraction_profile = "terms"
routing_strategy = "failover"
metrics_output = "log"

[doctranslate.profiles.translate]
providers = ["openai_fast", "anthropic_backup"]
strategy = "failover"
max_attempts = 4
require_json_mode = false

[doctranslate.profiles.terms]
providers = ["openai_fast"]
strategy = "failover"
require_json_mode = true

[doctranslate.providers.openai_fast]
provider = "openai"
model = "gpt-4o-mini"
api_key_env = "OPENAI_API_KEY"

[doctranslate.providers.anthropic_backup]
provider = "anthropic"
model = "claude-3-5-sonnet-latest"
api_key_env = "ANTHROPIC_API_KEY"

Validate configuration without running a full job:

uv run doctranslate config validate --translator router -c doctranslate.toml

More examples and JSON metrics export: docs/multi-translator.md.

Local translation (no hosted API key)

Example with Ollama:

uv run doctranslate translate input.pdf \
  --provider local \
  --local-backend ollama \
  --local-model qwen2.5:7b \
  --source-lang en --target-lang zh \
  -o ./out

vLLM, OpenAI-compatible URLs, batch tuning, and troubleshooting: Local translation.

Using DocTranslater from Python

For router mode from code, call doctranslate.translator.factory.build_translators with translator_mode="router" and a config path. For advanced or test scenarios, build a TranslatorRouter directly with LiteLLMProviderExecutor instances; see tests/test_translator_router.py for examples.


Architecture (short version)

DocTranslater is a PDF → intermediate language (IL) → LLM → PDF pipeline. In plain terms: it understands page structure, translates text in context, then lays translated text back onto the page instead of pasting a single blob of text.

PDF Input
    ↓
[Frontend] ILCreater        → Parse PDF structure
    ↓
[Midend]   LayoutParser     → Detect layout regions (YOLO)
           ParagraphFinder  → Group characters into paragraphs
           ILTranslator     → Translate via LLM (incl. multi-translator router)
           Typesetting      → Reflow text into page geometry
    ↓
[Backend]  PDFCreater       → Render IL to PDF
    ↓
PDF Output (single/dual-language, watermarked)
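In code terms the diagram is a straight composition of stages. The stand-in functions below are illustrative only (the real stages are the classes named above, and translation is an LLM call, not a string reversal), but they show how each stage enriches the intermediate representation:

```python
def parse_pdf(path: str) -> dict:       # [Frontend] ILCreater
    return {"source": path, "pages": [{"chars": ["H", "i"]}]}

def detect_layout(il: dict) -> dict:    # [Midend] LayoutParser (YOLO in reality)
    il["regions"] = ["body"]
    return il

def find_paragraphs(il: dict) -> dict:  # [Midend] ParagraphFinder
    il["paragraphs"] = ["".join(page["chars"]) for page in il["pages"]]
    return il

def translate_il(il: dict) -> dict:     # [Midend] ILTranslator (LLM call in reality)
    il["translated"] = [p[::-1] for p in il["paragraphs"]]  # stand-in "translation"
    return il

def typeset(il: dict) -> dict:          # [Midend] Typesetting
    il["laid_out"] = True
    return il

def render_pdf(il: dict) -> dict:       # [Backend] PDFCreater
    return {"output": il["translated"], "laid_out": il["laid_out"]}

result = render_pdf(typeset(translate_il(find_paragraphs(detect_layout(parse_pdf("input.pdf"))))))
```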

Multi-provider routing

TranslatorRouter (doctranslate/translator/router.py) — synchronous and BaseTranslator-compatible:

  • LiteLLM-backed providers: OpenAI, Anthropic, OpenRouter, OpenAI-compatible gateways, Ollama
  • Strategies: failover, round_robin, least_loaded, cost_aware
  • Per-provider metrics (requests, latency, tokens, estimated cost) and optional JSON export
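The failover strategy can be pictured as trying providers in order until one succeeds. This is an illustrative sketch, not the actual TranslatorRouter implementation:

```python
from typing import Callable

def failover(providers: list[tuple[str, Callable[[str], str]]], text: str,
             max_attempts: int = 4) -> tuple[str, str]:
    """Try each provider in order; return (provider_id, translation)."""
    errors: list[tuple[str, Exception]] = []
    for attempt, (pid, translate) in enumerate(providers):
        if attempt >= max_attempts:
            break
        try:
            return pid, translate(text)
        except Exception as exc:  # real code would narrow this to provider errors
            errors.append((pid, exc))
    raise RuntimeError(f"all providers failed: {errors}")

# Usage: a flaky primary falls through to a working backup.
def flaky(_text: str) -> str:
    raise TimeoutError("rate limited")

pid, out = failover([("openai_fast", flaky),
                     ("anthropic_backup", lambda t: t.upper())], "hola")
# pid == "anthropic_backup", out == "HOLA"
```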

Metrics and monitoring

After a run with --provider router, the CLI logs per-provider metrics when metrics_output includes log. In application code, a TranslatorRouter exposes metrics you can record with logging (avoid print() in libraries and tools; see Contributing):

import logging

log = logging.getLogger(__name__)

for pid, stats in router.get_metrics().items():
    log.debug(
        "%s success=%.3f cost_usd=%.4f avg_latency_ms=%.1f",
        pid,
        stats.success_rate,
        stats.total_cost_usd,
        stats.avg_latency_ms,
    )
log.debug("%s", router.print_metrics())

JSON export and router options: docs/multi-translator.md.


Development

git clone https://github.com/miguelenes/doctranslate.git
cd doctranslate

# Optional: classic venv (uv still manages deps below)
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

uv sync --locked --group dev
uv run pytest tests/ -v

# Docs: live preview
uv run mkdocs serve   # http://127.0.0.1:8000

# Same static output as CI
uv run zensical build --clean

GitHub Pages publishing on push to main is described in docs/github-pages.md.

Focused tests

uv run pytest tests/ -q
uv run pytest tests/test_translator_router.py -v
uv run pytest --cov=doctranslate tests/

Performance (indicative)

Rough benchmarks on typical PDFs (GPT-4-era models; your mileage may vary):

Document type          Pages   Time (min)   Cost (USD)
Technical whitepaper     15      3.5          0.45
Research paper           25      6.2          0.78
Regulatory doc           50     12.1          1.52

Times include layout detection, translation, and PDF rendering. Actual cost depends on backend, model, and token usage.
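A quick back-of-envelope from the table: every row works out to roughly $0.03 and 15 seconds per page, so time and cost scale roughly linearly with page count. A rough estimator under that assumption (the constants come from the table, not from any DocTranslater API):

```python
benchmarks = {  # name: (pages, minutes, usd), copied from the table above
    "Technical whitepaper": (15, 3.5, 0.45),
    "Research paper": (25, 6.2, 0.78),
    "Regulatory doc": (50, 12.1, 1.52),
}

for name, (pages, minutes, usd) in benchmarks.items():
    print(f"{name}: {usd / pages:.3f} USD/page, {60 * minutes / pages:.0f} s/page")

def estimate(pages: int) -> tuple[float, float]:
    """Linear estimate at ~$0.03/page and ~15 s/page: (usd, minutes)."""
    return round(0.03 * pages, 2), round(15 * pages / 60, 1)
```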


Troubleshooting

No module named 'doctranslate'

uv sync --locked --group dev
uv run python -c "import doctranslate; print(doctranslate.__version__)"

If you use pip instead of uv, install in editable mode: pip install -e .

Translation is slow

  • Router: try least_loaded or cost_aware where appropriate.
  • Enable split translation with doctranslate translate … --split-pages N (alias --max-pages-per-part).
  • Use a faster (sometimes lower-quality) model for drafts.

Layout looks wrong after translation

  • Tune fonts with --primary-font-family (see doctranslate translate --help).
  • Try --watermark-mode no_watermark (alias --watermark-output-mode).
  • Confirm the source is not an image-only scan without OCR — see --ocr-mode above.

Attribution

DocTranslater (this fork) builds on DocTranslate by funstory-ai Limited, under AGPL-3.0.

Shared with upstream

  • Core IL pipeline
  • YOLO-based layout detection
  • PDF parsing and rendering utilities
  • Glossary system and translation caching

Notable additions in this fork

  • Multi-translator router and richer configuration
  • Rebranded CLI, package layout, and documentation refresh
  • General architecture and extensibility improvements

License compliance: this fork and upstream are GNU Affero General Public License v3.0 (AGPL-3.0). If you run DocTranslater as a service, you must offer corresponding source to users (AGPL §13). Full text: LICENSE and LICENSE.ADDITIONS.


License

DocTranslater is licensed under GNU Affero General Public License v3.0 (AGPL-3.0).

  • You may use, modify, and distribute this software under the license terms.
  • Modifications must remain under AGPL-3.0.
  • Network use as a service triggers source-offer obligations — read LICENSE.
  • Preserve upstream copyright notices as required.

Credits

  • Original project: DocTranslate — funstory-ai Limited
  • This fork: Miguel Enes (2025)

Questions? Open an issue or browse the docs/ folder.
