Skip to main content

Resumable PDF Translator with Google Translate and LLM support

Project description

PDF Translate

PyPI GitHub

A fast, resumable PDF translator that supports Google Translate (free) and LLM backends (OpenAI-compatible).

Fully resumable — you can kill the process at any time and resume later without losing progress.

Features

  • Text extraction via pdftext
  • Translation via googletrans (unofficial Google Translate API) or OpenAI compatible LLM.
  • Smart sentence-based chunking
  • Review mode with back-translation and confidence scoring
  • Automatic generation of clean HTML and PDF via WeasyPrint CLI
  • All intermediate results cached under .workflows/ and resume on crash

Installation

pip install pdf-translate

or

pip install git+https://github.com/guilt/pdf-translate

Setup from Source

git clone https://github.com/guilt/pdf-translate
cd pdf-translate

python -m venv .venv

# Windows
.venv\Scripts\activate

# macOS / Linux
source .venv/bin/activate

pip install .

Usage

1. Google Translate (Free & Simple)

# Single language
pdf-translate --language ta data/FRF-Interim-Final-Rule-Freelance.pdf

# Multiple languages with review
pdf-translate --language ta --language hi --review data/FRF-Interim-Final-Rule-Freelance.pdf

2. LLM Mode

# Set environment variables
export OPENAI_API_KEY="sk-..."
export OPENAI_MODEL="gpt-4o-mini"

# Run translation
pdf-translate --translator llm --language ta --review data/FRF-Interim-Final-Rule-Freelance.pdf

Popular LLM Provider Examples

Grok

export OPENAI_API_KEY="xai-..."
export OPENAI_BASE_URL="https://api.x.ai/v1"
export OPENAI_MODEL="grok-4.20"

OpenRouter

export OPENAI_API_KEY="sk-..."     # Your Anthropic key
export OPENAI_BASE_URL="https://openrouter.ai/api/v1"
export OPENAI_MODEL="anthropic/claude-opus-4-7"

Claude

export OPENAI_API_KEY="sk-ant-..."     # Your Anthropic key
export OPENAI_BASE_URL="https://api.anthropic.com/v1"
export OPENAI_MODEL="claude-opus-4-7"

Qwen

export OPENAI_API_KEY="sk-..."
export OPENAI_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
export OPENAI_MODEL="qwen3.5-plus"

OpenAI

export OPENAI_API_KEY="sk-..."
export OPENAI_MODEL="gpt-5.5"

DeepSeek

export OPENAI_API_KEY="sk-..."
export OPENAI_BASE_URL="https://api.deepseek.com/v1"
export OPENAI_MODEL="deepseek-chat"

Ollama

export OPENAI_API_KEY="ollama"
export OPENAI_BASE_URL="http://localhost:11434/v1"
export OPENAI_MODEL="gpt-oss:latest"

All Options

Flag Default Description
--language, -l required Target language code (can be used multiple times)
--translator googletrans googletrans or llm
--workers CPU cores / 2 Total concurrent translation workers
--chunk-size 400 Maximum characters per chunk
--review false Generate side-by-side review documents
--list-languages Show all supported languages and exit

Output Files

For a file document.pdf translated to Tamil (ta):

  • document-ta-Translated.md,document-ta-Translated.html and document-ta-Translated.pdf (PDF when WeasyPrint is available)
  • document-ta-ReviewTranslated.md, document-ta-ReviewTranslated.html and document-ta-ReviewTranslated.pdf (Only with --review)

All files are saved in the same directory as the source PDF.

Review Mode

This is by adding an optional flag --review

  • Performs back-translation (translated text → English)
  • Calculates confidence score for each chunk
  • Generates a rich side-by-side comparison table
  • Flags low-confidence chunks (< 40%) with ⚠

Language Codes

Run this command to see all supported languages:

python translate.py --list-languages

Examples: ta (Tamil), hi (Hindi), zh-cn (Chinese), ar (Arabic), ja (Japanese), fr (French), de (German), es (Spanish) etc.

Development

Install with dev and other dependencies:

pip install -e ".[dev,pdf,llm]"

Tests print full translation output by default (-s -v is configured in pyproject.toml).

Thank You and Feedback

All feedback welcome!

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pdf_translate-1.0.0.tar.gz (13.6 kB view details)

Uploaded Source

File details

Details for the file pdf_translate-1.0.0.tar.gz.

File metadata

  • Download URL: pdf_translate-1.0.0.tar.gz
  • Upload date:
  • Size: 13.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for pdf_translate-1.0.0.tar.gz
Algorithm Hash digest
SHA256 74ea69d88d38820a8e389dc8e61e984bb39c7bbb789f76346a73815b4be35cdd
MD5 2efc8690795fc96eaf151bcc5b1be9d3
BLAKE2b-256 48afc7b2ddbc6b5c14dedabbce351f1d4f735bc5a1eb275a37a0dfe505f420d4

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page