Resumable PDF Translator with Google Translate and LLM support
Project description
PDF Translate
A fast, resumable PDF translator that supports Google Translate (free) and LLM backends (OpenAI-compatible).
Fully resumable — you can kill the process at any time and resume later without losing progress.
Features
- Text extraction via pdftext
- Translation via googletrans (unofficial Google Translate API) or OpenAI compatible LLM.
- Smart sentence-based chunking
- Review mode with back-translation and confidence scoring
- Automatic generation of clean HTML and PDF via WeasyPrint CLI
- All intermediate results cached under
.workflows/and resume on crash
Installation
pip install pdf-translate
or
pip install git+https://github.com/guilt/pdf-translate
Setup from Source
git clone https://github.com/guilt/pdf-translate
cd pdf-translate
python -m venv .venv
# Windows
.venv\Scripts\activate
# macOS / Linux
source .venv/bin/activate
pip install .
Usage
1. Google Translate (Free & Simple)
# Single language
pdf-translate --language ta data/FRF-Interim-Final-Rule-Freelance.pdf
# Multiple languages with review
pdf-translate --language ta --language hi --review data/FRF-Interim-Final-Rule-Freelance.pdf
2. LLM Mode
# Set environment variables
export OPENAI_API_KEY="sk-..."
export OPENAI_MODEL="gpt-4o-mini"
# Run translation
pdf-translate --translator llm --language ta --review data/FRF-Interim-Final-Rule-Freelance.pdf
Popular LLM Provider Examples
Grok
export OPENAI_API_KEY="xai-..."
export OPENAI_BASE_URL="https://api.x.ai/v1"
export OPENAI_MODEL="grok-4.20"
OpenRouter
export OPENAI_API_KEY="sk-..." # Your Anthropic key
export OPENAI_BASE_URL="https://openrouter.ai/api/v1"
export OPENAI_MODEL="anthropic/claude-opus-4-7"
Claude
export OPENAI_API_KEY="sk-ant-..." # Your Anthropic key
export OPENAI_BASE_URL="https://api.anthropic.com/v1"
export OPENAI_MODEL="claude-opus-4-7"
Qwen
export OPENAI_API_KEY="sk-..."
export OPENAI_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
export OPENAI_MODEL="qwen3.5-plus"
OpenAI
export OPENAI_API_KEY="sk-..."
export OPENAI_MODEL="gpt-5.5"
DeepSeek
export OPENAI_API_KEY="sk-..."
export OPENAI_BASE_URL="https://api.deepseek.com/v1"
export OPENAI_MODEL="deepseek-chat"
Ollama
export OPENAI_API_KEY="ollama"
export OPENAI_BASE_URL="http://localhost:11434/v1"
export OPENAI_MODEL="gpt-oss:latest"
All Options
| Flag | Default | Description |
|---|---|---|
--language, -l |
required | Target language code (can be used multiple times) |
--translator |
googletrans |
googletrans or llm |
--workers |
CPU cores / 2 |
Total concurrent translation workers |
--chunk-size |
400 |
Maximum characters per chunk |
--review |
false |
Generate side-by-side review documents |
--list-languages |
— | Show all supported languages and exit |
Output Files
For a file document.pdf translated to Tamil (ta):
document-ta-Translated.md,document-ta-Translated.htmlanddocument-ta-Translated.pdf(PDF when WeasyPrint is available)document-ta-ReviewTranslated.md,document-ta-ReviewTranslated.htmlanddocument-ta-ReviewTranslated.pdf(Only with--review)
All files are saved in the same directory as the source PDF.
Review Mode
This is by adding an optional flag --review
- Performs back-translation (translated text → English)
- Calculates confidence score for each chunk
- Generates a rich side-by-side comparison table
- Flags low-confidence chunks (< 40%) with ⚠
Language Codes
Run this command to see all supported languages:
python translate.py --list-languages
Examples: ta (Tamil), hi (Hindi), zh-cn (Chinese), ar (Arabic), ja (Japanese), fr (French), de (German),
es (Spanish) etc.
Development
Install with dev and other dependencies:
pip install -e ".[dev,pdf,llm]"
Tests print full translation output by default (-s -v is configured in pyproject.toml).
Thank You and Feedback
All feedback welcome!
- Author: Karthik Kumar Viswanathan
- Web : karthikkumar.org
- Email : me@karthikkumar.org
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
File details
Details for the file pdf_translate-1.0.0.tar.gz.
File metadata
- Download URL: pdf_translate-1.0.0.tar.gz
- Upload date:
- Size: 13.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.8
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
74ea69d88d38820a8e389dc8e61e984bb39c7bbb789f76346a73815b4be35cdd
|
|
| MD5 |
2efc8690795fc96eaf151bcc5b1be9d3
|
|
| BLAKE2b-256 |
48afc7b2ddbc6b5c14dedabbce351f1d4f735bc5a1eb275a37a0dfe505f420d4
|