LaTeX PDF Translator

PDFMathTranslate

PDF scientific paper translation and bilingual comparison.

  • 📊 Retain formulas and charts.

  • 📄 Preserve table of contents.

  • 🌐 Support multiple translation services.

Feel free to provide feedback in issues or in the user group.

Installation

Requires Python version >=3.8, <=3.12

pip install pdf2zh

Usage

Run the translation command from the command line. It generates two files in the current directory: the translated document example-zh.pdf and the bilingual document example-dual.pdf. Google is the default translation service.

Please refer to ChatGPT for how to set environment variables.

Full / partial document translation

  • Entire document
pdf2zh example.pdf
  • Part of the document
pdf2zh example.pdf -p 1-3,5
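
The page selection 1-3,5 reads as a comma-separated list of single pages and inclusive ranges. As a rough illustration of that notation, the sketch below expands such a spec into page numbers. The helper expand_pages is hypothetical and is not pdf2zh's own parser; it also assumes 1-based page numbers, as the example suggests.

```python
from typing import List


def expand_pages(spec: str) -> List[int]:
    """Expand a page spec such as "1-3,5" into a sorted list of page numbers.

    Hypothetical helper for illustration only; pdf2zh's real parsing may differ.
    """
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            # "a-b" is treated as an inclusive range
            start, end = part.split("-", 1)
            pages.update(range(int(start), int(end) + 1))
        else:
            pages.add(int(part))
    return sorted(pages)


print(expand_pages("1-3,5"))  # [1, 2, 3, 5]
```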

Specify source and target languages

See Google Language Codes and DeepL Language Codes.

pdf2zh example.pdf -li en -lo ja
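
When translating many files from a script, it can help to assemble the command line programmatically. The sketch below builds an argument list using only the flags shown so far (-p, -li, -lo). The function build_command is a hypothetical wrapper, and it assumes these flags can be combined in one invocation.

```python
import shlex
from typing import List, Optional


def build_command(pdf: str,
                  pages: Optional[str] = None,
                  lang_in: Optional[str] = None,
                  lang_out: Optional[str] = None) -> List[str]:
    """Assemble a pdf2zh argument list (hypothetical wrapper, not part of pdf2zh)."""
    cmd = ["pdf2zh", pdf]
    if pages:
        cmd += ["-p", pages]       # e.g. "1-3,5"
    if lang_in:
        cmd += ["-li", lang_in]    # source language code
    if lang_out:
        cmd += ["-lo", lang_out]   # target language code
    return cmd


cmd = build_command("example.pdf", pages="1-3,5", lang_in="en", lang_out="ja")
print(shlex.join(cmd))
# To actually run it (requires pdf2zh to be installed and on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with unusual file names.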

Translate with Different Services

  • DeepL

    See DeepL

    Set ENVs to construct an endpoint like: {DEEPL_SERVER_URL}/translate

    • DEEPL_SERVER_URL (Optional), e.g., export DEEPL_SERVER_URL=https://api.deepl.com
    • DEEPL_AUTH_KEY, e.g., export DEEPL_AUTH_KEY=xxx
    pdf2zh example.pdf -s deepl
    
  • DeepLX

    See DeepLX

    Set ENVs to construct an endpoint like: {DEEPLX_SERVER_URL}/translate

    • DEEPLX_SERVER_URL (Optional), e.g., export DEEPLX_SERVER_URL=https://api.deeplx.org
    • DEEPLX_AUTH_KEY, e.g., export DEEPLX_AUTH_KEY=xxx
    pdf2zh example.pdf -s deeplx
    
  • Ollama

    See Ollama

    Set ENVs to construct an endpoint like: {OLLAMA_HOST}/api/chat

    • OLLAMA_HOST (Optional), e.g., export OLLAMA_HOST=http://localhost:11434
    pdf2zh example.pdf -s ollama:gemma2
    
  • LLM with OpenAI compatible schemas (OpenAI / SiliconCloud / Zhipu)

    See SiliconCloud, Zhipu

    Set ENVs to construct an endpoint like: {OPENAI_BASE_URL}/chat/completions

    • OPENAI_BASE_URL (Optional), e.g., export OPENAI_BASE_URL=https://api.openai.com/v1
    • OPENAI_API_KEY, e.g., export OPENAI_API_KEY=xxx
    pdf2zh example.pdf -s openai:gpt-4o
    
  • Azure

    See Azure Text Translation

    The following ENVs are required:

    • AZURE_APIKEY, e.g., export AZURE_APIKEY=xxx
    • AZURE_ENDPOINT, e.g., export AZURE_ENDPOINT=https://api.translator.azure.cn/
    • AZURE_REGION, e.g., export AZURE_REGION=chinaeast2
    pdf2zh example.pdf -s azure
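
Several services above follow the same pattern: an optional base-URL variable is joined with a fixed path to form the endpoint. The sketch below shows that pattern in isolation. The function service_endpoint is hypothetical, the fallback values are simply the example URLs listed above (treating them as defaults is an assumption), and the trailing-slash handling is a choice made for this sketch rather than documented pdf2zh behaviour.

```python
import os
from typing import Mapping


def service_endpoint(var: str, fallback: str, path: str,
                     env: Mapping[str, str] = os.environ) -> str:
    """Join the base URL from an environment variable with a fixed path.

    Hypothetical helper: `fallback` is used when `var` is unset.
    """
    base = env.get(var, fallback).rstrip("/")   # avoid a double slash
    return f"{base}/{path.lstrip('/')}"


# With the variable unset, the fallback base URL is used.
print(service_endpoint("OPENAI_BASE_URL", "https://api.openai.com/v1",
                       "chat/completions", env={}))
# With the variable set, its value wins.
print(service_endpoint("DEEPL_SERVER_URL", "https://api.deepl.com",
                       "translate",
                       env={"DEEPL_SERVER_URL": "https://example.invalid/"}))
```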
    

Translate with exceptions

Use regex to specify formula fonts and characters that need to be preserved.

pdf2zh example.pdf -f "(CM[^RT].*|MS.*|.*Ital)" -c "(\(|\||\)|\+|=|\d|[\u0080-\ufaff])"
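
To see what the two patterns in this command select, they can be tried against a few sample values in Python. This is only a sketch: the font names are illustrative, and whether pdf2zh matches from the start of the name (as re.match does here) or anywhere in it is an assumption.

```python
import re

# The same patterns as passed to -f and -c above.
FONT_PATTERN = re.compile(r"(CM[^RT].*|MS.*|.*Ital)")
CHAR_PATTERN = re.compile(r"(\(|\||\)|\+|=|\d|[\u0080-\ufaff])")


def is_formula_font(name: str) -> bool:
    """True if the font name matches the pattern from its first character."""
    return FONT_PATTERN.match(name) is not None


def is_formula_char(ch: str) -> bool:
    """True if the single character matches the preserved-character pattern."""
    return CHAR_PATTERN.match(ch) is not None


for font in ["CMMI10", "CMR10", "MSBM10", "Times-Italic", "Helvetica"]:
    print(font, is_formula_font(font))

for ch in ["+", "7", "a", "α"]:
    print(repr(ch), is_formula_char(ch))
```

Under these assumptions, Computer Modern fonts other than the roman (CMR) and typewriter (CMT) families, fonts starting with MS, and italic fonts count as formula fonts, while brackets, operators, digits, and non-ASCII characters in the given range count as formula characters.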

Interact with GUI

pdf2zh -i

See documentation for GUI for more details.

Acknowledgement

Document merging: PyMuPDF

Document parsing: Pdfminer.six

Document extraction: MinerU

Multi-threaded translation: MathTranslate

Layout parsing: DocLayout-YOLO

Document standard: PDF Explained, PDF Cheat Sheets
