LaTeX PDF Translator

PDFMathTranslate

PDF scientific paper translation and bilingual comparison.

  • 📊 Retain formulas and charts.

  • 📄 Preserve table of contents.

  • 🌐 Support multiple translation services.

Feel free to provide feedback via issues or the user group.

Installation

Requires Python version >=3.8, <=3.12

pip install pdf2zh

Usage

Run the translation command from the command line to generate the translated document example-zh.pdf and the bilingual document example-dual.pdf in the current directory. Google is the default translation service.

Please refer to ChatGPT for how to set environment variables.

Full / partial document translation

  • Entire document
pdf2zh example.pdf
  • Part of the document
pdf2zh example.pdf -p 1-3,5
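As a rough illustration of what a page specification such as 1-3,5 denotes, the sketch below expands it into an explicit list of page numbers. parse_page_spec is a hypothetical helper written for this example, not code taken from pdf2zh, and how pdf2zh itself parses or indexes pages is not stated here.

```python
from typing import List


def parse_page_spec(spec: str) -> List[int]:
    """Expand a spec like "1-3,5" into [1, 2, 3, 5] (hypothetical helper)."""
    pages: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1,,2"
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            pages.extend(range(start, end + 1))  # ranges are inclusive
        else:
            pages.append(int(part))
    return pages


if __name__ == "__main__":
    print(parse_page_spec("1-3,5"))
```

Read this way, -p 1-3,5 selects pages 1, 2, 3 and 5.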

Specify source and target languages

See Google Language Codes and DeepL Language Codes.

pdf2zh example.pdf -li en -lo ja
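When pdf2zh is driven from a script, it can help to assemble the command as an argument list. The following is a minimal sketch that uses only the flags shown in this document (-p, -li, -lo, -s); build_pdf2zh_command is a hypothetical helper, and the sketch only prints the command rather than running it.

```python
import shlex
from typing import List, Optional


def build_pdf2zh_command(
    pdf_path: str,
    lang_in: Optional[str] = None,
    lang_out: Optional[str] = None,
    pages: Optional[str] = None,
    service: Optional[str] = None,
) -> List[str]:
    """Assemble a pdf2zh argument list (hypothetical helper)."""
    cmd = ["pdf2zh", pdf_path]
    if pages:
        cmd += ["-p", pages]      # e.g. "1-3,5"
    if lang_in:
        cmd += ["-li", lang_in]   # source language code
    if lang_out:
        cmd += ["-lo", lang_out]  # target language code
    if service:
        cmd += ["-s", service]    # e.g. "deepl" or "openai:gpt-4o"
    return cmd


if __name__ == "__main__":
    cmd = build_pdf2zh_command("example.pdf", lang_in="en", lang_out="ja")
    print(shlex.join(cmd))
    # To run it for real (requires pdf2zh to be installed):
    # import subprocess; subprocess.run(cmd, check=True)
```

Passing the list to subprocess.run avoids shell quoting problems with file names that contain spaces.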

Translate with Different Services

  • DeepL

    See DeepL

    Set ENVs to construct an endpoint like: {DEEPL_SERVER_URL}/translate

    • DEEPL_SERVER_URL (Optional), e.g., export DEEPL_SERVER_URL=https://api.deepl.com
    • DEEPL_AUTH_KEY, e.g., export DEEPL_AUTH_KEY=xxx
    pdf2zh example.pdf -s deepl
    
  • DeepLX

    See DeepLX

    Set ENVs to construct an endpoint like: {DEEPLX_SERVER_URL}/translate

    • DEEPLX_SERVER_URL (Optional), e.g., export DEEPLX_SERVER_URL=https://api.deeplx.org
    • DEEPLX_AUTH_KEY, e.g., export DEEPLX_AUTH_KEY=xxx
    pdf2zh example.pdf -s deeplx
    
  • Ollama

    See Ollama

    Set ENVs to construct an endpoint like: {OLLAMA_HOST}/api/chat

    • OLLAMA_HOST (Optional), e.g., export OLLAMA_HOST=http://localhost:11434
    pdf2zh example.pdf -s ollama:gemma2
    
  • LLMs with OpenAI-compatible schemas (OpenAI / SiliconCloud / Zhipu)

    See SiliconCloud, Zhipu

    Set ENVs to construct an endpoint like: {OPENAI_BASE_URL}/chat/completions

    • OPENAI_BASE_URL (Optional), e.g., export OPENAI_BASE_URL=https://api.openai.com/v1
    • OPENAI_API_KEY, e.g., export OPENAI_API_KEY=xxx
    pdf2zh example.pdf -s openai:gpt-4o
    
  • Azure

    See Azure Text Translation

    The following ENVs are required:

    • AZURE_APIKEY, e.g., export AZURE_APIKEY=xxx
    • AZURE_ENDPOINT, e.g., export AZURE_ENDPOINT=https://api.translator.azure.cn/
    • AZURE_REGION, e.g., export AZURE_REGION=chinaeast2
    pdf2zh example.pdf -s azure
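The endpoint templates listed for DeepL, DeepLX, Ollama and OpenAI-compatible services all follow the same pattern: a base URL read from an environment variable, with a fixed path appended. The sketch below mirrors that pattern using the variable names from the bullets above. resolve_endpoint and ENDPOINT_RULES are hypothetical names for this example, not part of pdf2zh, and the sketch assumes nothing about the defaults pdf2zh applies when an optional variable is unset.

```python
from typing import Dict, Mapping, Tuple

# service name -> (environment variable with the base URL, path appended to it)
ENDPOINT_RULES: Dict[str, Tuple[str, str]] = {
    "deepl": ("DEEPL_SERVER_URL", "/translate"),
    "deeplx": ("DEEPLX_SERVER_URL", "/translate"),
    "ollama": ("OLLAMA_HOST", "/api/chat"),
    "openai": ("OPENAI_BASE_URL", "/chat/completions"),
}


def resolve_endpoint(service: str, env: Mapping[str, str]) -> str:
    """Join the base URL found in `env` with the service path (hypothetical helper)."""
    var, path = ENDPOINT_RULES[service]
    base = env.get(var)
    if not base:
        raise KeyError(f"{var} is not set")
    # Strip a trailing slash so the result never contains "//" before the path.
    return base.rstrip("/") + path


if __name__ == "__main__":
    sample_env = {"OPENAI_BASE_URL": "https://api.openai.com/v1"}
    print(resolve_endpoint("openai", sample_env))
    # With the real environment: resolve_endpoint("openai", os.environ)
```

Printing the resolved URL before starting a long translation is a quick way to catch a misspelled or missing variable.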
    

Translate with exceptions

Use regex to specify formula fonts and characters that need to be preserved.

pdf2zh example.pdf -f "(CM[^RT].*|MS.*|.*Ital)" -c "(\(|\||\)|\+|=|\d|[\u0080-\ufaff])"
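To see what the two example patterns accept, they can be tried directly with Python's re module. This is only a sketch: the font names below are sample values chosen for illustration, and anchoring at the start of the string with re.match is an assumption about how the patterns are applied, not something confirmed by this document.

```python
import re

# Patterns copied from the command above.
FONT_PATTERN = re.compile(r"(CM[^RT].*|MS.*|.*Ital)")
CHAR_PATTERN = re.compile(r"(\(|\||\)|\+|=|\d|[\u0080-\ufaff])")


def is_formula_font(font_name: str) -> bool:
    """True if the font name matches the -f pattern (assumes re.match semantics)."""
    return FONT_PATTERN.match(font_name) is not None


def is_formula_char(char: str) -> bool:
    """True if the character matches the -c pattern (assumes re.match semantics)."""
    return CHAR_PATTERN.match(char) is not None


if __name__ == "__main__":
    for name in ["CMMI10", "CMR10", "MSBM10", "Times-Ital", "Times-Roman"]:
        print(name, is_formula_font(name))
    for ch in ["(", "=", "5", "α", "x"]:
        print(ch, is_formula_char(ch))
```

Under these assumptions, CMMI10, MSBM10 and Times-Ital count as formula fonts, while CMR10 and Times-Roman do not, because [^RT] excludes a name whose third character is R or T. Brackets, operators, digits and non-ASCII characters such as α are preserved, while plain ASCII letters are not.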

Interact with GUI

pdf2zh -i

See documentation for GUI for more details.

Acknowledgement

Document merging: PyMuPDF

Document parsing: Pdfminer.six

Document extraction: MinerU

Multi-threaded translation: MathTranslate

Layout parsing: DocLayout-YOLO

Document standard: PDF Explained, PDF Cheat Sheets
