Latex PDF Translator

PDF2ZH

PDFMathTranslate

PDF scientific paper translation and bilingual comparison.

Feel free to provide feedback in GitHub Issues, Telegram Group or QQ Group.

Updates

  • [Nov. 26 2024] CLI now supports online file(s) (by @reycn)
  • [Nov. 24 2024] ONNX support to reduce dependency sizes (by @Wybxc)
  • [Nov. 23 2024] 🌟 Public Service online! (by @Byaidu)
  • [Nov. 23 2024] Firewall for preventing web bots (by @Byaidu)
  • [Nov. 22 2024] GUI now supports Italian, and has been improved (by @Byaidu, @reycn)
  • [Nov. 22 2024] You can now share your deployed service to others (by @Zxis233)
  • [Nov. 22 2024] Now supports Tencent Translation (by @hellofinch)
  • [Nov. 21 2024] GUI now supports downloading dual-document (by @reycn)
  • [Nov. 20 2024] 🌟 Demo online! (by @reycn)

Public Service 🌟

Free Service (https://pdf2zh.com/)

You can try our public service online without installation.

Hugging Face Demo

You can try our demo on HuggingFace without installation. Note that the computing resources of the demo are limited, so please avoid abusing them.

Installation and Usage

We provide four methods for using this project: Commandline, Portable, GUI, and Docker.

Method I. Commandline

  1. Python installed (3.8 <= version <= 3.12)

  2. Install our package:

    pip install pdf2zh
    
  3. Run the translation; the output files are generated in the current working directory:

    pdf2zh document.pdf
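
Before installing, it can help to confirm that the interpreter falls within the supported range given in step 1. The following is a minimal sketch only; the helper name `python_version_ok` is hypothetical and is not part of pdf2zh.

```shell
# Hypothetical helper (not part of pdf2zh): succeeds when a version string
# such as "3.11" or "3.11.4" lies in the range 3.8 <= version <= 3.12.
python_version_ok() {
    local major minor
    major=${1%%.*}
    minor=${1#*.}
    minor=${minor%%.*}      # drop any patch component, e.g. 3.11.4 -> 11
    [ "$major" -eq 3 ] && [ "$minor" -ge 8 ] && [ "$minor" -le 12 ]
}

# Example use: check the interpreter on PATH before running pip.
# current=$(python -c 'import sys; print("%d.%d" % sys.version_info[:2])')
# python_version_ok "$current" && pip install pdf2zh
```

The commented lines show one possible way to wire the check into the install step; adjust `python` to `python3` if that is the name on your system.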
    

Method II. Portable

No pre-installed Python environment is needed.

Download setup.bat and double-click it to run.

Method III. GUI

  1. Python installed (3.8 <= version <= 3.12)

  2. Install our package:

    pip install pdf2zh
    
  3. Start using in browser:

    pdf2zh -i
    
  4. If your browser does not start automatically, go to

    http://localhost:7860/
    

See the GUI documentation for more details.

Method IV. Docker

  1. Pull and run:

    docker pull byaidu/pdf2zh
    docker run -d -p 7860:7860 byaidu/pdf2zh
    
  2. Open in browser:

    http://localhost:7860/
    

For docker deployment on cloud service:

Advanced Options

Run the translation command on the command line to generate the translated document example-zh.pdf and the bilingual document example-dual.pdf in the current working directory. Google is used as the default translation service.

In the following table, we list all advanced options for reference:

| Option | Function | Example |
| --- | --- | --- |
| files | Local files | pdf2zh ~/local.pdf |
| links | Online files | pdf2zh http://arxiv.org/paper.pdf |
| -i | Enter GUI | pdf2zh -i |
| -p | Partial document translation | pdf2zh example.pdf -p 1 |
| -li | Source language | pdf2zh example.pdf -li en |
| -lo | Target language | pdf2zh example.pdf -lo zh |
| -s | Translation service | pdf2zh example.pdf -s deepl |
| -t | Multi-threads | pdf2zh example.pdf -t 1 |
| -o | Output dir | pdf2zh example.pdf -o output |
| -f, -c | Exceptions | pdf2zh example.pdf -f "(MS.*)" |

Some services require setting environment variables.

Full / partial document translation

  • Entire document

    pdf2zh example.pdf
    
  • Part of the document

    pdf2zh example.pdf -p 1-3,5
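
To make the page-range notation concrete, the sketch below expands a spec such as `1-3,5` into the individual pages it names. It assumes 1-based page numbers and inclusive ranges, and the helper `expand_pages` is hypothetical; it illustrates the notation and is not how pdf2zh parses the option internally.

```shell
# Hypothetical helper: expand a page spec such as "1-3,5" into the page
# numbers it names, assuming 1-based pages and inclusive ranges.
expand_pages() {
    local part start end page out=""
    local IFS=','
    for part in $1; do
        start=${part%-*}    # text before the dash (or the whole part)
        end=${part#*-}      # text after the dash (or the whole part)
        page=$start
        while [ "$page" -le "$end" ]; do
            out="$out $page"
            page=$((page + 1))
        done
    done
    echo "${out# }"
}

expand_pages "1-3,5"   # prints: 1 2 3 5
```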
    

Specify source and target languages

See the Google language codes and the DeepL language codes for the accepted values.

pdf2zh example.pdf -li en -lo ja

Translate with Different Services

The table below outlines the required environment variables for each translation service. Make sure to set them before using the respective service.

| Translator | Service | Environment Variables | Default Values | Notes |
| --- | --- | --- | --- | --- |
| Google (Default) | google | None | N/A | None |
| Bing | bing | None | N/A | None |
| DeepL | deepl | DEEPL_SERVER_URL, DEEPL_AUTH_KEY | https://api.deepl.com, [Your Key] | See DeepL |
| DeepLX | deeplx | DEEPLX_ENDPOINT | https://api.deepl.com/translate | See DeepLX |
| Ollama | ollama | OLLAMA_HOST, OLLAMA_MODEL | http://127.0.0.1:11434, gemma2 | See Ollama |
| OpenAI | openai | OPENAI_BASE_URL, OPENAI_API_KEY, OPENAI_MODEL | https://api.openai.com/v1, [Your Key], gpt-4o-mini | See OpenAI |
| Zhipu | zhipu | ZHIPU_API_KEY, ZHIPU_MODEL | [Your Key], glm-4-flash | See Zhipu |
| Silicon | silicon | SILICON_API_KEY, SILICON_MODEL | [Your Key], Qwen/Qwen2.5-7B-Instruct | See SiliconCloud |
| Azure | azure | AZURE_ENDPOINT, AZURE_API_KEY | https://api.translator.azure.cn, [Your Key] | See Azure |
| Tencent | tencent | TENCENTCLOUD_SECRET_ID, TENCENTCLOUD_SECRET_KEY | [Your ID], [Your Key] | See Tencent |

Use -s service or -s service:model to specify service:

pdf2zh example.pdf -s openai:gpt-4o-mini

Or specify the model with an environment variable (Windows cmd syntax shown; use export instead of set on Linux or macOS):

set OPENAI_MODEL=gpt-4o-mini
pdf2zh example.pdf -s openai
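
A small pre-flight check can catch a missing key before a translation run starts. The sketch below is an illustration only: the helper `missing_vars` is hypothetical, and it treats only the key and ID variables from the table above as required, assuming that the URL and model variables fall back to their listed defaults.

```shell
# Hypothetical helper: print the environment variables from the services
# table that are unset or empty for the given service name.
# Assumption: only key/ID variables are required; URL and model variables
# are taken to fall back to the defaults listed in the table.
missing_vars() {
    local required name value
    case "$1" in
        deepl)   required="DEEPL_AUTH_KEY" ;;
        openai)  required="OPENAI_API_KEY" ;;
        zhipu)   required="ZHIPU_API_KEY" ;;
        silicon) required="SILICON_API_KEY" ;;
        azure)   required="AZURE_API_KEY" ;;
        tencent) required="TENCENTCLOUD_SECRET_ID TENCENTCLOUD_SECRET_KEY" ;;
        *)       required="" ;;   # google, bing, deeplx, ollama: no key listed
    esac
    for name in $required; do
        eval "value=\${$name:-}"          # read the variable named by $name
        [ -n "$value" ] || echo "$name"
    done
}
```

One possible use is `[ -z "$(missing_vars openai)" ] && pdf2zh example.pdf -s openai`, which only starts the translation when nothing is reported as missing.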

Translate with exceptions

Use regex to specify formula fonts and characters that need to be preserved:

pdf2zh example.pdf -f "(CM[^RT].*|MS.*|.*Ital)" -c "(\(|\||\)|\+|=|\d|[\u0080-\ufaff])"
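
To see which font names the `-f` pattern from the example would select, the pattern can be tried against a few sample names with `grep -E`. This is a sketch under assumptions: the helper `is_formula_font` and the sample font names are hypothetical, and anchoring the pattern at the start of the name is a guess about how the tool applies it.

```shell
# Hypothetical helper: succeeds when a font name matches the -f pattern from
# the example above. Anchoring at the start of the name is an assumption.
is_formula_font() {
    printf '%s\n' "$1" | grep -Eq '^(CM[^RT].*|MS.*|.*Ital)'
}

# Try the pattern on a few sample font names.
for font in CMMI10 CMR10 MSBM10 Times-Ital; do
    if is_formula_font "$font"; then
        echo "$font: preserved"
    else
        echo "$font: translated"
    fi
done
```

Under these assumptions, CMR10 is the only sample that is not preserved, because `CM[^RT]` excludes names where `CM` is followed by `R` or `T`.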

Specify threads

Use -t to specify how many threads to use in translation:

pdf2zh example.pdf -t 1

TODO

  • Parse layout with DocLayNet based models, PaddleX, PaperMage, SAM2

  • Fix page rotation, table of contents, format of lists

  • Fix pixel formula in old papers

  • Async retry except KeyboardInterrupt

  • Knuth–Plass algorithm for western languages

  • Support non-PDF/A files

Acknowledgements

Project details


This version

1.8.5

Download files

Source Distribution

pdf2zh-1.8.5.tar.gz (4.1 MB)

Uploaded Source

Built Distribution

pdf2zh-1.8.5-py3-none-any.whl (43.1 kB)

Uploaded Python 3

File details

Details for the file pdf2zh-1.8.5.tar.gz.

File metadata

  • Download URL: pdf2zh-1.8.5.tar.gz
  • Size: 4.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for pdf2zh-1.8.5.tar.gz
Algorithm Hash digest
SHA256 4360fbd13a91020b7a184e3237e7bfba7fc6b38f2cb03e946208a77c913ff0be
MD5 348874803112b29343b1d1192448d783
BLAKE2b-256 341428fb04d4b9c3bd83f443c13f8b91ff20e2a0afe36d62ef16278b838d618d

Provenance

The following attestation bundles were made for pdf2zh-1.8.5.tar.gz:

Publisher: python-publish.yml on Byaidu/PDFMathTranslate

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pdf2zh-1.8.5-py3-none-any.whl.

File metadata

  • Download URL: pdf2zh-1.8.5-py3-none-any.whl
  • Size: 43.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for pdf2zh-1.8.5-py3-none-any.whl
Algorithm Hash digest
SHA256 32a510407481a9ebc42550b7697c72d29162b4b02a9169df6416e3aaaf47f8f9
MD5 d5363f2261cf7320b74b62a7c39e81b7
BLAKE2b-256 9f90e525c1bfac9abeea42c24bf5d46823fecfba0ef73d2417218ee20e2be902

Provenance

The following attestation bundles were made for pdf2zh-1.8.5-py3-none-any.whl:

Publisher: python-publish.yml on Byaidu/PDFMathTranslate
