
Translates arXiv LaTeX papers to Chinese using Google Gemini models.


arXiv LaTeX Translator

Requires Python 3.11+.

arXiv LaTeX Translator is a tool that automatically translates arXiv papers from English to Chinese. It downloads the paper's LaTeX source, translates the text with Google Gemini 3.0 (Flash or Pro), and recompiles the result into a polished PDF that preserves the original layout, equations, and citations.

✨ Features

  • Automated Workflow: Downloads source -> Extracts -> Translates -> Recompiles.
  • Model Selection: Choose between Gemini 3.0 Flash (fast/cheap) or Gemini 3.0 Pro (higher quality).
  • Academic Quality: Uses specialized prompts to ensure accurate translation of AI/ML terminology and academic tone.
  • Robust Processing:
    • Handles Large Files: Automatically chunks large LaTeX files to avoid API limits.
    • Error Resilience: Retries on network failures.
    • LaTeX Preservation: Strictly preserves mathematical formulas, citations, and structural commands.
  • DeepDive Analysis: AI-powered technical analysis that injects explanation boxes into the PDF for complex formulas and concepts.
  • Concurrent Processing: Uses parallel workers (12 processes) for both translation and analysis to significantly reduce wait times.
  • Chinese Support: Automatically injects ctex package for proper Chinese rendering.
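The large-file chunking, retry, and parallel-worker features can be pictured with the minimal sketch below. It illustrates the ideas under stated assumptions and is not the package's implementation: `chunk_latex`, `with_retries`, and `translate_all` are hypothetical names, the 8,000-character limit and backoff delays are arbitrary, splitting at blank lines is just one plausible chunking rule, and a plain `translate` callable stands in for the Gemini request. It also uses a thread pool rather than the 12 processes the project describes, since API calls are I/O-bound.

```python
import time
from collections.abc import Callable
from concurrent.futures import ThreadPoolExecutor


def chunk_latex(source: str, max_chars: int = 8000) -> list[str]:
    """Split LaTeX source at blank lines so each chunk stays under max_chars.

    Hypothetical rule: a single paragraph longer than max_chars is kept whole
    rather than cut mid-paragraph. Joining the chunks with "\\n\\n" restores
    the original text exactly.
    """
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for para in source.split("\n\n"):
        added = len(para) + (2 if current else 0)  # +2 for the "\n\n" separator
        if current and size + added > max_chars:
            chunks.append("\n\n".join(current))
            current, size, added = [], 0, len(para)
        current.append(para)
        size += added
    chunks.append("\n\n".join(current))
    return chunks


def with_retries(fn: Callable[[str], str], attempts: int = 3,
                 base_delay: float = 1.0) -> Callable[[str], str]:
    """Wrap fn so network errors are retried with exponential backoff."""
    def wrapper(arg: str) -> str:
        for attempt in range(attempts):
            try:
                return fn(arg)
            except (ConnectionError, TimeoutError):
                if attempt == attempts - 1:
                    raise  # out of attempts: surface the last error
                time.sleep(base_delay * 2 ** attempt)  # 1 s, 2 s, 4 s, ...
        raise RuntimeError("attempts must be at least 1")
    return wrapper


def translate_all(chunks: list[str], translate: Callable[[str], str],
                  workers: int = 12) -> list[str]:
    """Translate chunks concurrently; pool.map keeps results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(with_retries(translate), chunks))


if __name__ == "__main__":
    doc = "\\section{Intro}\n\nFirst paragraph.\n\nSecond paragraph."
    parts = chunk_latex(doc, max_chars=30)
    print(parts)  # three chunks, one per paragraph at this tiny limit
    print("\n\n".join(translate_all(parts, str.upper)))
```

Because `pool.map` returns results in input order, the translated chunks can be joined with `"\n\n".join(...)` to rebuild the file in its original sequence.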

🚀 Installation

Option 1: Install via PyPI (Recommended)

pip install arxiv-translator

Option 2: Install from Source (For Development)

git clone https://github.com/ZeyuChen/arxiv-translator.git
cd arxiv-translator
pip install .

Option 3: Development Setup (Recommended for Contributors)

git clone https://github.com/ZeyuChen/arxiv-translator.git
cd arxiv-translator

# Install micromamba environment (optional but recommended)
micromamba create -f environment.yml
micromamba activate arxiv-translator

# Install the package in editable mode
pip install -e .

3. Install Tectonic (TeX Engine)

The translator uses Tectonic for robust PDF compilation.

# Via micromamba (skip this if you created the environment from environment.yml, which already includes Tectonic)
micromamba install tectonic -c conda-forge

# Or install manually
curl --proto '=https' --tlsv1.2 -fsSL https://drop-sh.fullyjustified.net | sh
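Once Tectonic is installed, it can also be run by hand to check that a translated source tree compiles. The sketch below is a hypothetical Python wrapper: `tectonic_command` and `compile_pdf` are illustrative names, not part of arxiv-translator, and the only Tectonic feature it relies on is the standard `--outdir` option.

```python
import shutil
import subprocess
from pathlib import Path


def tectonic_command(main_tex: Path, outdir: Path) -> list[str]:
    """Build the Tectonic invocation; --outdir chooses where the PDF lands."""
    return ["tectonic", "--outdir", str(outdir.resolve()), main_tex.name]


def compile_pdf(main_tex: Path, outdir: Path) -> Path:
    """Compile main_tex with Tectonic and return the path of the resulting PDF."""
    if shutil.which("tectonic") is None:
        raise RuntimeError("tectonic is not on PATH; install it first")
    outdir.mkdir(parents=True, exist_ok=True)
    # Run from the source directory so relative \input and \includegraphics
    # paths resolve; Tectonic fetches any missing packages on demand.
    subprocess.run(tectonic_command(main_tex, outdir), cwd=main_tex.parent, check=True)
    return outdir.resolve() / main_tex.with_suffix(".pdf").name
```

For example, `compile_pdf(Path("paper/main.tex"), Path("out"))` would produce `main.pdf` inside `out/`, where `paper/main.tex` is a placeholder path.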

4. Get Gemini API Key

  1. Visit Google AI Studio.
  2. Sign in with your Google account.
  3. Click "Get API key" in the sidebar.
  4. Click "Create API key" (you can create it in a new or existing Google Cloud project).
  5. Copy the key string (starts with AIza...).

5. Configuration

Quick Setup (v0.2+): Run the following command to save your API key globally (stored in ~/.arxiv-translator/config.json):

arxiv-translator --set-key YOUR_API_KEY

Alternative: Set the environment variable:

export GEMINI_API_KEY=your_api_key_here

📖 Usage

Basic Usage

arxiv-translator https://arxiv.org/abs/2602.04705
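The command accepts a full abstract URL here and a bare identifier in the examples that follow. One hypothetical way to normalize either form and locate the LaTeX source for the download step is sketched below. `normalize_arxiv_id` and `source_url` are illustrative names, the pattern covers only new-style identifiers (old ones such as `hep-th/9901001` are not handled), and whether the tool itself fetches from arXiv's `/e-print/` endpoint is an assumption.

```python
import re

# New-style arXiv identifiers (YYMM.NNNN or YYMM.NNNNN) with an optional version.
ARXIV_ID = re.compile(r"\d{4}\.\d{4,5}(?:v\d+)?")


def normalize_arxiv_id(arg: str) -> str:
    """Accept an abs/pdf URL or a bare identifier and return just the identifier."""
    match = ARXIV_ID.search(arg)
    if match is None:
        raise ValueError(f"no new-style arXiv identifier in {arg!r}")
    return match.group(0)


def source_url(arxiv_id: str) -> str:
    """URL of the paper's LaTeX source bundle on arXiv's e-print endpoint."""
    return f"https://arxiv.org/e-print/{arxiv_id}"


if __name__ == "__main__":
    for arg in ("https://arxiv.org/abs/2602.04705", "2602.04705"):
        print(source_url(normalize_arxiv_id(arg)))
```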

Advanced Usage

Select Model:

# Use Gemini 3.0 Pro (Better quality, slower)
arxiv-translator 2602.04705 --model pro

# Use Gemini 3.0 Flash (Default, faster)
arxiv-translator 2602.04705 --model flash

DeepDive Analysis:

# Enable DeepDive Analysis (injects technical explanation boxes for complex formulas and concepts)
arxiv-translator 2602.04705 --deepdive

Custom Output:

arxiv-translator 2602.04705 --output my_translated_paper.pdf

Full Help:

arxiv-translator --help

📂 Output

By default, the translated PDF is written to the project root and named:

  • {arxiv_id}_zh_flash.pdf (for Flash model)
  • {arxiv_id}_zh_pro.pdf (for Pro model)

🔧 Technical Details

  • Parser: Extracts the source archive and automatically locates the main LaTeX file.
  • Translator: Uses google-genai SDK. Implements smart chunking for long sections.
  • Compiler: Uses tectonic for hassle-free compilation, automatically downloading necessary LaTeX packages.
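The Parser step and the ctex injection mentioned under Features can be approximated as follows. This is a hypothetical sketch, not the project's code: `find_main_tex` and `inject_ctex` are illustrative names, and it assumes the main file is the one containing both `\documentclass` and `\begin{document}` (preferring the largest if several match) and that `\usepackage{ctex}` goes directly after `\documentclass`.

```python
import re
from pathlib import Path

DOCCLASS = re.compile(r"\\documentclass(\[[^\]]*\])?\{[^}]*\}")
HAS_CTEX = re.compile(r"\\usepackage(\[[^\]]*\])?\{ctex\}")


def find_main_tex(src_dir: Path) -> Path:
    """Return the .tex file containing both \\documentclass and \\begin{document}."""
    candidates = []
    for path in sorted(src_dir.rglob("*.tex")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        if DOCCLASS.search(text) and "\\begin{document}" in text:
            candidates.append(path)
    if not candidates:
        raise FileNotFoundError(f"no main .tex file under {src_dir}")
    # Assumption: if several files qualify, the largest is the real entry point.
    return max(candidates, key=lambda p: p.stat().st_size)


def inject_ctex(source: str) -> str:
    """Load ctex right after \\documentclass unless the file already loads it."""
    if HAS_CTEX.search(source):
        return source
    match = DOCCLASS.search(source)
    if match is None:
        raise ValueError("no \\documentclass found")
    return source[: match.end()] + "\n\\usepackage{ctex}" + source[match.end():]
```

Returning the source unchanged when ctex is already present makes the injection safe to apply more than once.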

🤝 Contributing

Contributions are welcome! Please submit a Pull Request.

📊 Performance Benchmark

Test Case: arXiv:2602.04705 (ERNIE 5.0 Technical Report)

| Model | Time | Translation Quality | Note |
|---|---|---|---|
| Gemini 3.0 Flash | ~11 min | Good | Fast, reliable. Recommended for most papers. |
| Gemini 3.0 Pro | ~20 min | Excellent | Slower, higher precision in detailed academic phrasing. |

Comparison Preview

[Figure: side-by-side pages, Original English | Gemini 3.0 Flash | Gemini 3.0 Pro]

📄 License

This project is licensed under the Apache License 2.0.

Download files


Source Distribution

arxiv_translator-0.4.0.tar.gz (828.9 kB)

Built Distribution


arxiv_translator-0.4.0-py3-none-any.whl (22.7 kB, Python 3)

File details

Details for the file arxiv_translator-0.4.0.tar.gz.

File metadata

  • Download URL: arxiv_translator-0.4.0.tar.gz
  • Size: 828.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for arxiv_translator-0.4.0.tar.gz
Algorithm Hash digest
SHA256 18d9f27bfaabf9c804a3dcdf1ef7da3f8d4c1f6cdc47f0436f69d34123d21676
MD5 fef21d403be831e2e165d9fefcf15ac9
BLAKE2b-256 c38635c901c61cc676dac1eb0c05e82b73445849ffd6cf265e9ad772771a578f


Provenance

The following attestation bundles were made for arxiv_translator-0.4.0.tar.gz:

Publisher: publish.yml on ZeyuChen/arxiv-translator


File details

Details for the file arxiv_translator-0.4.0-py3-none-any.whl.

File hashes

Hashes for arxiv_translator-0.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 5f4d0afaa215e7ba7a748efd4aea43ce9dd3c6898aa27f935a0f345ed4cc5145
MD5 ce543ce9eddb172ee881a535464b9851
BLAKE2b-256 89c10320345263fe7785a69290ae4134abc8e908be6386afa15169dff80c2670


Provenance

The following attestation bundles were made for arxiv_translator-0.4.0-py3-none-any.whl:

Publisher: publish.yml on ZeyuChen/arxiv-translator

