Translates arXiv LaTeX papers to Chinese using Google Gemini models.
arXiv LaTeX Translator
arXiv LaTeX Translator automatically translates arXiv papers from English to Chinese. It downloads the LaTeX source, translates the content with Google Gemini 3.0 (Flash or Pro), and recompiles the paper into a PDF that preserves the original layout, equations, and citations.
✨ Features
- Automated Workflow: Downloads source -> Extracts -> Translates -> Recompiles.
- Model Selection: Choose between Gemini 3.0 Flash (fast/cheap) or Gemini 3.0 Pro (higher quality).
- Academic Quality: Uses specialized prompts to ensure accurate translation of AI/ML terminology and academic tone.
- Robust Processing:
  - Handles Large Files: Automatically chunks large LaTeX files to avoid API limits.
  - Error Resilience: Retries on network failures.
- LaTeX Preservation: Strictly preserves mathematical formulas, citations, and structural commands.
- DeepDive Analysis: AI-powered technical analysis that injects explanation boxes into the PDF for complex formulas and concepts.
- Concurrent Processing: Uses parallel workers (12 processes) for both translation and analysis to significantly reduce wait times.
- Chinese Support: Automatically injects the `ctex` package for proper Chinese rendering.
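The retry and parallel-worker features above can be sketched as follows. This is a minimal illustration, not the package's actual implementation: `with_retries` and `translate_chunks` are hypothetical names, the backoff schedule is an assumption, and the sketch uses threads for simplicity even though the feature list mentions processes.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def with_retries(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Wrap func so that ConnectionError is retried with exponential backoff."""
    def wrapper(*args, **kwargs):
        for attempt in range(attempts):
            try:
                return func(*args, **kwargs)
            except ConnectionError:
                if attempt == attempts - 1:
                    raise  # out of attempts: surface the error to the caller
                sleep(base_delay * 2 ** attempt)  # wait 1s, 2s, 4s, ...
    return wrapper


def translate_chunks(chunks, translate_one, max_workers=12):
    """Translate chunks in parallel and return the results in input order.

    translate_one is any callable taking one chunk and returning its
    translation (hypothetical; in practice it would call the model API).
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order even if workers finish out of order.
        return list(pool.map(with_retries(translate_one), chunks))
```

Threads are usually sufficient for this kind of workload because the workers spend most of their time waiting on network responses.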
🚀 Installation
Option 1: Install via PyPI (Recommended)
pip install arxiv-translator
Option 2: Install from Source (For Development)
git clone https://github.com/ZeyuChen/arxiv-translator.git
cd arxiv-translator
pip install .
Option 3: Development Setup (Recommended for Contributors)
git clone https://github.com/ZeyuChen/arxiv-translator.git
cd arxiv-translator
# Install micromamba environment (optional but recommended)
micromamba create -f environment.yml
micromamba activate arxiv-translator
# Install the package in editable mode
pip install -e .
3. Install Tectonic (TeX Engine)
The translator uses Tectonic for robust PDF compilation.
# With micromamba (skip if you created the environment from environment.yml, which already installs Tectonic)
micromamba install tectonic -c conda-forge
# Or install manually
curl --proto '=https' --tlsv1.2 -fsSL https://drop-sh.fullyjustified.net | sh
4. Get Gemini API Key
- Visit Google AI Studio.
- Sign in with your Google account.
- Click "Get API key" in the sidebar.
- Click "Create API key" (you can create it in a new or existing Google Cloud project).
- Copy the key string (starts with `AIza...`).
5. Configuration
Quick Setup (v0.2+):
Run the following command to save your API key globally (stored in ~/.arxiv-translator/config.json):
arxiv-translator --set-key YOUR_API_KEY
Alternative: Set the environment variable:
export GEMINI_API_KEY=your_api_key_here
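As a rough sketch of how a key could be looked up from these two sources: the function name `resolve_api_key`, the `"api_key"` field name inside `config.json`, and the precedence (environment variable first) are all assumptions made for illustration, not documented behavior of the tool.

```python
import json
import os
from pathlib import Path

# Location taken from the setup instructions above.
CONFIG_PATH = Path.home() / ".arxiv-translator" / "config.json"


def resolve_api_key(env=None, config_path=CONFIG_PATH):
    """Return the API key from the environment or the config file, else None."""
    env = os.environ if env is None else env
    key = env.get("GEMINI_API_KEY")
    if key:
        return key  # assumed precedence: environment variable wins
    try:
        data = json.loads(Path(config_path).read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return None  # missing or unreadable config file
    if not isinstance(data, dict):
        return None
    return data.get("api_key")  # "api_key" is an assumed field name
```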
📖 Usage
Basic Usage
arxiv-translator https://arxiv.org/abs/2602.04705
Advanced Usage
Select Model:
# Use Gemini 3.0 Pro (Better quality, slower)
arxiv-translator 2602.04705 --model pro
# Use Gemini 3.0 Flash (Default, faster)
arxiv-translator 2602.04705 --model flash
Enable DeepDive:
# Enable DeepDive Analysis (Technical Explanations)
arxiv-translator 2602.04705 --deepdive
Custom Output:
arxiv-translator 2602.04705 --output my_translated_paper.pdf
Full Help:
arxiv-translator --help
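The examples above pass either a full `arxiv.org/abs/` URL or a bare ID. A minimal sketch of normalizing both forms to a bare ID is shown below. `normalize_arxiv_id` is a hypothetical helper, it only handles new-style IDs such as `2602.04705`, and dropping the version suffix is an assumption rather than the tool's documented behavior.

```python
import re

# New-style arXiv identifiers: YYMM.NNNN or YYMM.NNNNN, optionally with vN.
_NEW_STYLE_ID = re.compile(r"(\d{4}\.\d{4,5})(v\d+)?")


def normalize_arxiv_id(ref):
    """Extract an ID such as '2602.04705' from a bare ID or an arxiv.org URL."""
    # Keep only the last path segment, so URLs and bare IDs are handled alike.
    tail = ref.strip().rstrip("/").rsplit("/", 1)[-1]
    if tail.endswith(".pdf"):
        tail = tail[: -len(".pdf")]
    match = _NEW_STYLE_ID.fullmatch(tail)
    if match is None:
        raise ValueError(f"not a recognized arXiv identifier: {ref!r}")
    return match.group(1)  # version suffix, if any, is discarded
```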
📂 Output
The translated PDF is generated in the project root, named as follows:
- `{arxiv_id}_zh_flash.pdf` (for the Flash model)
- `{arxiv_id}_zh_pro.pdf` (for the Pro model)
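The naming pattern can be written as a small helper. `default_output_name` is a hypothetical function used only to illustrate the pattern; the model names `flash` and `pro` are taken from the `--model` examples.

```python
def default_output_name(arxiv_id, model="flash"):
    """Build the default output file name for a given arXiv ID and model."""
    if model not in ("flash", "pro"):
        raise ValueError(f"unknown model: {model!r}")
    return f"{arxiv_id}_zh_{model}.pdf"
```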
🔧 Technical Details
- Parser: Extracts the main LaTeX file automatically.
- Translator: Uses the `google-genai` SDK. Implements smart chunking for long sections.
- Compiler: Uses `tectonic` for hassle-free compilation, automatically downloading necessary LaTeX packages.
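One simple way to chunk a long LaTeX file is to split at blank lines and pack paragraphs into chunks under a size budget. The sketch below illustrates that idea only; `chunk_latex` and the `max_chars` budget are hypothetical, and the package's actual chunking strategy may differ (for example, it may split on sections or count tokens instead of characters).

```python
import re


def chunk_latex(source, max_chars=4000):
    """Split LaTeX source into chunks, breaking only at blank lines.

    Paragraphs are packed greedily until adding the next one would exceed
    max_chars. A single paragraph longer than max_chars is kept whole, so a
    chunk can exceed the budget in that case.
    """
    paragraphs = re.split(r"\n\s*\n", source)
    chunks, current = [], ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            chunks.append(current)  # budget exceeded: close the current chunk
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Joining the chunks with a blank line (`"\n\n".join(chunks)`) restores the source up to whitespace in the separators. Note that this naive split does not treat environments such as `verbatim` specially, so a blank line inside one can still become a chunk boundary.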
🤝 Contributing
Contributions are welcome! Please submit a Pull Request.
📊 Performance Benchmark
Test Case: arXiv:2602.04705 (ERNIE 5.0 Technical Report)
| Model | Time | Translation Quality | Note |
|---|---|---|---|
| Gemini 3.0 Flash | ~11 min | Good | Fast, reliable. Recommended for most papers. |
| Gemini 3.0 Pro | ~20 min | Excellent | Slower, higher precision in detailed academic phrasing. |
Comparison Preview: the original English page alongside the Gemini 3.0 Flash and Gemini 3.0 Pro translations (images not included).
📄 License
This project is licensed under the Apache License 2.0.