MCP server for PDF translation via pdf2zh-next with full-context LLM translation
pdf2zh-next-mcp
MCP server for PDF translation using pdf2zh-next as the PDF processing backend.
Instead of translating each segment independently (which loses context), this server extracts all segments at once and lets the LLM translate them together — preserving terminology consistency and context across the entire document.
How it works
┌─────────────────────────────────────────────────┐
│ MCP Client (Claude Desktop, Claude Code, etc.)  │
│                                                 │
│ 1. extract_segments ──→ segments + formulas     │
│ 2. LLM translates all segments at once          │
│ 3. assemble_translated ──→ final PDF            │
└─────────────────────────────────────────────────┘
The LLM sees every segment before translating — so terminology stays consistent, cross-page sentences flow naturally, and formula placeholders are preserved correctly.
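As a rough illustration of the full-context idea, the sketch below packs every segment into a single prompt and then checks that formula placeholders survived translation. The segment shape (`id`, `text`), the `{v1}` placeholder syntax, and both helper names are assumptions made for this example; they are not taken from the server's actual tool output.

```python
import json
import re

# Hypothetical segment shape; the real extract_segments output may differ.
segments = [
    {"id": 0, "text": "We define the loss as {v1}."},
    {"id": 1, "text": "Minimizing {v1} yields the estimator."},
]

# Assumed placeholder syntax, used here only for illustration.
PLACEHOLDER = re.compile(r"\{v\d+\}")


def build_prompt(segments, target_lang):
    """Pack every segment into one request so the model sees the whole document."""
    payload = json.dumps(
        [{"id": s["id"], "text": s["text"]} for s in segments],
        ensure_ascii=False,
        indent=2,
    )
    header = "Translate each segment into " + target_lang + ". "
    rules = "Keep placeholders such as {v1} unchanged and return JSON with the same ids.\n"
    return header + rules + payload


def check_placeholders(segments, translations):
    """Return the ids whose translation lost or altered a formula placeholder."""
    by_id = {t["id"]: t["text"] for t in translations}
    bad = []
    for s in segments:
        expected = sorted(PLACEHOLDER.findall(s["text"]))
        actual = sorted(PLACEHOLDER.findall(by_id.get(s["id"], "")))
        if expected != actual:
            bad.append(s["id"])
    return bad
```

Because both segments travel in one request, the model can pick a single rendering for "loss" and reuse it. A check like `check_placeholders` is one way to catch a dropped placeholder before the PDF is assembled.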
Prerequisites
pdf2zh-next must be installed separately:
uv tool install pdf2zh-next
You need uv to install both pdf2zh-next and this server.
Installation
From PyPI (recommended)
uv tool install pdf2zh-next-mcp
From GitHub
uv tool install git+https://github.com/JaeHyeon-KAIST/pdf2zh-next-mcp
From source
git clone https://github.com/JaeHyeon-KAIST/pdf2zh-next-mcp
cd pdf2zh-next-mcp
uv sync
Setup
Claude Code
claude mcp add pdf-translate -- pdf2zh-next-mcp
Claude Desktop
Add to your Claude Desktop MCP config:
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json
If installed from PyPI or GitHub:
{
  "mcpServers": {
    "pdf-translate": {
      "command": "pdf2zh-next-mcp"
    }
  }
}
If running from source:
{
  "mcpServers": {
    "pdf-translate": {
      "command": "uv",
      "args": [
        "run",
        "--directory", "/path/to/pdf2zh-next-mcp",
        "python", "-m", "pdf2zh_next_mcp.main"
      ]
    }
  }
}
Tip: If Claude Desktop can't find uv or pdf2zh-next-mcp, use the absolute path (e.g., /opt/homebrew/bin/uv on macOS, C:\Users\you\.local\bin\uv.exe on Windows).
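If the config file already lists other MCP servers, editing it by hand risks clobbering them. The following is a minimal sketch of merging the `pdf-translate` entry into an existing file; `add_server` is a hypothetical helper written for this example, not something shipped with pdf2zh-next-mcp.

```python
import json
from pathlib import Path


def add_server(config_path, name="pdf-translate", command="pdf2zh-next-mcp", args=None):
    """Add or update one MCP server entry, keeping any other settings in the file."""
    path = Path(config_path)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    config = json.loads(text) if text.strip() else {}

    entry = {"command": command}
    if args:
        entry["args"] = list(args)

    # Only the named entry is touched; sibling servers stay as they are.
    config.setdefault("mcpServers", {})[name] = entry

    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Example call for macOS, using the config path listed above:
# add_server(Path.home() / "Library/Application Support/Claude/claude_desktop_config.json")
```

For the from-source setup, pass `command="uv"` along with the `args` list shown in the JSON above. Restart Claude Desktop afterwards so it rereads the file.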
Usage
Basic mode (default)
Text-only translation. The LLM reads all segments first, identifies key terms, then translates with consistent terminology.
Just ask:
"Translate this PDF to Korean: /path/to/paper.pdf"
Behind the scenes:
- extract_segments analyzes the PDF layout and returns all text segments
- The LLM translates everything at once (with full context)
- assemble_translated injects translations and generates the final PDF
Visual mode
Uses the attached PDF for visual context — the LLM can see figures, tables, and formulas. Also saves a terminology glossary.
- Attach the PDF to the chat (drag-and-drop)
- Ask: "Translate this PDF to Korean in visual mode"
- The LLM creates a glossary, references the visuals, and translates all segments
Visual mode outputs:
- *-mono.pdf: translated PDF
- *-dual.pdf: bilingual side-by-side
- *-glossary.json: terminology glossary
Troubleshooting
BabeldocError: cannot unpack non-iterable NoneType object
BabelDOC needs CMap files for font character mapping. If its automatic download times out, install them manually:
cd ~/Downloads
curl -L https://github.com/funstory-ai/BabelDOC-Assets/archive/refs/heads/main.zip -o BabelDOC-Assets.zip
unzip BabelDOC-Assets.zip
mkdir -p ~/.cache/babeldoc/cmap
cp BabelDOC-Assets-main/cmap/*.json ~/.cache/babeldoc/cmap/
This is a one-time setup. The cache path is the same on all platforms.
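On systems without `curl` or `unzip`, the same steps can be approximated with the Python standard library. This is a sketch under the assumption that the archive keeps the `<top-level>/cmap/*.json` layout implied by the `cp` command above; `download_assets` and `install_cmaps` are hypothetical helper names.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path, PurePosixPath

# Same archive URL as the curl command above.
ASSETS_URL = "https://github.com/funstory-ai/BabelDOC-Assets/archive/refs/heads/main.zip"


def download_assets(dest):
    """Fetch the asset archive to dest and return its path."""
    urllib.request.urlretrieve(ASSETS_URL, dest)
    return Path(dest)


def install_cmaps(zip_path, cache_dir=None):
    """Copy every cmap/*.json from the archive into the BabelDOC cache directory."""
    cache = Path(cache_dir) if cache_dir else Path.home() / ".cache" / "babeldoc" / "cmap"
    cache.mkdir(parents=True, exist_ok=True)
    copied = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            parts = PurePosixPath(info.filename).parts
            # Expect <top-level-dir>/cmap/<name>.json; skip everything else.
            if len(parts) == 3 and parts[1] == "cmap" and parts[2].endswith(".json"):
                target = cache / parts[2]
                with zf.open(info) as src, open(target, "wb") as dst:
                    shutil.copyfileobj(src, dst)
                copied.append(target.name)
    return sorted(copied)


# Example:
# archive = download_assets(Path.home() / "Downloads" / "BabelDOC-Assets.zip")
# print(install_cmaps(archive))
```

The files are copied straight out of the archive, so nothing has to be unpacked into the downloads folder first.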
License
MIT