MCP server for PDF translation via pdf2zh-next with full-context LLM translation
pdf2zh-next-mcp
MCP server for PDF translation using pdf2zh-next as the PDF processing backend. Designed for Claude Desktop.
Instead of translating each segment independently (which loses context), this server extracts all segments at once and lets the LLM translate them together — preserving terminology consistency and context across the entire document.
Using Claude Code? Check out pdf2zh-next-skill — a lightweight skill-based approach without MCP overhead. It handles large PDFs better by leveraging Claude Code's direct file I/O and auto-continuation.
How it works
┌─────────────────────────────────────────────────┐
│ Claude Desktop │
│ │
│ 1. extract_segments ──→ segments + formulas │
│ 2. LLM translates all segments at once │
│ 3. assemble_translated ──→ final PDF │
└─────────────────────────────────────────────────┘
The LLM sees every segment before translating — so terminology stays consistent, cross-page sentences flow naturally, and formula placeholders are preserved correctly.
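As a rough illustration of why batching helps, the sketch below packs every segment into one prompt and checks that formula placeholders survive translation. The segment shape (`id`, `text`) and the `{v1}` placeholder syntax are assumptions made for this example; the actual output of extract_segments may look different.

```python
import re

# Hypothetical segment shape; the real extract_segments output may differ.
segments = [
    {"id": 0, "text": "We define the loss {v1} over the dataset."},
    {"id": 1, "text": "Minimizing {v1} yields the estimator {v2}."},
]

# Assumed placeholder syntax for formulas, used only in this sketch.
PLACEHOLDER = re.compile(r"\{v\d+\}")


def build_prompt(segments, target_lang):
    """Pack every segment into one prompt so the LLM sees the whole document."""
    lines = [
        f"Translate each numbered segment to {target_lang}. "
        "Keep placeholders like {v1} unchanged."
    ]
    lines += [f"[{s['id']}] {s['text']}" for s in segments]
    return "\n".join(lines)


def placeholders_preserved(source, translated):
    """True if the translation keeps exactly the same placeholders as the source."""
    return sorted(PLACEHOLDER.findall(source)) == sorted(PLACEHOLDER.findall(translated))
```

Because both segments appear in the same prompt, the model can pick one rendering for a recurring term and reuse it, which per-segment translation cannot guarantee.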
Prerequisites
pdf2zh-next must be installed separately:
uv tool install pdf2zh-next
Verify installation:
pdf2zh_next --version
You need uv to install both pdf2zh-next and this server.
Installation
From PyPI (recommended)
uv tool install pdf2zh-next-mcp
From GitHub
uv tool install git+https://github.com/JaeHyeon-KAIST/pdf2zh-next-mcp
From source
git clone https://github.com/JaeHyeon-KAIST/pdf2zh-next-mcp
cd pdf2zh-next-mcp
uv sync
Setup
Add to your Claude Desktop MCP config:
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json
If installed from PyPI or GitHub:
{
"mcpServers": {
"pdf-translate": {
"command": "uvx",
"args": ["pdf2zh-next-mcp"]
}
}
}
If running from source:
{
"mcpServers": {
"pdf-translate": {
"command": "uv",
"args": [
"run",
"--directory", "/path/to/pdf2zh-next-mcp",
"python", "-m", "pdf2zh_next_mcp.main"
]
}
}
}
Tip: If Claude Desktop can't find uvx, use the absolute path (e.g., /opt/homebrew/bin/uvx on macOS, C:\Users\you\.local\bin\uvx.exe on Windows).
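If you prefer not to edit the JSON by hand, the PyPI/GitHub entry can be merged into an existing config with a few lines of Python. This is a minimal sketch, not part of the project: `add_server` and `update_config_file` are hypothetical helper names, and `shutil.which` is used to apply the absolute-path tip automatically.

```python
import json
import shutil
from pathlib import Path


def add_server(config: dict, command: str) -> dict:
    """Return a copy of config with the pdf-translate entry added, keeping other servers."""
    servers = dict(config.get("mcpServers", {}))
    servers["pdf-translate"] = {"command": command, "args": ["pdf2zh-next-mcp"]}
    return {**config, "mcpServers": servers}


def update_config_file(path: Path) -> None:
    """Merge the entry into the config file at path, creating the file if needed."""
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    # Prefer the absolute path to uvx; fall back to the bare name if it is not on PATH.
    command = shutil.which("uvx") or "uvx"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(add_server(config, command), indent=2), encoding="utf-8")
```

Call `update_config_file` with the config path for your platform listed above, then restart Claude Desktop so it reloads the file.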
Usage
Just ask:
"Translate this PDF to Korean: /path/to/paper.pdf"
Behind the scenes:
- extract_segments analyzes the PDF layout and returns all text segments
- The LLM translates everything at once (with full context)
- assemble_translated injects translations and generates the final PDF
Output files:
- *-mono.pdf: translated PDF
- *-dual.pdf: bilingual side-by-side
- *-glossary.json: terminology glossary
Limitations
- Large PDFs (~30+ pages): Claude Desktop has a per-turn output token limit. For documents with many segments, the translation may fail mid-process with "response could not be fully generated". For large PDFs, use pdf2zh-next-skill with Claude Code instead.
- MCP tool result size: Segments are paginated to stay within Claude Desktop's 25K token limit per tool response. This is handled automatically.
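To give an idea of what such pagination involves, here is a minimal sketch that greedily groups segments into pages under a token budget. It is not the server's actual implementation: the function names are hypothetical, and the four-characters-per-token estimate is a crude assumption standing in for a real tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Crude estimate (about 4 characters per token); an assumption, not a real tokenizer."""
    return max(1, len(text) // 4)


def paginate(segments: list[str], budget: int = 25_000) -> list[list[str]]:
    """Greedily group segments into pages whose estimated size fits the budget."""
    pages, current, used = [], [], 0
    for seg in segments:
        cost = estimate_tokens(seg)
        # Start a new page when adding this segment would exceed the budget.
        if current and used + cost > budget:
            pages.append(current)
            current, used = [], 0
        current.append(seg)
        used += cost
    if current:
        pages.append(current)
    return pages
```

A single segment larger than the budget still gets a page of its own in this sketch; a production version would need to split or reject it.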
Troubleshooting
BabeldocError: cannot unpack non-iterable NoneType object
BabelDOC needs CMap files for font character mapping. If its automatic download times out, install them manually:
cd ~/Downloads
curl -L https://github.com/funstory-ai/BabelDOC-Assets/archive/refs/heads/main.zip -o BabelDOC-Assets.zip
unzip BabelDOC-Assets.zip
mkdir -p ~/.cache/babeldoc/cmap
cp BabelDOC-Assets-main/cmap/*.json ~/.cache/babeldoc/cmap/
This is a one-time setup. The cache path is the same on all platforms.
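On systems without curl or unzip (for example a stock Windows shell), the copy step can be done in Python once the archive has been downloaded. The sketch below is an assumption-labeled alternative to the commands above; `install_cmaps` is a hypothetical helper, and it only relies on the archive layout implied by those commands (JSON files directly under a cmap directory).

```python
import zipfile
from pathlib import Path, PurePosixPath


def install_cmaps(archive, dest: Path) -> int:
    """Copy every cmap/*.json file from the archive into dest and return the count."""
    dest.mkdir(parents=True, exist_ok=True)
    count = 0
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            # Zip member names always use forward slashes.
            member = PurePosixPath(info.filename)
            # Keep only JSON files directly under a "cmap" directory.
            if not info.is_dir() and member.suffix == ".json" and member.parent.name == "cmap":
                (dest / member.name).write_bytes(zf.read(info))
                count += 1
    return count
```

For example, `install_cmaps(Path.home() / "Downloads" / "BabelDOC-Assets.zip", Path.home() / ".cache" / "babeldoc" / "cmap")` mirrors the shell commands above.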
License
MIT