MCP server for PDF translation via pdf2zh-next with full-context LLM translation

pdf2zh-next-mcp

MCP server for PDF translation using pdf2zh-next as the PDF processing backend. Designed for Claude Desktop.

Translating each segment independently loses context. This server instead extracts all segments at once and lets the LLM translate them together, which keeps terminology and context consistent across the entire document.

Using Claude Code? Check out pdf2zh-next-skill — a lightweight skill-based approach without MCP overhead. It handles large PDFs better by leveraging Claude Code's direct file I/O and auto-continuation.

How it works

┌─────────────────────────────────────────────────┐
│  Claude Desktop                                  │
│                                                  │
│  1. extract_segments  ──→  segments + formulas   │
│  2. LLM translates all segments at once          │
│  3. assemble_translated  ──→  final PDF          │
└─────────────────────────────────────────────────┘

The LLM sees every segment before translating — so terminology stays consistent, cross-page sentences flow naturally, and formula placeholders are preserved correctly.
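
The idea of translating all segments in one pass while protecting formula placeholders can be sketched roughly as follows. This is an illustration only: the segment shape (`id`, `text`), the `{v1}`-style placeholder syntax, and both function names are assumptions, not the server's actual API.

```python
import re
from collections import Counter

# Assumed placeholder syntax for formulas, e.g. "{v1}"; the real format may differ.
PLACEHOLDER = re.compile(r"\{v\d+\}")


def build_batch_prompt(segments: list[dict], target_lang: str) -> str:
    """Join every segment into one prompt so the model sees the whole document."""
    lines = [
        f"Translate each numbered segment to {target_lang}. "
        "Keep placeholders like {v1} unchanged."
    ]
    lines += [f"[{seg['id']}] {seg['text']}" for seg in segments]
    return "\n".join(lines)


def placeholders_preserved(source: str, translated: str) -> bool:
    """True when the translation carries exactly the same placeholders as the source."""
    return Counter(PLACEHOLDER.findall(source)) == Counter(PLACEHOLDER.findall(translated))
```

A check like `placeholders_preserved` could run on each translated segment before assembly, so a dropped or duplicated formula marker is caught before the final PDF is generated.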

Prerequisites

pdf2zh-next must be installed separately:

uv tool install pdf2zh-next

Verify installation:

pdf2zh_next --version

You need uv to install both pdf2zh-next and this server.
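
If you prefer to check the prerequisites from a script, a minimal sketch using only the standard library might look like this. The helper name `missing_tools` is hypothetical; the executable names `uv` and `pdf2zh_next` are the ones used in the commands above.

```python
import shutil


def missing_tools(names, which=shutil.which):
    """Return the executables from `names` that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(["uv", "pdf2zh_next"])
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```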

Installation

From PyPI (recommended)

uv tool install pdf2zh-next-mcp

From GitHub

uv tool install git+https://github.com/JaeHyeon-KAIST/pdf2zh-next-mcp

From source

git clone https://github.com/JaeHyeon-KAIST/pdf2zh-next-mcp
cd pdf2zh-next-mcp
uv sync

Setup

Add to your Claude Desktop MCP config:

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json

If installed from PyPI or GitHub:

{
  "mcpServers": {
    "pdf-translate": {
      "command": "uvx",
      "args": ["pdf2zh-next-mcp"]
    }
  }
}

If running from source:

{
  "mcpServers": {
    "pdf-translate": {
      "command": "uv",
      "args": [
        "run",
        "--directory", "/path/to/pdf2zh-next-mcp",
        "python", "-m", "pdf2zh_next_mcp.main"
      ]
    }
  }
}

Tip: If Claude Desktop can't find uvx, use the absolute path (e.g., /opt/homebrew/bin/uvx on macOS, C:\Users\you\.local\bin\uvx.exe on Windows).
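
If the config file already lists other MCP servers, the new entry has to be merged in rather than pasted over them. The sketch below shows one way to do that; the function names are hypothetical, and the config path must be passed in by you (see the platform paths above).

```python
import json
from pathlib import Path


def merge_server(config: dict, name: str, command: str, args: list[str]) -> dict:
    """Return a copy of `config` with one entry added or replaced under "mcpServers"."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = {"command": command, "args": args}
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path, name: str, command: str, args: list[str]) -> None:
    """Read the JSON config (if it exists), merge the entry, and write it back."""
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    merged = merge_server(config, name, command, args)
    path.write_text(json.dumps(merged, indent=2) + "\n", encoding="utf-8")
```

For example, `update_config_file(Path("claude_desktop_config.json"), "pdf-translate", "uvx", ["pdf2zh-next-mcp"])` would produce the PyPI variant shown above while leaving other servers untouched. Back up the file first and restart Claude Desktop afterwards.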

Usage

Just ask:

"Translate this PDF to Korean: /path/to/paper.pdf"

Behind the scenes:

  1. extract_segments analyzes the PDF layout and returns all text segments
  2. The LLM translates everything at once (with full context)
  3. assemble_translated injects translations and generates the final PDF

Output files:

  • *-mono.pdf — translated PDF
  • *-dual.pdf — bilingual side-by-side
  • *-glossary.json — terminology glossary
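
To pick the outputs up in a script, you can match on the suffixes listed above. This is a small sketch with a hypothetical helper name; it makes no assumption about the part of the filename before the suffix.

```python
# Suffixes taken from the output list above.
SUFFIXES = {
    "-mono.pdf": "translated",
    "-dual.pdf": "bilingual",
    "-glossary.json": "glossary",
}


def classify_outputs(filenames):
    """Map each output kind to the filename that ends with its suffix."""
    found = {}
    for name in filenames:
        for suffix, kind in SUFFIXES.items():
            if name.endswith(suffix):
                found[kind] = name
    return found
```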

Limitations

  • Large PDFs (~30+ pages): Claude Desktop has a per-turn output token limit. For documents with many segments, the translation may fail mid-process with "response could not be fully generated". For large PDFs, use pdf2zh-next-skill with Claude Code instead.
  • MCP tool result size: Segments are paginated to stay within Claude Desktop's 25K token limit per tool response. This is handled automatically.
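
Pagination under a token budget can be pictured with the sketch below. It is not the server's implementation: the function name is hypothetical, and the token count is estimated with a rough 4-characters-per-token heuristic, which is an assumption and will be inaccurate for many languages.

```python
def paginate(segments: list[str], max_tokens: int = 25_000,
             chars_per_token: int = 4) -> list[list[str]]:
    """Group segments into pages whose estimated token count stays within max_tokens."""
    pages, current, used = [], [], 0
    for text in segments:
        # Ceiling division; every segment costs at least one token.
        cost = max(1, -(-len(text) // chars_per_token))
        if current and used + cost > max_tokens:
            pages.append(current)
            current, used = [], 0
        current.append(text)
        used += cost
    if current:
        pages.append(current)
    return pages
```

A single segment larger than the budget still gets a page of its own here; a real implementation would need to split or reject it.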

Troubleshooting

BabeldocError: cannot unpack non-iterable NoneType object

BabelDOC needs CMap files for font character mapping. If its automatic download times out, install them manually:

cd ~/Downloads
curl -L https://github.com/funstory-ai/BabelDOC-Assets/archive/refs/heads/main.zip -o BabelDOC-Assets.zip
unzip BabelDOC-Assets.zip
mkdir -p ~/.cache/babeldoc/cmap
cp BabelDOC-Assets-main/cmap/*.json ~/.cache/babeldoc/cmap/

This is a one-time setup. The cache path is the same on all platforms.
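
To confirm the files landed in the right place, a short check like the following may help. The helper name is hypothetical; the default directory is the `~/.cache/babeldoc/cmap` path used in the commands above.

```python
from pathlib import Path

# Default location from the manual install steps above.
DEFAULT_CMAP_DIR = Path.home() / ".cache" / "babeldoc" / "cmap"


def cmap_files(cache_dir: Path = DEFAULT_CMAP_DIR) -> list[Path]:
    """Return the CMap JSON files in cache_dir, or an empty list if it does not exist."""
    return sorted(cache_dir.glob("*.json")) if cache_dir.is_dir() else []


if __name__ == "__main__":
    files = cmap_files()
    print(f"{len(files)} CMap file(s) found in {DEFAULT_CMAP_DIR}")
```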

License

MIT
