MCP server for PDF translation via pdf2zh-next with full-context LLM translation

pdf2zh-next-mcp

MCP server for PDF translation using pdf2zh-next as the PDF processing backend. Designed for Claude Desktop.

Instead of translating each segment independently (which loses context), this server extracts all segments at once and lets the LLM translate them together. This keeps terminology consistent and preserves context across the entire document.

Using Claude Code? Check out pdf2zh-next-skill — a lightweight skill-based approach without MCP overhead. It handles large PDFs better by leveraging Claude Code's direct file I/O and auto-continuation.

How it works

┌─────────────────────────────────────────────────┐
│  Claude Desktop                                  │
│                                                  │
│  1. extract_segments  ──→  segments + formulas   │
│  2. LLM translates all segments at once          │
│  3. assemble_translated  ──→  final PDF          │
└─────────────────────────────────────────────────┘

The LLM sees every segment before translating — so terminology stays consistent, cross-page sentences flow naturally, and formula placeholders are preserved correctly.
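
The idea of translating everything in one request can be sketched in a few lines of Python. This is only an illustration, not the server's actual code: the segment shape, the `{v1}` placeholder syntax, and the helper names `build_prompt` and `placeholders_preserved` are all assumptions made for the example.

```python
import json
import re

# Hypothetical segment shape; the real extract_segments output may differ.
segments = [
    {"id": 0, "text": "We define the loss as {v1}."},
    {"id": 1, "text": "Minimizing {v1} yields the estimator."},
]

# Assumed placeholder syntax for formulas.
PLACEHOLDER = re.compile(r"\{v\d+\}")


def build_prompt(segments, target_lang):
    """Pack every segment into one request so the model sees the whole document."""
    payload = json.dumps(segments, ensure_ascii=False, indent=2)
    return (
        f"Translate the 'text' of every segment into {target_lang}. "
        "Keep placeholders such as {v1} unchanged and return the same JSON shape.\n"
        + payload
    )


def placeholders_preserved(source, translated):
    """True if the translation keeps exactly the same placeholders as the source."""
    return sorted(PLACEHOLDER.findall(source)) == sorted(PLACEHOLDER.findall(translated))
```

A check like `placeholders_preserved` could be run on each translated segment before assembly, so a dropped formula placeholder is caught early rather than in the final PDF.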

Prerequisites

pdf2zh-next must be installed separately:

uv tool install pdf2zh-next

Verify installation:

pdf2zh_next --version

You need uv to install both pdf2zh-next and this server.

Installation

From PyPI (recommended)

uv tool install pdf2zh-next-mcp

From GitHub

uv tool install git+https://github.com/JaeHyeon-KAIST/pdf2zh-next-mcp

From source

git clone https://github.com/JaeHyeon-KAIST/pdf2zh-next-mcp
cd pdf2zh-next-mcp
uv sync

Setup

Add the server to your Claude Desktop MCP config file:

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json

If installed from PyPI or GitHub:

{
  "mcpServers": {
    "pdf-translate": {
      "command": "uvx",
      "args": ["pdf2zh-next-mcp"]
    }
  }
}

If running from source:

{
  "mcpServers": {
    "pdf-translate": {
      "command": "uv",
      "args": [
        "run",
        "--directory", "/path/to/pdf2zh-next-mcp",
        "python", "-m", "pdf2zh_next_mcp.main"
      ]
    }
  }
}

Tip: If Claude Desktop can't find uvx, use the absolute path (e.g., /opt/homebrew/bin/uvx on macOS, C:\Users\you\.local\bin\uvx.exe on Windows).
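
If you are unsure where uvx lives, a small script can look it up and print a ready-to-paste config. This is a convenience sketch, not part of the package; the `build_config` helper is hypothetical and it only resolves whatever is on the PATH of the shell you run it from.

```python
import json
import shutil


def build_config(command="uvx"):
    """Return a Claude Desktop config dict, using an absolute path when one is found."""
    resolved = shutil.which(command) or command  # fall back to the bare name
    return {
        "mcpServers": {
            "pdf-translate": {
                "command": resolved,
                "args": ["pdf2zh-next-mcp"],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_config(), indent=2))
```

Merge the printed "pdf-translate" entry into the existing "mcpServers" object instead of overwriting the file, since the config may already list other servers.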

Usage

Ask Claude Desktop in plain language, for example:

"Translate this PDF to Korean: /path/to/paper.pdf"

Behind the scenes:

  1. extract_segments analyzes the PDF layout and returns all text segments
  2. The LLM translates everything at once (with full context)
  3. assemble_translated injects translations and generates the final PDF

Output files:

  • *-mono.pdf — translated PDF
  • *-dual.pdf — bilingual side-by-side
  • *-glossary.json — terminology glossary
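
To collect the generated files after a run, something like the following may help. It relies only on the three suffixes listed above; where the files are written is an assumption you need to supply, and `find_outputs` is a hypothetical helper.

```python
from pathlib import Path

# Suffixes taken from the output file list above.
SUFFIXES = ("-mono.pdf", "-dual.pdf", "-glossary.json")


def find_outputs(output_dir):
    """Map each output suffix to the matching files found in output_dir."""
    root = Path(output_dir)
    return {suffix: sorted(root.glob(f"*{suffix}")) for suffix in SUFFIXES}
```

For example, `find_outputs(".")` returns a dictionary whose "-mono.pdf" entry lists every translated PDF in the current directory.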

Limitations

  • Large PDFs (roughly 30 pages or more): Claude Desktop limits the number of output tokens per turn, so a document with many segments may fail mid-translation with "response could not be fully generated". For such PDFs, use pdf2zh-next-skill with Claude Code instead.
  • MCP tool result size: Segments are paginated to stay within Claude Desktop's 25K token limit per tool response. This is handled automatically.
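
The pagination mentioned above can be pictured as a greedy grouping of segments under a token budget. The sketch below is an assumption about the general technique, not the server's implementation: the `paginate` helper is hypothetical, and the four-characters-per-token estimate is a rough heuristic, not a real tokenizer.

```python
def paginate(segments, max_tokens=25_000, chars_per_token=4):
    """Group segments into pages whose estimated token count stays within max_tokens."""
    pages, current, used = [], [], 0
    for text in segments:
        # Crude size estimate; a real tokenizer would give different numbers.
        cost = max(1, len(text) // chars_per_token)
        if current and used + cost > max_tokens:
            pages.append(current)
            current, used = [], 0
        current.append(text)
        used += cost
    if current:
        pages.append(current)
    return pages
```

A segment larger than the budget still gets a page of its own in this sketch, so nothing is dropped; a stricter version would have to split such a segment.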

Troubleshooting

BabeldocError: cannot unpack non-iterable NoneType object

BabelDOC needs CMap files for font character mapping. If its automatic download times out, install them manually:

cd ~/Downloads
curl -L https://github.com/funstory-ai/BabelDOC-Assets/archive/refs/heads/main.zip -o BabelDOC-Assets.zip
unzip BabelDOC-Assets.zip
mkdir -p ~/.cache/babeldoc/cmap
cp BabelDOC-Assets-main/cmap/*.json ~/.cache/babeldoc/cmap/

This is a one-time setup. The cache path is the same on all platforms.
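
To confirm the manual copy worked, you can check that the cache directory now contains JSON files. This is a small sketch with a hypothetical `cmap_files` helper; it only assumes the `~/.cache/babeldoc/cmap` path used in the commands above.

```python
from pathlib import Path


def cmap_files(cache_dir=None):
    """Return the CMap JSON files found in the BabelDOC cache directory."""
    default = Path.home() / ".cache" / "babeldoc" / "cmap"
    root = Path(cache_dir) if cache_dir else default
    return sorted(root.glob("*.json")) if root.is_dir() else []


if __name__ == "__main__":
    found = cmap_files()
    if found:
        print(f"{len(found)} CMap file(s) found")
    else:
        print("No CMap files found; repeat the copy step above")
```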

License

MIT
