
MCP server for MinerU - PDF parsing with MLX acceleration on Apple Silicon


🚀 MCP-MinerU

A Model Context Protocol (MCP) server that brings powerful PDF parsing capabilities to Claude using MinerU.

✨ Features

  • 📄 Parse PDF files with high accuracy
  • 🧮 Extract formulas and mathematical equations
  • 📊 Recognize tables and preserve structure
  • ⚡️ MLX acceleration on Apple Silicon (M1/M2/M3/M4)
  • 🔄 Multiple backends for different use cases
  • 🤖 MCP integration for seamless use with Claude

🎯 Tools

parse_pdf

Parse PDF files and extract structured content as Markdown.

Parameters:

  • file_path (required): Absolute path to the PDF file
  • backend (optional): pipeline | vlm-mlx-engine | vlm-transformers
  • formula_enable (optional): Enable formula recognition (default: true)
  • table_enable (optional): Enable table recognition (default: true)
  • start_page (optional): First page to parse, 0-indexed (default: 0)
  • end_page (optional): Last page to parse, 0-indexed and inclusive (default: -1 for all pages)
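
As a rough illustration of the parameters listed above, the sketch below assembles and validates a parse_pdf argument dictionary on the client side. The helper names build_parse_pdf_args and to_zero_based are hypothetical and not part of this project. The checks are assumptions about sensible input, not a description of what the server enforces. The 0-indexed, inclusive page convention is inferred from Example 3 in the usage section, where pages 10-15 map to start_page=9 and end_page=14.

```python
from pathlib import PurePosixPath

# Backend names as listed in the parameter list above.
BACKENDS = ("pipeline", "vlm-mlx-engine", "vlm-transformers")


def to_zero_based(first_page, last_page):
    """Hypothetical helper: map a 1-based inclusive page range to tool arguments."""
    if first_page < 1 or last_page < first_page:
        raise ValueError("expected 1 <= first_page <= last_page")
    return {"start_page": first_page - 1, "end_page": last_page - 1}


def build_parse_pdf_args(file_path, backend=None, formula_enable=True,
                         table_enable=True, start_page=0, end_page=-1):
    """Hypothetical helper: validate and assemble parse_pdf arguments."""
    # POSIX-style check, since the README targets macOS paths.
    if not PurePosixPath(file_path).is_absolute():
        raise ValueError("file_path must be an absolute path")
    if backend is not None and backend not in BACKENDS:
        raise ValueError(f"unknown backend: {backend!r}")
    if start_page < 0:
        raise ValueError("start_page must be >= 0")
    if end_page != -1 and end_page < start_page:
        raise ValueError("end_page must be -1 or >= start_page")
    args = {
        "file_path": file_path,
        "formula_enable": formula_enable,
        "table_enable": table_enable,
        "start_page": start_page,
        "end_page": end_page,
    }
    # Leave backend out when unset so the server can apply its own default.
    if backend is not None:
        args["backend"] = backend
    return args


if __name__ == "__main__":
    print(build_parse_pdf_args("/path/to/paper.pdf", backend="pipeline",
                               **to_zero_based(10, 15)))
```

In normal use Claude fills in these arguments itself. A helper like this is mainly useful when calling the tool from a script or a test.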

list_backends

Check system capabilities and get backend recommendations.
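
The kind of check list_backends performs can be approximated with the standard library. The sketch below is not the server's actual logic. The function name recommend_backend is hypothetical, and falling back to pipeline on machines other than Apple Silicon is an assumption made for illustration.

```python
import platform


def recommend_backend(system=None, machine=None):
    """Hypothetical heuristic: suggest a backend from the platform description."""
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    # A native Python on Apple Silicon reports Darwin / arm64.
    if system == "Darwin" and machine == "arm64":
        return "vlm-mlx-engine"
    # Assumed fallback for every other platform.
    return "pipeline"


if __name__ == "__main__":
    print(recommend_backend())
```

A Python interpreter running under Rosetta reports x86_64, so this heuristic would not suggest the MLX backend there.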

🛠️ Installation

Prerequisites

  • Python 3.10-3.13
  • uv (recommended) or pip

Quick Install

# Clone the repository
git clone https://github.com/TINKPA/mcp-mineru.git
cd mcp-mineru

# Install with all dependencies (one command!)
pip install -e .

That's it! The mineru[core] dependency will automatically install all backends (pipeline, vlm, mlx).

🔧 Configuration

Claude Code (Recommended)

Use the Claude Code CLI to add the server directly:

# Replace /absolute/path/to/mcp-mineru with your actual path
# Using --scope user makes it available across all your projects
claude mcp add --transport stdio --scope user mineru -- \
  python /absolute/path/to/mcp-mineru/src/mcp_mineru/server.py

Or using uv:

claude mcp add --transport stdio --scope user mineru -- \
  uv --directory /absolute/path/to/mcp-mineru run python src/mcp_mineru/server.py

Configuration Scope Options:

  • --scope user (recommended): Available across all your projects
  • --scope local: Available only in the current project (default)
  • --scope project: Shared with everyone in the project via the .mcp.json file

Note: The -- (double dash) separates Claude's CLI flags from the command that runs the MCP server. Everything after -- is the actual command to execute.

Claude Desktop (Manual Configuration)

Add to your Claude Desktop configuration file:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "mineru": {
      "command": "python",
      "args": [
        "/absolute/path/to/mcp-mineru/src/mcp_mineru/server.py"
      ]
    }
  }
}

Or using uv (recommended):

{
  "mcpServers": {
    "mineru": {
      "command": "uv",
      "args": [
        "--directory",
        "/absolute/path/to/mcp-mineru",
        "run",
        "python",
        "src/mcp_mineru/server.py"
      ]
    }
  }
}
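
If the configuration file already lists other servers, the mineru entry has to be merged in rather than pasted over them. The following is a minimal sketch of that merge using only the standard library. The function name add_mineru_server and the "other" placeholder entry are hypothetical. Reading and writing the actual file is left out so the example has no side effects.

```python
import json


def add_mineru_server(config, repo_dir):
    """Return a copy of config with a "mineru" entry in the uv form shown above."""
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))
    servers["mineru"] = {
        "command": "uv",
        "args": ["--directory", repo_dir, "run", "python",
                 "src/mcp_mineru/server.py"],
    }
    updated["mcpServers"] = servers
    return updated


if __name__ == "__main__":
    # Placeholder for whatever the existing config file already contains.
    existing = {"mcpServers": {"other": {"command": "node", "args": []}}}
    merged = add_mineru_server(existing, "/absolute/path/to/mcp-mineru")
    print(json.dumps(merged, indent=2))
```

The function copies the input instead of mutating it, which makes it easy to inspect the result before writing anything back to disk.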

📖 Usage Examples

Example 1: Parse a PDF

User: "Please analyze this research paper: /path/to/paper.pdf"

Claude: [Calls parse_pdf tool]
"This research paper discusses... The key findings in Table 3 show..."

Example 2: Check system capabilities

User: "What's the best backend for my system?"

Claude: [Calls list_backends tool]
"Your system has Apple Silicon (M4). I recommend using the
'vlm-mlx-engine' backend, which runs with MLX acceleration."

Example 3: Extract specific pages

User: "Extract pages 10-15 from this PDF"

Claude: [Calls parse_pdf with start_page=9, end_page=14]
"Here's the content from pages 10-15..."

🏗️ Development

Run tests

pytest

Format code

black src/
ruff check src/

❓ Troubleshooting

ModuleNotFoundError when running tests

If you see errors like ModuleNotFoundError: No module named 'mineru' or 'torch':

Solution: Reinstall the package to ensure all dependencies are installed:

pip install -e .

The mineru[core] dependency should automatically install all required backends.
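
To see which modules are missing before reinstalling, a quick check with the standard library can help. This is a small sketch; the function name missing_modules is hypothetical, and the two module names checked at the bottom are simply the ones mentioned in the error message above.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["mineru", "torch"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

Run it with the same interpreter that runs the tests. A mismatch between environments is a common reason for this error.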

🚀 Performance

On Apple Silicon (M4):

  • pipeline backend: ~32 seconds/page
  • vlm-mlx-engine backend: ~38 seconds/page (higher quality)
  • vlm-transformers backend: ~148 seconds/page

Benchmarked on a Mac mini M4 with 16GB RAM
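
For a sense of scale, the per-page figures above can be multiplied out for a whole document. The sketch below does that arithmetic. It assumes processing time grows linearly with page count, which the benchmark does not establish, and the function name estimate_seconds is hypothetical.

```python
# Approximate seconds per page, taken from the benchmark figures above.
SECONDS_PER_PAGE = {
    "pipeline": 32,
    "vlm-mlx-engine": 38,
    "vlm-transformers": 148,
}


def estimate_seconds(pages, backend):
    """Rough total time, assuming cost scales linearly with page count."""
    return pages * SECONDS_PER_PAGE[backend]


if __name__ == "__main__":
    for backend in SECONDS_PER_PAGE:
        total = estimate_seconds(20, backend)
        print(f"{backend}: {total} s (~{total / 60:.1f} min) for 20 pages")
```

For a 20-page paper this works out to roughly 640 s with pipeline, 760 s with vlm-mlx-engine, and 2960 s with vlm-transformers.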

📝 License

MinerU, which this project includes as a git submodule, is licensed under the Apache License 2.0.

🙏 Dependencies & Acknowledgments

This project is built on top of:

  • MinerU (Apache 2.0)

    • Core PDF parsing engine
    • Included as a git submodule for development stability
  • MCP (MIT)

    • Model Context Protocol specification
