
PDF to Markdown MCP Server


MCP-PDF2MD


MCP-PDF2MD Service

An MCP-based, high-performance PDF-to-Markdown conversion service powered by the MinerU API. It batch-processes both local files and URLs and produces structured Markdown output.

Key Features

  • Format Conversion: Converts PDF files to structured Markdown.
  • Multi-source Support: Processes both local PDF files and PDF URLs.
  • Intelligent Processing: Automatically selects the appropriate processing method for each input.
  • Batch Processing: Converts multiple files in one batch for efficient handling of large volumes of PDFs.
  • MCP Integration: Integrates with LLM clients such as Claude Desktop.
  • Structure Preservation: Keeps the original document structure, including headings, paragraphs, and lists.
  • Smart Layout: Outputs text in natural reading order for single-column, multi-column, and complex layouts.
  • Formula Conversion: Recognizes formulas in the document and converts them to LaTeX.
  • Table Extraction: Recognizes tables in the document and converts them to a structured format.
  • Cleanup Optimization: Removes headers, footers, footnotes, page numbers, and similar elements so the text stays semantically coherent.
  • High-Quality Extraction: Extracts text, images, and layout information from PDF documents with high fidelity.
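Multi-source and batch support imply that a mixed list of inputs has to be split into remote URLs (the job of convert_pdf_url) and local files (the job of convert_pdf_file). The sketch below shows one way to do that; split_sources is a hypothetical helper, not part of this project, and classifying inputs by URL scheme and .pdf suffix is an assumption.

```python
from pathlib import Path
from urllib.parse import urlparse


def split_sources(sources: list[str]) -> tuple[list[str], list[Path]]:
    """Partition mixed inputs into remote URLs and local PDF paths.

    Hypothetical helper: the real service may dispatch inputs differently.
    """
    urls: list[str] = []
    local_files: list[Path] = []
    for source in sources:
        scheme = urlparse(source).scheme.lower()
        if scheme in ("http", "https"):
            urls.append(source)  # remote PDF, sent to the API by URL
        elif source.lower().endswith(".pdf"):
            local_files.append(Path(source))  # local PDF, must be uploaded
        else:
            raise ValueError(f"Not a PDF URL or local .pdf path: {source!r}")
    return urls, local_files


urls, files = split_sources([
    "https://example.com/paper.pdf",
    "docs/report.pdf",
])
print(urls)        # ['https://example.com/paper.pdf']
print(len(files))  # 1
```

A Windows path such as C:\docs\a.pdf parses with scheme "c", so it still lands in the local-file list.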

System Requirements

  • Software: Python 3.10+
  • Tools: uv (used by the setup commands below)

Quick Start

  1. Clone the repository and enter the directory:

    git clone https://github.com/FutureUnreal/mcp-pdf2md.git
    cd mcp-pdf2md
    
  2. Create a virtual environment and install dependencies:

    Linux/macOS:

    uv venv
    source .venv/bin/activate
    uv pip install -e .
    

    Windows:

    uv venv
    .venv\Scripts\activate
    uv pip install -e .
    
  3. Configure environment variables:

    Create a .env file in the project root directory and set the following environment variables:

    MINERU_API_BASE=https://mineru.net/api/v4/extract/task
    MINERU_BATCH_API=https://mineru.net/api/v4/extract/task/batch
    MINERU_BATCH_RESULTS_API=https://mineru.net/api/v4/extract-results/batch
    MINERU_API_KEY=your_api_key_here
    
  4. Start the service:

    uv run pdf2md
    

Command Line Arguments

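The server supports command line arguments, including:

  • --output-dir: Directory in which converted Markdown files are saved (passed in the Claude Desktop configuration below).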

Claude Desktop Configuration

Add the following entry to your Claude Desktop configuration file (claude_desktop_config.json):

Windows:

{
    "mcpServers": {
        "pdf2md": {
            "command": "uv",
            "args": [
                "--directory",
                "C:\\path\\to\\mcp-pdf2md",
                "run",
                "pdf2md",
                "--output-dir",
                "C:\\path\\to\\output"
            ],
            "env": {
                "MINERU_API_KEY": "your_api_key_here"
            }
        }
    }
}

Linux/macOS:

{
    "mcpServers": {
        "pdf2md": {
            "command": "uv",
            "args": [
                "--directory",
                "/path/to/mcp-pdf2md",
                "run",
                "pdf2md",
                "--output-dir",
                "/path/to/output"
            ],
            "env": {
                "MINERU_API_KEY": "your_api_key_here"
            }
        }
    }
}
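Hand-editing the Windows paths above is error-prone because JSON requires every backslash to be escaped. A minimal sketch that builds the same pdf2md entry in Python and lets json.dumps handle the escaping; pdf2md_server_entry is a hypothetical helper, and the paths and key are the placeholders used above.

```python
import json
import sys


def pdf2md_server_entry(repo_dir: str, output_dir: str, api_key: str) -> dict:
    """Build the "pdf2md" server entry shown above for the given paths."""
    return {
        "command": "uv",
        "args": ["--directory", repo_dir, "run", "pdf2md", "--output-dir", output_dir],
        "env": {"MINERU_API_KEY": api_key},
    }


# Placeholder paths from the examples above; replace them with real ones.
if sys.platform == "win32":
    repo, out = r"C:\path\to\mcp-pdf2md", r"C:\path\to\output"
else:
    repo, out = "/path/to/mcp-pdf2md", "/path/to/output"

config = {"mcpServers": {"pdf2md": pdf2md_server_entry(repo, out, "your_api_key_here")}}
# json.dumps escapes each backslash, producing the doubled "\\" seen in the Windows example.
print(json.dumps(config, indent=4))
```

If your configuration file already lists other servers, merge the pdf2md entry into its existing mcpServers object rather than overwriting the file.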

Note about API Key Configuration: You can set the API key in two ways:

  1. In the .env file within the project directory (recommended for development)
  2. In the Claude Desktop configuration as shown above (recommended for regular use)

If you set the API key in both places, the one in the Claude Desktop configuration will take precedence.

MCP Tools

The server provides the following MCP tools:

  • convert_pdf_url: Converts a PDF at a URL to Markdown
  • convert_pdf_file: Converts a local PDF file to Markdown
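Claude Desktop calls these tools for you, but it can help to know what goes over the wire: per the MCP specification, a client invokes a tool by sending a JSON-RPC 2.0 request with method "tools/call" and params holding the tool name and its arguments. The argument names below ("url", "file_path") are hypothetical; the real names come from each tool's inputSchema, which a client can retrieve with "tools/list". tools_call_request is likewise a hypothetical helper.

```python
import json


def tools_call_request(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP "tools/call" JSON-RPC 2.0 request."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# Hypothetical argument names; check each tool's inputSchema for the real ones.
print(tools_call_request(1, "convert_pdf_url", {"url": "https://example.com/paper.pdf"}))
print(tools_call_request(2, "convert_pdf_file", {"file_path": "/path/to/paper.pdf"}))
```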

Getting MinerU API Key

This project relies on the MinerU API for PDF content extraction. To obtain an API key:

  1. Visit the MinerU official website and register for an account
  2. After logging in, apply for API testing access
  3. Once your application is approved, you can access the API Management page
  4. Generate your API key following the instructions provided
  5. Copy the generated API key
  6. Use this string as the value for MINERU_API_KEY
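Once you have the key, each request to the MinerU endpoints listed in the .env example has to carry it. A sketch that builds, but does not send, an authenticated request with the standard library. Assumptions: the key is sent as an "Authorization: Bearer" header (based on MinerU's API documentation at the time of writing; verify against the current docs), and a batch ID is appended to MINERU_BATCH_RESULTS_API as a path segment. batch_results_request and the batch ID are hypothetical.

```python
import os
import urllib.request

BATCH_RESULTS_API = os.environ.get(
    "MINERU_BATCH_RESULTS_API", "https://mineru.net/api/v4/extract-results/batch"
)


def batch_results_request(batch_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated GET for one batch's results.

    Assumption: Bearer-token auth and "<results API>/<batch_id>" URL layout.
    """
    url = f"{BATCH_RESULTS_API.rstrip('/')}/{batch_id}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


req = batch_results_request("example-batch-id", os.environ.get("MINERU_API_KEY", "test-key"))
print(req.full_url)
# Sending it would be: urllib.request.urlopen(req, timeout=30)
```

Keep the key out of logs: the example prints only the URL, never the header.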

Note that MinerU API access is currently in a testing phase and requires approval from the MinerU team. Approval may take some time, so plan accordingly.

Demo

[Image: Input PDF]

[Image: Output Markdown]

License

MIT License - see the LICENSE file for details.

Credits

This project is built on the MinerU API.



Download files


Source Distribution

iflow_mcp_futureunreal_pdf2md-0.1.0.tar.gz (9.3 kB)

Built Distribution

iflow_mcp_futureunreal_pdf2md-0.1.0-py3-none-any.whl (10.4 kB)

File details

Details for the file iflow_mcp_futureunreal_pdf2md-0.1.0.tar.gz.

File metadata

  • Download URL: iflow_mcp_futureunreal_pdf2md-0.1.0.tar.gz
  • Upload date:
  • Size: 9.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.28 (publish, on Debian GNU/Linux 13 "trixie")

File hashes

Hashes for iflow_mcp_futureunreal_pdf2md-0.1.0.tar.gz
  • SHA256: 0525df06046361b9b460daeb6cf559bb08ff80521aa89427f4bdb4f84fafddd6
  • MD5: 223db23852ce533120f79c4a5db62627
  • BLAKE2b-256: 95749937fee9ee5818e7ff7ee0027562cebac7c1edac2678074a23d091043fb0
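To check a downloaded archive against the SHA256 digest listed above, hash the file and compare. A minimal sketch using only hashlib; it reads the file in chunks so it also works on Python 3.10, which predates hashlib.file_digest (added in 3.11). sha256_of and verify are illustrative helper names.

```python
import hashlib
from pathlib import Path

# SHA256 published above for the source distribution.
EXPECTED_SDIST_SHA256 = "0525df06046361b9b460daeb6cf559bb08ff80521aa89427f4bdb4f84fafddd6"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    """Compare case-insensitively, since hexdigest() is lowercase."""
    return sha256_of(path) == expected.lower()


sdist = Path("iflow_mcp_futureunreal_pdf2md-0.1.0.tar.gz")
if sdist.exists():
    print("OK" if verify(sdist, EXPECTED_SDIST_SHA256) else "MISMATCH")
```

pip can also enforce digests automatically in hash-checking mode (pip install --require-hashes, with --hash=sha256:... entries in a requirements file).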


File details

Details for the file iflow_mcp_futureunreal_pdf2md-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: iflow_mcp_futureunreal_pdf2md-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 10.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.28 (publish, on Debian GNU/Linux 13 "trixie")

File hashes

Hashes for iflow_mcp_futureunreal_pdf2md-0.1.0-py3-none-any.whl
  • SHA256: a2bd9f77d85c21f02ced3a43368055d499f275621527ee6cec87f09b6c5b1f14
  • MD5: c47b6884f4ffae4837d425d0d3258d8e
  • BLAKE2b-256: 068d23e82a985a114f06d3bfc41a607ca6e2605be4286abe11d9f3f44d33d590

