Skip to main content

MCP server for converting PDF files to Markdown using AI sampling

Project description

PDF2MD MCP Server

An MCP (Model Context Protocol) server that converts PDF files to Markdown format using AI sampling capabilities.

Features

  • Convert PDF files to Markdown using AI content extraction
  • Support for both local file paths and URLs
  • Incremental conversion - resume from where you left off
  • Configurable output directory
  • Built with FastMCP for high performance

Installation

pip install pdf2md-mcp

Usage

As an MCP Server

Start the server:

pdf2md-mcp

The server will expose MCP tools for PDF to Markdown conversion.

Available Tools

convert_pdf_to_markdown

Converts a PDF file to Markdown format using AI sampling.

Parameters:

  • file_path (string): Local file path or URL to the PDF file
  • output_dir (string, optional): Output directory for the markdown file. Defaults to the same directory as input file (for local files) or current working directory (for URLs)

Returns:

  • output_file: Path to the generated markdown file
  • summary: Summary of the conversion task
  • pages_processed: Number of pages processed

Requirements

  • Python 3.10+
  • An MCP-compatible client with AI sampling capabilities
  • Network access for URL-based PDF files

Development

Setup

git clone https://github.com/shuminghuang/pdf2md-mcp.git
cd pdf2md-mcp
pip install -e ".[dev]"

Running Tests

pytest

Code Formatting

black .
isort .

License

MIT License - see LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pdf2md_mcp-0.1.1.tar.gz (7.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pdf2md_mcp-0.1.1-py3-none-any.whl (6.8 kB view details)

Uploaded Python 3

File details

Details for the file pdf2md_mcp-0.1.1.tar.gz.

File metadata

  • Download URL: pdf2md_mcp-0.1.1.tar.gz
  • Upload date:
  • Size: 7.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.9

File hashes

Hashes for pdf2md_mcp-0.1.1.tar.gz
Algorithm Hash digest
SHA256 b3ecd64cb2a8c7d537f57c236416ba14065470d9654b25fe4622fcb71e4bf72a
MD5 9b88f8d033b5ea1a034ce29e5b7ce90a
BLAKE2b-256 a1056183555ad2c23fc71f7557827b42dc7c7c901e3f0aaf328a765a97ee4806

See more details on using hashes here.

File details

Details for the file pdf2md_mcp-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: pdf2md_mcp-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 6.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.9

File hashes

Hashes for pdf2md_mcp-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 c511ac57d7f948ef5e509147197ebcf63b84474b462b9e1836923c9e2b6aab95
MD5 c535770ba2465de1bef68f80f794a11c
BLAKE2b-256 2c8c14fa652ea58beded0e28996a7798abb3661786285d683301f36aa75e0d3c

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page