
Convert PDF, DOCX, and Excel documents to Markdown, with MCP support for Claude Code


doc2md-helper

Document Conversion MCP Server

PyPI | MIT License | Python 3.10+ | MCP

Convert PDF, Word, and Excel documents to Markdown, either from the command line or through an MCP server. Works with Claude Code, Cursor, CodeX, and other MCP-capable tools.


Quick Start

Installation

pip install doc2md-helper
doc2md-helper install

Then restart your AI coding tool.

Usage with Claude Code

Just ask naturally:

Read this report.pdf file

Claude Code will ask which PDF conversion method you prefer:

  1. MarkItDown - Fast, lightweight, perfect for text-based PDFs
  2. Marker - High-precision OCR, great for scanned or complex layouts (requires GPU)

CLI Usage

# Convert PDF (lightweight)
doc2md-helper convert-pdf document.pdf

# Convert PDF (high-precision OCR)
doc2md-helper convert-pdf-marker scanned.pdf

# Convert Word
doc2md-helper convert-docx report.docx

# Convert Excel
doc2md-helper convert-excel data.xlsx

# Save to specific path
doc2md-helper convert-pdf document.pdf -o output.md
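The commands above can also be scripted. The following is a minimal Python sketch for converting a whole folder, not part of the package: the helper names `build_command` and `convert_tree` are hypothetical, routing .doc and .xls files to the same subcommands as .docx and .xlsx is an assumption based on the formats listed under Features, and `-o` is only documented for convert-pdf, so its use with the other subcommands is assumed as well.

```python
import subprocess
from pathlib import Path
from typing import Optional

# Subcommands from the CLI usage above, keyed by file extension.
# Routing .doc/.xls like .docx/.xlsx is an assumption, not documented behaviour.
SUBCOMMANDS = {
    ".pdf": "convert-pdf",
    ".docx": "convert-docx",
    ".doc": "convert-docx",
    ".xlsx": "convert-excel",
    ".xls": "convert-excel",
}


def build_command(source: Path, output: Optional[Path] = None) -> list:
    """Return the doc2md-helper argument list for one file."""
    subcommand = SUBCOMMANDS.get(source.suffix.lower())
    if subcommand is None:
        raise ValueError(f"unsupported file type: {source.name}")
    command = ["doc2md-helper", subcommand, str(source)]
    if output is not None:
        # -o is shown for convert-pdf; assumed to work for the other subcommands.
        command += ["-o", str(output)]
    return command


def convert_tree(root: Path, out_dir: Path) -> None:
    """Convert every supported document under root, writing Markdown to out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for source in sorted(root.rglob("*")):
        if source.is_file() and source.suffix.lower() in SUBCOMMANDS:
            target = out_dir / (source.stem + ".md")
            # check=True stops the batch on the first failed conversion.
            subprocess.run(build_command(source, target), check=True)
```

Calling `convert_tree(Path("docs"), Path("markdown"))` would then run one CLI invocation per document. Files that share a stem (for example report.pdf and report.docx) would overwrite each other's output in this sketch.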

Features

  • Multiple PDF Conversion Options: Choose between lightweight MarkItDown or high-precision Marker
  • Full MCP Integration: Works with Claude Code, Cursor, CodeX, Windsurf, Zed, and more
  • CLI Interface: Convert documents directly from the command line
  • Multi-Format Support: PDF, DOCX, DOC, XLSX, XLS
  • Automatic Platform Detection: Running doc2md-helper install configures all supported platforms automatically

Installation

Basic Installation

pip install doc2md-helper

This installs core dependencies: mcp, markitdown[pdf], openpyxl, python-docx, mammoth.

With Marker Support (High-Precision OCR)

pip install doc2md-helper[marker]

This additionally installs marker-pdf, torch, bitsandbytes, and PyPDF2.

Full Installation

pip install doc2md-helper[all]

Includes marker support plus additional optional dependencies.

Install from Source

git clone https://github.com/your-username/mcp-document-converter
cd mcp-document-converter
pip install -e .

Platform Setup

Claude Code

doc2md-helper install --platform claude

Configures ~/.claude/settings.json. Restart Claude Code after installation.

Cursor

doc2md-helper install --platform cursor

Configures .cursor/mcp.json in your project directory.

Other Platforms

doc2md-helper install --platform <platform-name>

Supported platforms: claude, cursor, codex, windsurf, zed, continue, opencode, gemini-cli, qwen, kiro, qoder, copilot, copilot-cli, or all.

Manual Configuration

If auto-configuration fails, add the following entry to your platform's MCP config by hand:

{
  "mcpServers": {
    "doc2md-helper": {
      "command": "uvx",
      "args": ["doc2md-helper", "serve"]
    }
  }
}
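When the config file already lists other MCP servers, the entry should be merged in rather than pasted over the file. A minimal Python sketch of that merge, assuming the platform's config is a JSON file with a top-level "mcpServers" object as shown above; `add_server` and `update_config_file` are hypothetical helper names, not part of doc2md-helper:

```python
import copy
import json
from pathlib import Path

# The server entry from the manual configuration snippet above.
SERVER_ENTRY = {"command": "uvx", "args": ["doc2md-helper", "serve"]}


def add_server(config: dict) -> dict:
    """Return a copy of config with doc2md-helper added under mcpServers."""
    merged = copy.deepcopy(config)
    servers = merged.setdefault("mcpServers", {})
    servers["doc2md-helper"] = copy.deepcopy(SERVER_ENTRY)
    return merged


def update_config_file(path: Path) -> None:
    """Load path if it exists, merge the entry, and write the result back."""
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(add_server(config), indent=2) + "\n", encoding="utf-8")
```

For Cursor this could be called as `update_config_file(Path(".cursor/mcp.json"))`. Other keys and other servers in the file are left untouched.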

Usage

MCP Tools

| Tool | Description |
|------|-------------|
| convert_pdf_with_markitdown | Convert PDF using MarkItDown (fast, lightweight) |
| convert_pdf_with_marker | Convert PDF using Marker (high-precision OCR) |
| convert_docx_to_markdown | Convert Word documents |
| convert_excel_to_markdown | Convert Excel spreadsheets |
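MCP clients invoke these tools with a JSON-RPC 2.0 "tools/call" request. The sketch below builds such a message in Python to show what a call looks like on the wire. The tool names come from the table above, but the argument name `file_path` is purely an assumption, since the tools' input schemas are not documented here, and `tools_call_request` is a hypothetical helper.

```python
import json
from typing import Any


def tools_call_request(tool: str, arguments: dict[str, Any], request_id: int = 1) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# "file_path" is a guessed parameter name; check the tool's schema via tools/list.
request = tools_call_request("convert_docx_to_markdown", {"file_path": "report.docx"})
```

In practice an AI coding tool or an MCP client library sends this message for you after the initialization handshake; the sketch is only meant to make the tool names above concrete.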

CLI Commands

# Document conversion
doc2md-helper convert-pdf <file>
doc2md-helper convert-pdf-marker <file>
doc2md-helper convert-docx <file>
doc2md-helper convert-excel <file>

# Installation and setup
doc2md-helper install
doc2md-helper install --platform <name>

# MCP server
doc2md-helper serve
doc2md-helper serve --http --host 127.0.0.1 --port 5555
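When the server is started in HTTP mode from a script, it can help to wait until the port accepts connections before pointing a client at it. A minimal Python sketch using only the standard library; `wait_for_port` is a hypothetical helper, and it only checks that something is listening on the TCP port, not that the MCP endpoint itself is healthy:

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 10.0) -> bool:
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means a listener is bound to the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.2)  # not listening yet; retry shortly
    return False
```

With the serve command above running in another process, `wait_for_port("127.0.0.1", 5555)` would return True as soon as the listener is up.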

PDF Conversion Options

| Method | Description | Best For | Dependencies |
|--------|-------------|----------|--------------|
| MarkItDown | Fast, lightweight | Text-based PDFs | markitdown |
| Marker | High-precision OCR | Scanned PDFs, complex layouts | marker-pdf, torch |
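The choice between the two methods can be automated in a wrapper script. The sketch below is one possible approach, not the package's own logic: it picks the Marker subcommand only for scanned PDFs and only when the Marker dependencies appear to be installed. The import names "marker" and "torch" are assumed to correspond to the marker-pdf and torch packages, and both function names are hypothetical.

```python
import importlib.util


def marker_available(find_spec=importlib.util.find_spec) -> bool:
    """Report whether the optional Marker dependencies look importable."""
    # "marker" and "torch" are the assumed import names of marker-pdf and torch.
    return all(find_spec(name) is not None for name in ("marker", "torch"))


def choose_pdf_command(scanned: bool, has_marker: bool) -> str:
    """Pick the CLI subcommand for a PDF, falling back to MarkItDown."""
    if scanned and has_marker:
        return "convert-pdf-marker"
    return "convert-pdf"
```

A caller would combine the two as `choose_pdf_command(scanned=True, has_marker=marker_available())`. Whether a PDF counts as scanned is left to the caller here.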

Project Structure

mcp-document-converter/
├── mcp_document_converter/
│   ├── __init__.py
│   ├── cli.py                 # Command-line interface
│   ├── server.py              # MCP server implementation
│   ├── skills.py              # Platform configuration and skills
│   ├── pdf2markdown.py        # Marker PDF conversion
│   ├── pdf2markdown_markitdown.py  # MarkItDown PDF conversion
│   ├── docx2markdown.py       # Word document conversion
│   └── excel2markdown.py      # Excel spreadsheet conversion
├── .claude/
│   └── doc-converter.md       # Claude Code instructions
├── demo/                      # Example documents
├── pyproject.toml             # Project configuration
└── README.md                  # This file

Development

Setup Development Environment

git clone https://github.com/your-username/mcp-document-converter
cd mcp-document-converter
pip install -e ".[dev]"

Running Tests

# Add tests here

Troubleshooting

Installation Issues

If installing marker-pdf fails, fall back to the basic installation, which does not depend on it:

pip install doc2md-helper  # Basic version, without Marker

The basic version uses MarkItDown and works well for text-based PDFs.

Platform Configuration Not Working

Try specifying the platform explicitly:

doc2md-helper install --platform claude

Or configure manually as shown in the "Manual Configuration" section.


Contributing

Contributions welcome! Please feel free to:

  • Report bugs
  • Suggest features
  • Submit pull requests

License

MIT License - see LICENSE file for details.


Related Projects

  • code-review-graph - Code understanding with knowledge graphs (the inspiration for this project's architecture)
