
Converts PDF, DOCX, and Excel documents to Markdown, with MCP support for Claude Code


doc2md-helper

Document Conversion MCP Server

PyPI | MIT License | Python 3.10+ | MCP

Convert PDF, Word, and Excel documents to Markdown, either from the command line or through an MCP server. Works with Claude Code, Cursor, Codex, and other MCP-capable tools.


Quick Start

Installation

pip install doc2md-helper
doc2md-helper install

Then restart your AI coding tool.

Usage with Claude Code

Just ask naturally:

Read this report.pdf file

Claude Code will ask which PDF conversion method you prefer:

  1. MarkItDown - Fast, lightweight, perfect for text-based PDFs
  2. Marker - High-precision OCR, great for scanned or complex layouts (requires GPU)

CLI Usage

# Convert PDF (lightweight)
doc2md-helper convert-pdf document.pdf

# Convert PDF (high-precision OCR)
doc2md-helper convert-pdf-marker scanned.pdf

# Convert Word
doc2md-helper convert-docx report.docx

# Convert Excel
doc2md-helper convert-excel data.xlsx

# Save to specific path
doc2md-helper convert-pdf document.pdf -o output.md
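
To convert a whole folder rather than one file at a time, the commands above can be driven from a short script. The following is a minimal sketch, not part of the package: the extension-to-subcommand mapping, the assumption that `-o` is accepted by every convert subcommand (the README only shows it with `convert-pdf`), and the function names `build_command` and `convert_all` are all illustrative.

```python
import subprocess
from pathlib import Path

# Subcommands are taken from the CLI usage above. Routing .doc and .xls to
# the Word and Excel subcommands is an assumption based on the feature list.
SUBCOMMANDS = {
    ".pdf": "convert-pdf",
    ".docx": "convert-docx",
    ".doc": "convert-docx",
    ".xlsx": "convert-excel",
    ".xls": "convert-excel",
}


def build_command(source: Path, out_dir: Path) -> list[str]:
    """Return the argv that converts `source` into out_dir/<stem>.md."""
    try:
        subcommand = SUBCOMMANDS[source.suffix.lower()]
    except KeyError:
        raise ValueError(f"unsupported file type: {source.name}") from None
    target = out_dir / f"{source.stem}.md"
    # Assumption: every convert subcommand accepts -o like convert-pdf does.
    return ["doc2md-helper", subcommand, str(source), "-o", str(target)]


def convert_all(in_dir: Path, out_dir: Path) -> None:
    """Convert every supported file directly inside in_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for source in sorted(in_dir.iterdir()):
        if source.suffix.lower() in SUBCOMMANDS:
            # check=True stops the batch on the first failed conversion.
            subprocess.run(build_command(source, out_dir), check=True)
```

Calling `convert_all(Path("docs"), Path("markdown"))` would then run one conversion per supported file in `docs`. Swap `convert-pdf` for `convert-pdf-marker` in the mapping if the PDFs are scanned.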

Features

  • Multiple PDF Conversion Options: Choose between lightweight MarkItDown or high-precision Marker
  • Full MCP Integration: Works with Claude Code, Cursor, Codex, Windsurf, Zed, and more
  • CLI Interface: Convert documents directly from the command line
  • Multi-Format Support: PDF, DOCX, DOC, XLSX, XLS
  • Automatic Platform Detection: doc2md-helper install detects and configures supported platforms automatically

Installation

Basic Installation

pip install doc2md-helper

This installs core dependencies: mcp, markitdown[pdf], openpyxl, python-docx, mammoth.

With Marker Support (High-Precision OCR)

pip install "doc2md-helper[marker]"

This adds the dependencies marker-pdf, torch, bitsandbytes, and PyPDF2. The quotes keep shells such as zsh from interpreting the square brackets.

Full Installation

pip install "doc2md-helper[all]"

Includes marker support plus additional optional dependencies.

Install from Source

git clone https://github.com/your-username/mcp-document-converter
cd mcp-document-converter
pip install -e .

Platform Setup

Claude Code

doc2md-helper install --platform claude

Configures ~/.claude/settings.json. Restart Claude Code after installation.

Cursor

doc2md-helper install --platform cursor

Configures .cursor/mcp.json in your project directory.

Other Platforms

doc2md-helper install --platform <platform-name>

Supported platforms: claude, cursor, codex, windsurf, zed, continue, opencode, gemini-cli, qwen, kiro, qoder, copilot, copilot-cli, or all.
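
If you only want to configure a few of these platforms, the install command can be scripted. This is a small sketch under stated assumptions: the platform names are copied from the list above, while the helper names `install_command` and `install_for` are hypothetical and not part of doc2md-helper.

```python
import subprocess

# Platform names copied from the "Supported platforms" list in this README.
PLATFORMS = (
    "claude", "cursor", "codex", "windsurf", "zed", "continue", "opencode",
    "gemini-cli", "qwen", "kiro", "qoder", "copilot", "copilot-cli",
)


def install_command(platform: str) -> list[str]:
    """Return the argv for configuring one platform (or "all")."""
    if platform != "all" and platform not in PLATFORMS:
        raise ValueError(f"unknown platform: {platform!r}")
    return ["doc2md-helper", "install", "--platform", platform]


def install_for(platforms: list[str]) -> None:
    """Run the installer once per requested platform."""
    for name in platforms:
        # check=True raises CalledProcessError if the installer fails.
        subprocess.run(install_command(name), check=True)
```

For example, `install_for(["claude", "cursor"])` would run the installer twice, once per platform. Validating the name first catches typos before any process is started.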

Manual Configuration

If auto-configuration fails, add the following entry to your platform's MCP configuration file. It launches the server through uvx, so uv must be installed:

{
  "mcpServers": {
    "doc2md-helper": {
      "command": "uvx",
      "args": ["doc2md-helper", "serve"]
    }
  }
}
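
When the configuration file already contains other servers or settings, the entry has to be merged in rather than pasted over the file. The sketch below shows one way to do that with the standard library. It is an illustration only, not the logic used by `doc2md-helper install`; the function name `add_server` is hypothetical, and the path you pass in depends on your platform.

```python
import json
from pathlib import Path

# The server entry from the manual configuration snippet above.
SERVER_ENTRY = {"command": "uvx", "args": ["doc2md-helper", "serve"]}


def add_server(config_path: Path, name: str = "doc2md-helper") -> dict:
    """Merge the server entry into config_path, keeping existing settings."""
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():  # tolerate an empty file
            config = json.loads(text)
    # Create "mcpServers" if it is missing, then add or overwrite our entry.
    config.setdefault("mcpServers", {})[name] = dict(SERVER_ENTRY)
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For a Cursor project this might be called as `add_server(Path(".cursor/mcp.json"))`. Back up the file first, since the sketch rewrites it and does not preserve comments or key order beyond what `json` retains.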

Usage

MCP Tools

| Tool | Description |
|------|-------------|
| convert_pdf_with_markitdown | Convert PDF using MarkItDown (fast, lightweight) |
| convert_pdf_with_marker | Convert PDF using Marker (high-precision OCR) |
| convert_docx_to_markdown | Convert Word documents |
| convert_excel_to_markdown | Convert Excel spreadsheets |
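
MCP clients invoke these tools with a JSON-RPC 2.0 `tools/call` request. The sketch below only builds that message so the shape is visible. The tool names come from the table above, but the argument key `file_path` is a guess: this README does not document the tools' input schemas, so check the schema your client reports before relying on it. The helper `tool_call_request` is hypothetical.

```python
import json
from itertools import count

# Monotonic request ids, as JSON-RPC expects a unique id per request.
_request_ids = count(1)


def tool_call_request(tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request for the given tool."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# Assumption: the tool takes the document path under a "file_path" key.
request = tool_call_request("convert_pdf_with_markitdown", {"file_path": "report.pdf"})
```

In practice an MCP client library sends this for you after performing the protocol's initialize handshake, so the sketch is mainly useful for reading logs or debugging a transport.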

CLI Commands

# Document conversion
doc2md-helper convert-pdf <file>
doc2md-helper convert-pdf-marker <file>
doc2md-helper convert-docx <file>
doc2md-helper convert-excel <file>

# Installation and setup
doc2md-helper install
doc2md-helper install --platform <name>

# MCP server
doc2md-helper serve
doc2md-helper serve --http --host 127.0.0.1 --port 5555
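
After starting the server with `--http`, a quick way to confirm that something is accepting connections on the chosen host and port is a plain TCP probe. This is a minimal sketch: the helper `is_listening` is hypothetical, the defaults mirror the example command above, and a successful connection shows only that the port is open, not that the MCP endpoint behind it is healthy.

```python
import socket


def is_listening(host: str = "127.0.0.1", port: int = 5555, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

A script that launches the server in the background could poll `is_listening()` a few times before pointing a client at it.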

PDF Conversion Options

| Method | Description | Best For | Dependencies |
|--------|-------------|----------|--------------|
| MarkItDown | Fast, lightweight | Text-based PDFs | markitdown |
| Marker | High-precision OCR | Scanned PDFs, complex layouts | marker-pdf, torch |

Project Structure

mcp-document-converter/
├── mcp_document_converter/
│   ├── __init__.py
│   ├── cli.py                 # Command-line interface
│   ├── server.py              # MCP server implementation
│   ├── skills.py              # Platform configuration and skills
│   ├── pdf2markdown.py        # Marker PDF conversion
│   ├── pdf2markdown_markitdown.py  # MarkItDown PDF conversion
│   ├── docx2markdown.py       # Word document conversion
│   └── excel2markdown.py      # Excel spreadsheet conversion
├── .claude/
│   └── doc-converter.md       # Claude Code instructions
├── demo/                      # Example documents
├── pyproject.toml             # Project configuration
└── README.md                  # This file

Development

Setup Development Environment

git clone https://github.com/your-username/mcp-document-converter
cd mcp-document-converter
pip install -e ".[dev]"

Running Tests

# Add tests here

Troubleshooting

Installation Issues

If marker-pdf or its dependencies fail to install, install the basic package without the marker extra:

pip install doc2md-helper  # Basic version only

The basic version handles text-based PDFs through MarkItDown.

Platform Configuration Not Working

Try specifying the platform explicitly:

doc2md-helper install --platform claude

Or configure manually as shown in the "Manual Configuration" section.


Contributing

Contributions welcome! Please feel free to:

  • Report bugs
  • Suggest features
  • Submit pull requests

License

MIT License - see LICENSE file for details.


Related Projects

  • code-review-graph - Code understanding with knowledge graphs (the inspiration for this project's architecture)

