Document converter for PDF, DOCX, and Excel to Markdown, with MCP support for Claude Code

doc2md-helper

Document Conversion MCP Server

PyPI | MIT License | Python 3.10+ | MCP

Convert PDF, Word, and Excel documents to Markdown through an MCP server or the command line. Works with Claude Code, Cursor, CodeX, and more.


Quick Start

Installation

pip install doc2md-helper
doc2md-helper install

Then restart your AI coding tool.

Usage with Claude Code

Just ask naturally:

Read this report.pdf file

Claude Code will ask which PDF conversion method you prefer:

  1. MarkItDown - Fast, lightweight, perfect for text-based PDFs
  2. Marker - High-precision OCR, great for scanned or complex layouts (requires GPU)

CLI Usage

# Convert PDF (lightweight)
doc2md-helper convert-pdf document.pdf

# Convert PDF (high-precision OCR)
doc2md-helper convert-pdf-marker scanned.pdf

# Convert Word
doc2md-helper convert-docx report.docx

# Convert Excel
doc2md-helper convert-excel data.xlsx

# Save to specific path
doc2md-helper convert-pdf document.pdf -o output.md

Features

  • Multiple PDF Conversion Options: Choose between lightweight MarkItDown or high-precision Marker
  • Full MCP Integration: Works with Claude Code, Cursor, CodeX, Windsurf, Zed, and more
  • CLI Interface: Convert documents directly from the command line
  • Multi-Format Support: PDF, DOCX, DOC, XLSX, XLS
  • Automatic Platform Detection: The install command configures all supported platforms automatically

Installation

Basic Installation

pip install doc2md-helper

This installs core dependencies: mcp, markitdown[pdf], openpyxl, python-docx, mammoth.

With Marker Support (High-Precision OCR)

pip install "doc2md-helper[marker]"

Requires additional dependencies: marker-pdf, torch, bitsandbytes, PyPDF2.

Full Installation

pip install "doc2md-helper[all]"

Includes marker support plus additional optional dependencies.

Install from Source

git clone https://github.com/your-username/mcp-document-converter
cd mcp-document-converter
pip install -e .

Platform Setup

Claude Code

doc2md-helper install --platform claude

Configures ~/.claude/settings.json. Restart Claude Code after installation.

Cursor

doc2md-helper install --platform cursor

Configures .cursor/mcp.json in your project directory.

Other Platforms

doc2md-helper install --platform <platform-name>

Supported platform names: claude, cursor, codex, windsurf, zed, continue, opencode, gemini-cli, qwen, kiro, qoder, copilot, copilot-cli. Use all to configure every platform at once.

Manual Configuration

If auto-configuration fails, add the following entry to your platform's MCP config manually:

{
  "mcpServers": {
    "doc2md-helper": {
      "command": "uvx",
      "args": ["doc2md-helper", "serve"]
    }
  }
}
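
If the config file already lists other MCP servers, the entry has to be merged in rather than pasted over the file. Here is a small sketch of one way to do that. The helper names `add_server` and `update_config_file` are hypothetical, and the config path is whatever your platform uses; the sketch only assumes the file is plain JSON with an `mcpServers` object as shown above.

```python
import json
from pathlib import Path

# The server entry from the manual configuration snippet above.
SERVER_ENTRY = {"command": "uvx", "args": ["doc2md-helper", "serve"]}


def add_server(config: dict) -> dict:
    """Return a copy of config with the doc2md-helper server registered."""
    merged = dict(config)
    # Copy the server map so the caller's dictionary is left untouched.
    servers = dict(merged.get("mcpServers", {}))
    servers["doc2md-helper"] = SERVER_ENTRY
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path) -> None:
    """Read a JSON config if it exists, add the server entry, and write it back."""
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(add_server(config), indent=2) + "\n", encoding="utf-8")
```

Existing servers and unrelated settings are preserved; only the `doc2md-helper` key is added or overwritten.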

Usage

MCP Tools

Tool                          Description
convert_pdf_with_markitdown   Convert PDF using MarkItDown (fast, lightweight)
convert_pdf_with_marker       Convert PDF using Marker (high-precision OCR)
convert_docx_to_markdown      Convert Word documents
convert_excel_to_markdown     Convert Excel spreadsheets

CLI Commands

# Document conversion
doc2md-helper convert-pdf <file>
doc2md-helper convert-pdf-marker <file>
doc2md-helper convert-docx <file>
doc2md-helper convert-excel <file>

# Installation and setup
doc2md-helper install
doc2md-helper install --platform <name>

# MCP server
doc2md-helper serve
doc2md-helper serve --http --host 127.0.0.1 --port 5555
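
When the server runs in HTTP mode, a quick way to confirm it started is to check that something is accepting TCP connections on the chosen host and port. The sketch below uses only the standard library; `is_listening` is a hypothetical helper, and it checks the socket only, not that the process behind it is actually doc2md-helper.

```python
import socket


def is_listening(host: str = "127.0.0.1", port: int = 5555, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

The defaults mirror the `--host 127.0.0.1 --port 5555` example above.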

PDF Conversion Options

Method       Description          Best For                        Dependencies
MarkItDown   Fast, lightweight    Text-based PDFs                 markitdown
Marker       High-precision OCR   Scanned PDFs, complex layouts   marker-pdf, torch
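
The table can be read as a simple decision rule. The sketch below encodes it for scripts that need to pick a CLI subcommand. `choose_pdf_command` is a hypothetical helper, and the two boolean inputs are assumed to be known to the caller, since detecting a scanned PDF or a GPU is outside its scope.

```python
def choose_pdf_command(scanned_or_complex: bool, gpu_available: bool) -> str:
    """Pick a doc2md-helper subcommand following the comparison table."""
    # Marker is described as high-precision OCR that requires a GPU,
    # so fall back to MarkItDown whenever no GPU is available.
    if scanned_or_complex and gpu_available:
        return "convert-pdf-marker"
    return "convert-pdf"
```

Falling back to `convert-pdf` on machines without a GPU keeps the script working, although a scanned PDF may then convert poorly.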

Project Structure

mcp-document-converter/
├── mcp_document_converter/
│   ├── __init__.py
│   ├── cli.py                 # Command-line interface
│   ├── server.py              # MCP server implementation
│   ├── skills.py              # Platform configuration and skills
│   ├── pdf2markdown.py        # Marker PDF conversion
│   ├── pdf2markdown_markitdown.py  # MarkItDown PDF conversion
│   ├── docx2markdown.py       # Word document conversion
│   └── excel2markdown.py      # Excel spreadsheet conversion
├── .claude/
│   └── doc-converter.md       # Claude Code instructions
├── demo/                      # Example documents
├── pyproject.toml             # Project configuration
└── README.md                  # This file

Development

Setup Development Environment

git clone https://github.com/your-username/mcp-document-converter
cd mcp-document-converter
pip install -e ".[dev]"

Running Tests

# Add tests here

Troubleshooting

Installation Issues

If marker-pdf fails to install or run, fall back to the basic package:

pip install doc2md-helper  # Just install the basic version

The basic version works great for text-based PDFs.
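
To see whether the Marker extras are usable in the current environment, you can check whether their modules are importable. This is a sketch under one assumption: that the import names are `marker` (for marker-pdf) and `torch`. The `missing_modules` helper is hypothetical and not part of the package.

```python
from importlib.util import find_spec

# Assumption: marker-pdf is imported as "marker" and PyTorch as "torch".
MARKER_MODULES = ["marker", "torch"]


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]
```

An empty result from `missing_modules(MARKER_MODULES)` suggests the Marker path is installed; otherwise the returned names are what is missing.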

Platform Configuration Not Working

Try specifying the platform explicitly:

doc2md-helper install --platform claude

Or configure manually as shown in the "Manual Configuration" section.


Contributing

Contributions welcome! Please feel free to:

  • Report bugs
  • Suggest features
  • Submit pull requests

License

MIT License - see LICENSE file for details.


Related Projects

  • code-review-graph - Code understanding with knowledge graphs (the inspiration for this project's architecture)
