
Convert PDF, DOCX, and Excel documents to Markdown, with MCP support for Claude Code


doc2md-helper

Document Conversion MCP Server

License: MIT. Requires Python 3.10+.

Convert PDF, Word, and Excel documents to Markdown through an MCP server or the command line. Works with Claude Code, Cursor, Codex, and other MCP-capable tools.


Quick Start

Installation

pip install doc2md-helper
doc2md-helper install

Then restart your AI coding tool.

Usage with Claude Code

Just ask naturally:

Read this report.pdf file

Claude Code will ask which PDF conversion method you prefer:

  1. MarkItDown - Fast, lightweight, perfect for text-based PDFs
  2. Marker - High-precision OCR, great for scanned or complex layouts (requires GPU)

CLI Usage

# Convert PDF (lightweight)
doc2md-helper convert-pdf document.pdf

# Convert PDF (high-precision OCR)
doc2md-helper convert-pdf-marker scanned.pdf

# Convert Word
doc2md-helper convert-docx report.docx

# Convert Excel
doc2md-helper convert-excel data.xlsx

# Save to specific path
doc2md-helper convert-pdf document.pdf -o output.md
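To convert a whole folder, the commands above can be wrapped in a short script. The following is a minimal sketch, not part of the package: `build_command` and `convert_folder` are hypothetical helper names, routing `.doc` and `.xls` files to `convert-docx` and `convert-excel` is an assumption, and so is the idea that `-o` works for subcommands other than `convert-pdf`.

```python
import subprocess
from pathlib import Path

# Subcommand per file suffix. The .pdf/.docx/.xlsx rows mirror the CLI examples;
# routing .doc and .xls to the same subcommands is an assumption.
SUBCOMMANDS = {
    ".pdf": "convert-pdf",
    ".docx": "convert-docx",
    ".doc": "convert-docx",
    ".xlsx": "convert-excel",
    ".xls": "convert-excel",
}


def build_command(source: Path, out_dir: Path) -> list[str]:
    """Return the doc2md-helper argv for one file, writing <stem>.md into out_dir."""
    subcommand = SUBCOMMANDS.get(source.suffix.lower())
    if subcommand is None:
        raise ValueError(f"unsupported file type: {source.name}")
    target = out_dir / f"{source.stem}.md"
    # -o is documented for convert-pdf; assuming the other subcommands accept it too.
    return ["doc2md-helper", subcommand, str(source), "-o", str(target)]


def convert_folder(in_dir: Path, out_dir: Path) -> None:
    """Convert every supported file directly inside in_dir (no recursion)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for source in sorted(in_dir.iterdir()):
        if source.suffix.lower() in SUBCOMMANDS:
            # check=True stops the batch on the first failed conversion.
            subprocess.run(build_command(source, out_dir), check=True)
```

Calling `convert_folder(Path("docs"), Path("markdown"))` would then run one CLI invocation per supported file. Building the argument list separately from running it keeps the mapping easy to inspect before anything is executed.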

Features

  • Multiple PDF Conversion Options: Choose between lightweight MarkItDown or high-precision Marker
  • Full MCP Integration: Works with Claude Code, Cursor, Codex, Windsurf, Zed, and more
  • CLI Interface: Convert documents directly from the command line
  • Multi-Format Support: PDF, DOCX, DOC, XLSX, XLS
  • Automatic Platform Detection: The install command detects and configures all supported platforms

Installation

Basic Installation

pip install doc2md-helper

This installs core dependencies: mcp, markitdown[pdf], openpyxl, python-docx, mammoth.

With Marker Support (High-Precision OCR)

pip install doc2md-helper[marker]

Requires additional dependencies: marker-pdf, torch, bitsandbytes, PyPDF2.
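Before choosing the Marker method, it can help to check whether these packages are importable in the current environment. The sketch below uses only the standard library. The import names are assumptions (a package's import name can differ from its pip name; `marker-pdf` is assumed to import as `marker`), and `missing_modules` is a hypothetical helper, not something the package provides.

```python
from importlib.util import find_spec

# Import names are assumptions: a distribution's import name can differ
# from its pip name (marker-pdf is assumed to import as "marker").
MARKER_MODULES = ["marker", "torch", "bitsandbytes", "PyPDF2"]


def missing_modules(names: list[str]) -> list[str]:
    """Return the entries of `names` that cannot be located for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(MARKER_MODULES)
    if missing:
        print("Marker extra incomplete, missing:", ", ".join(missing))
    else:
        print("Marker dependencies found")
```

`find_spec` only locates a module without importing it, so the check stays fast even when `torch` is installed.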

Full Installation

pip install doc2md-helper[all]

Includes marker support plus additional optional dependencies.

Install from Source

git clone https://github.com/your-username/mcp-document-converter
cd mcp-document-converter
pip install -e .

Platform Setup

Claude Code

doc2md-helper install --platform claude

Configures ~/.claude/settings.json. Restart Claude Code after installation.

Cursor

doc2md-helper install --platform cursor

Configures .cursor/mcp.json in your project directory.

Other Platforms

doc2md-helper install --platform <platform-name>

Supported platform names: claude, cursor, codex, windsurf, zed, continue, opencode, gemini-cli, qwen, kiro, qoder, copilot, copilot-cli. Use all to configure every supported platform.

Manual Configuration

If auto-configuration fails, add the following entry manually to your platform's MCP config:

{
  "mcpServers": {
    "doc2md-helper": {
      "command": "uvx",
      "args": ["doc2md-helper", "serve"]
    }
  }
}
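If the config file already lists other MCP servers, the entry has to be merged in rather than pasted over them. A minimal sketch of that merge follows. It assumes the platform reads a top-level `mcpServers` object as in the JSON above (some tools may use a different key or file format), and `add_server` is a hypothetical helper name.

```python
import json
from pathlib import Path

# The server entry from the manual configuration snippet above.
SERVER_ENTRY = {"command": "uvx", "args": ["doc2md-helper", "serve"]}


def add_server(config_path: Path) -> dict:
    """Add the doc2md-helper entry to an MCP config file, keeping other servers."""
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():  # tolerate an empty file
            config = json.loads(text)
    # Create "mcpServers" if absent, then add or overwrite only our own key.
    servers = config.setdefault("mcpServers", {})
    servers["doc2md-helper"] = dict(SERVER_ENTRY)
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example, `add_server(Path(".cursor/mcp.json"))` would update the Cursor project config mentioned earlier. The function rewrites the file with two-space indentation, so hand-made formatting is not preserved; back up the file first if that matters.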

Usage

MCP Tools

Tool                          Description
convert_pdf_with_markitdown   Convert PDF using MarkItDown (fast, lightweight)
convert_pdf_with_marker       Convert PDF using Marker (high-precision OCR)
convert_docx_to_markdown      Convert Word documents
convert_excel_to_markdown     Convert Excel spreadsheets

CLI Commands

# Document conversion
doc2md-helper convert-pdf <file>
doc2md-helper convert-pdf-marker <file>
doc2md-helper convert-docx <file>
doc2md-helper convert-excel <file>

# Installation and setup
doc2md-helper install
doc2md-helper install --platform <name>

# MCP server
doc2md-helper serve
doc2md-helper serve --http --host 127.0.0.1 --port 5555
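When the server runs in HTTP mode, a quick way to confirm it started is to test whether the port accepts TCP connections. The sketch below uses only the standard library and reuses the host and port from the command above. `server_is_listening` is a hypothetical helper, and it only shows that something is listening on the port, not that the MCP endpoint itself responds correctly.

```python
import socket


def server_is_listening(host: str = "127.0.0.1", port: int = 5555,
                        timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError on refusal, timeout, or unreachable host.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    state = "reachable" if server_is_listening() else "not reachable"
    print(f"doc2md-helper HTTP server on 127.0.0.1:5555 is {state}")
```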

PDF Conversion Options

Method       Description          Best For                        Dependencies
MarkItDown   Fast, lightweight    Text-based PDFs                 markitdown
Marker       High-precision OCR   Scanned PDFs, complex layouts   marker-pdf, torch

Project Structure

mcp-document-converter/
├── mcp_document_converter/
│   ├── __init__.py
│   ├── cli.py                 # Command-line interface
│   ├── server.py              # MCP server implementation
│   ├── skills.py              # Platform configuration and skills
│   ├── pdf2markdown.py        # Marker PDF conversion
│   ├── pdf2markdown_markitdown.py  # MarkItDown PDF conversion
│   ├── docx2markdown.py       # Word document conversion
│   └── excel2markdown.py      # Excel spreadsheet conversion
├── .claude/
│   └── doc-converter.md       # Claude Code instructions
├── demo/                      # Example documents
├── pyproject.toml             # Project configuration
└── README.md                  # This file

Development

Setup Development Environment

git clone https://github.com/your-username/mcp-document-converter
cd mcp-document-converter
pip install -e ".[dev]"

Running Tests

# Add tests here

Troubleshooting

Installation Issues

If marker-pdf fails to install or run, fall back to the basic installation:

pip install doc2md-helper  # Just install the basic version

The basic version works great for text-based PDFs.

Platform Configuration Not Working

Try specifying the platform explicitly:

doc2md-helper install --platform claude

Or configure manually as shown in the "Manual Configuration" section.


Contributing

Contributions welcome! Please feel free to:

  • Report bugs
  • Suggest features
  • Submit pull requests

License

MIT License - see LICENSE file for details.


Related Projects

  • code-review-graph - Code understanding with knowledge graphs (the inspiration for this project's architecture)
