Document converter that turns PDF, DOCX, and Excel files into Markdown, with MCP support for Claude Code
doc2md-helper
Document Conversion MCP Server
Convert PDF, Word, and Excel documents to Markdown through an MCP server or the command line. Works with Claude Code, Cursor, Codex, and other MCP clients.
Quick Start
Installation
pip install doc2md-helper
doc2md-helper install
Then restart your AI coding tool.
Usage with Claude Code
Just ask naturally:
Read this report.pdf file
Claude Code will ask which PDF conversion method you prefer:
- MarkItDown - Fast, lightweight, perfect for text-based PDFs
- Marker - High-precision OCR, great for scanned or complex layouts (requires GPU)
CLI Usage
# Convert PDF (lightweight)
doc2md-helper convert-pdf document.pdf
# Convert PDF (high-precision OCR)
doc2md-helper convert-pdf-marker scanned.pdf
# Convert Word
doc2md-helper convert-docx report.docx
# Convert Excel
doc2md-helper convert-excel data.xlsx
# Save to specific path
doc2md-helper convert-pdf document.pdf -o output.md
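When converting many files from a script, the subcommand can be picked from the file extension. The following is a minimal sketch, not part of doc2md-helper: the helper name build_command and the extension mapping are hypothetical, and it assumes the -o option shown above for convert-pdf is also accepted by the other convert commands.

```python
from pathlib import Path

# Assumed mapping from file extension to the CLI subcommands listed above.
# Only the extensions that appear in the CLI examples are included.
SUBCOMMANDS = {
    ".pdf": "convert-pdf",
    ".docx": "convert-docx",
    ".xlsx": "convert-excel",
}


def build_command(source, output=None):
    """Return the doc2md-helper argument list for one input file.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    suffix = Path(source).suffix.lower()
    if suffix not in SUBCOMMANDS:
        raise ValueError(f"unsupported file type: {source}")
    command = ["doc2md-helper", SUBCOMMANDS[suffix], str(source)]
    if output is not None:
        # Assumption: -o works for every convert command, not only convert-pdf.
        command += ["-o", str(output)]
    return command
```

The returned list can be passed to subprocess.run(build_command("report.pdf"), check=True). Building an argument list instead of a shell string avoids quoting problems with file names that contain spaces.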
Features
- Multiple PDF Conversion Options: Choose between lightweight MarkItDown or high-precision Marker
- Full MCP Integration: Works with Claude Code, Cursor, Codex, Windsurf, Zed, and more
- CLI Interface: Convert documents directly from the command line
- Multi-Format Support: PDF, DOCX, DOC, XLSX, XLS
- Automatic Platform Detection: The install command configures all supported platforms automatically
Installation
Basic Installation
pip install doc2md-helper
This installs core dependencies: mcp, markitdown[pdf], openpyxl, python-docx, mammoth.
With Marker Support (High-Precision OCR)
pip install "doc2md-helper[marker]"
Requires additional dependencies: marker-pdf, torch, bitsandbytes, PyPDF2.
Full Installation
pip install "doc2md-helper[all]"
Includes marker support plus additional optional dependencies.
Install from Source
git clone https://github.com/your-username/mcp-document-converter
cd mcp-document-converter
pip install -e .
Platform Setup
Claude Code
doc2md-helper install --platform claude
Configures ~/.claude/settings.json. Restart Claude Code after installation.
Cursor
doc2md-helper install --platform cursor
Configures .cursor/mcp.json in your project directory.
Other Platforms
doc2md-helper install --platform <platform-name>
Supported platforms: claude, cursor, codex, windsurf, zed, continue, opencode, gemini-cli, qwen, kiro, qoder, copilot, copilot-cli, or all.
Manual Configuration
If auto-configuration fails, add the following entry to your platform's MCP config manually:
{
  "mcpServers": {
    "doc2md-helper": {
      "command": "uvx",
      "args": ["doc2md-helper", "serve"]
    }
  }
}
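If the config file already lists other MCP servers, the entry has to be merged in rather than pasted over the file. The following is a minimal sketch of such a merge using only the standard library. The function name add_server is hypothetical, and the sketch assumes the target file is plain JSON with a top-level "mcpServers" object, as in the snippet above.

```python
import json
from pathlib import Path

# The server entry from the manual configuration snippet above.
SERVER_ENTRY = {"command": "uvx", "args": ["doc2md-helper", "serve"]}


def add_server(config_path):
    """Merge the doc2md-helper entry into an MCP config file.

    Hypothetical helper: existing servers and unrelated keys are preserved.
    """
    path = Path(config_path)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    # Create the "mcpServers" object if it is missing, then add or update our entry.
    servers = config.setdefault("mcpServers", {})
    servers["doc2md-helper"] = dict(SERVER_ENTRY)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling add_server(".cursor/mcp.json") would update the Cursor project config mentioned earlier. Back up the file first if it contains comments, since a plain JSON parser rejects them.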
Usage
MCP Tools
| Tool | Description |
|---|---|
| convert_pdf_with_markitdown | Convert PDF using MarkItDown (fast, lightweight) |
| convert_pdf_with_marker | Convert PDF using Marker (high-precision OCR) |
| convert_docx_to_markdown | Convert Word documents |
| convert_excel_to_markdown | Convert Excel spreadsheets |
CLI Commands
# Document conversion
doc2md-helper convert-pdf <file>
doc2md-helper convert-pdf-marker <file>
doc2md-helper convert-docx <file>
doc2md-helper convert-excel <file>
# Installation and setup
doc2md-helper install
doc2md-helper install --platform <name>
# MCP server
doc2md-helper serve
doc2md-helper serve --http --host 127.0.0.1 --port 5555
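After starting the server in HTTP mode, it can be useful to confirm that something is listening on the chosen host and port before pointing a client at it. The following is a minimal sketch using a plain TCP connection attempt. The function name is_listening is hypothetical, and the check only shows that the port accepts connections, not that the process behind it is doc2md-helper.

```python
import socket


def is_listening(host="127.0.0.1", port=5555, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds.

    Hypothetical helper; the defaults mirror the serve --http example above.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, a startup script could poll is_listening() a few times with a short sleep between attempts and report an error if the port never opens.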
PDF Conversion Options
| Method | Description | Best For | Dependencies |
|---|---|---|---|
| MarkItDown | Fast, lightweight | Text-based PDFs | markitdown |
| Marker | High-precision OCR | Scanned PDFs, complex layouts | marker-pdf, torch |
Project Structure
mcp-document-converter/
├── mcp_document_converter/
│ ├── __init__.py
│ ├── cli.py # Command-line interface
│ ├── server.py # MCP server implementation
│ ├── skills.py # Platform configuration and skills
│ ├── pdf2markdown.py # Marker PDF conversion
│ ├── pdf2markdown_markitdown.py # MarkItDown PDF conversion
│ ├── docx2markdown.py # Word document conversion
│ └── excel2markdown.py # Excel spreadsheet conversion
├── .claude/
│ └── doc-converter.md # Claude Code instructions
├── demo/ # Example documents
├── pyproject.toml # Project configuration
└── README.md # This file
Development
Setup Development Environment
git clone https://github.com/your-username/mcp-document-converter
cd mcp-document-converter
pip install -e ".[dev]"
Running Tests
# Add tests here
Troubleshooting
Installation Issues
If you encounter problems with marker-pdf:
pip install doc2md-helper # Just install the basic version
The basic version works great for text-based PDFs.
Platform Configuration Not Working
Try specifying the platform explicitly:
doc2md-helper install --platform claude
Or configure manually as shown in the "Manual Configuration" section.
Contributing
Contributions welcome! Please feel free to:
- Report bugs
- Suggest features
- Submit pull requests
License
MIT License - see LICENSE file for details.
Related Projects
- code-review-graph - Code understanding with knowledge graphs (the inspiration for this project's architecture)
File details
Details for the file doc2md_helper-0.1.8.tar.gz.
File metadata
- Download URL: doc2md_helper-0.1.8.tar.gz
- Upload date:
- Size: 21.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4a65c31b8c80047df37389ae4d2d0a9fc8e22675b78a2d8b1406ffd560b93e6e |
| MD5 | d3693e6e5d27900704883ae66fe400e2 |
| BLAKE2b-256 | 0ff9e46b761b704ab247abd0154ea85f21d22c841cdf29bb6487f9ae21217189 |
File details
Details for the file doc2md_helper-0.1.8-py3-none-any.whl.
File metadata
- Download URL: doc2md_helper-0.1.8-py3-none-any.whl
- Upload date:
- Size: 24.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 15e4df7f2e2e53c7018a08045a47918ebe6061e67af2ecb6905772ade6f41da7 |
| MD5 | 79e14b5d436ccbcfaad7123faa21042f |
| BLAKE2b-256 | 3b1b3db5c45faffa8cdc21e12f4955cf46cc915ee04b2ad36c133015ea516693 |
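The SHA256 digests listed for each file can be checked against a downloaded copy. The following is a minimal sketch with the standard library's hashlib. The function name sha256_matches is hypothetical, and the local file path is whatever name the download was saved under.

```python
import hashlib


def sha256_matches(path, expected_hex, chunk_size=65536):
    """Return True if the file's SHA256 digest equals expected_hex.

    Hypothetical helper; the file is read in chunks to keep memory use flat.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()
```

For instance, sha256_matches("doc2md_helper-0.1.8.tar.gz", "<SHA256 from the table>") should return True for an intact download of the source distribution.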