MCP server for converting files to Markdown using MarkItDown
Flexberry MarkItDown MCP Server
MCP server for converting files to Markdown using the MarkItDown library by Microsoft.
Features
- 🔄 File conversion of various formats to Markdown
- 📁 Large files - result is saved to disk, not loaded into LLM context
- 🌍 Cyrillic support in documents and filenames
- 💻 Cross-platform - Windows and Linux
- 🔧 Integration with RooCode via Model Context Protocol
Supported Formats
| Category | Formats |
|---|---|
| Documents | PDF, DOCX, DOC, PPTX, PPT, XLSX, XLS |
| Web | HTML, HTM, XML, URL |
| Data | CSV, JSON |
| Text | MD, RST, TXT |
| Images (OCR) | PNG, JPG, JPEG, GIF, BMP, TIFF, WEBP |
| Audio (transcription) | MP3, WAV, M4A, OGG, FLAC |
| Archives | ZIP |
| E-books | EPUB |
⚠️ OCR for images requires Tesseract to be installed. Audio transcription requires additional system support (see Troubleshooting).
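The table above can be turned into a quick pre-check before asking the server to convert a file. The sketch below is illustrative only: `FORMATS` and `category_for` are hypothetical names that are not part of the package, and the authoritative list should come from the `get_supported_formats` tool.

```python
from pathlib import Path
from typing import Optional

# Extension lists copied from the table above.
# URL is not a file extension, so it is left out here.
FORMATS = {
    "Documents": {"pdf", "docx", "doc", "pptx", "ppt", "xlsx", "xls"},
    "Web": {"html", "htm", "xml"},
    "Data": {"csv", "json"},
    "Text": {"md", "rst", "txt"},
    "Images (OCR)": {"png", "jpg", "jpeg", "gif", "bmp", "tiff", "webp"},
    "Audio (transcription)": {"mp3", "wav", "m4a", "ogg", "flac"},
    "Archives": {"zip"},
    "E-books": {"epub"},
}


def category_for(file_path: str) -> Optional[str]:
    """Return the table category for a file, or None if the extension is not listed."""
    ext = Path(file_path).suffix.lower().lstrip(".")
    for category, extensions in FORMATS.items():
        if ext in extensions:
            return category
    return None
```

The lookup is case-insensitive, so `report.PDF` and `report.pdf` fall into the same category.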
Installation
Option 1: Install from PyPI (recommended)
# Install via pip
pip install flexberry-markitdown-mcp
# Install with development dependencies
pip install "flexberry-markitdown-mcp[dev]"
Option 2: Install from source
# Clone the repository
git clone https://github.com/Flexberry/flexberry-markitdown-mcp.git
cd flexberry-markitdown-mcp
# Create virtual environment (optional but recommended)
python -m venv .venv
# Activate virtual environment
# Linux/macOS:
source .venv/bin/activate
# Windows:
.venv\Scripts\activate
# Install dependencies
pip install -e .
Option 3: Use installation scripts
Linux/macOS:
chmod +x install.sh
./install.sh
Windows:
install.bat
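Whichever option was used, it can help to confirm that the interpreter RooCode will launch can actually see the package. The following is a minimal sketch: `flexberry_markitdown_mcp` is the module name used in the RooCode configuration, while treating `markitdown` and `mcp` as its runtime dependencies is an assumption based on the Troubleshooting section. `is_installed` is a hypothetical helper.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Module names are assumptions; adjust them if the package layout differs.
    for name in ("flexberry_markitdown_mcp", "markitdown", "mcp"):
        print(f"{name}: {'ok' if is_installed(name) else 'MISSING'}")
```

Run it with the same Python executable that appears in the `command` field of the MCP configuration, since a different interpreter may have a different set of installed packages.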
RooCode Configuration
Windows Configuration
Add to RooCode settings (mcp_settings.json or via interface):
{
"mcpServers": {
"flexberry-markitdown": {
"command": "python",
"args": ["-m", "flexberry_markitdown_mcp.server"]
}
}
}
Or with virtual environment:
{
"mcpServers": {
"flexberry-markitdown": {
"command": "C:\\path\\to\\flexberry-markitdown-mcp\\.venv\\Scripts\\python.exe",
"args": ["-m", "flexberry_markitdown_mcp.server"],
"cwd": "C:\\path\\to\\flexberry-markitdown-mcp"
}
}
}
Linux Configuration
{
"mcpServers": {
"flexberry-markitdown": {
"command": "python3",
"args": ["-m", "flexberry_markitdown_mcp.server"]
}
}
}
Or with virtual environment:
{
"mcpServers": {
"flexberry-markitdown": {
"command": "/home/user/flexberry-markitdown-mcp/.venv/bin/python",
"args": ["-m", "flexberry_markitdown_mcp.server"],
"cwd": "/home/user/flexberry-markitdown-mcp"
}
}
}
Universal Configuration (via uv)
If using uv:
{
"mcpServers": {
"flexberry-markitdown": {
"command": "uv",
"args": [
"--directory",
"/path/to/flexberry-markitdown-mcp",
"run",
"flexberry-markitdown-mcp"
]
}
}
}
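Typing absolute interpreter paths by hand is error-prone, especially with escaped backslashes on Windows. As a convenience, the entry can be generated from the interpreter that has the package installed. This is a sketch, not a tool shipped with the project; `build_server_entry` is a hypothetical helper that reproduces the JSON structure shown above.

```python
import json
import sys


def build_server_entry(python_executable: str) -> dict:
    """Build the mcpServers entry for the given interpreter path."""
    return {
        "mcpServers": {
            "flexberry-markitdown": {
                "command": python_executable,
                "args": ["-m", "flexberry_markitdown_mcp.server"],
            }
        }
    }


if __name__ == "__main__":
    # sys.executable is the interpreter running this script, so run it
    # from the environment where the package is installed.
    print(json.dumps(build_server_entry(sys.executable), indent=2))
```

`json.dumps` takes care of escaping backslashes, so the printed output can be pasted into mcp_settings.json as-is.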
Available Tools
convert_to_markdown
Converts a file to Markdown and saves the result next to the original file.
Parameters:
- file_path (required) - path to the file to convert
- output_path (optional) - custom path for saving the result
- overwrite (optional, default false) - overwrite an existing file
Example usage in RooCode:
Convert file /home/user/documents/report.pdf to Markdown
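To make the interaction between the three parameters concrete, here is a rough sketch of how the output location could be chosen. It is not the server's code: `resolve_output_path` is a hypothetical function, and raising `FileExistsError` when the target exists and `overwrite` is false is an assumption about the behavior.

```python
from pathlib import Path
from typing import Optional


def resolve_output_path(file_path: str, output_path: Optional[str] = None,
                        overwrite: bool = False) -> Path:
    """Pick the destination .md path for a conversion."""
    source = Path(file_path)
    # Default: same directory and base name as the source, with a .md extension.
    target = Path(output_path) if output_path else source.with_suffix(".md")
    # Assumption: an existing target is an error unless overwrite is set.
    if target.exists() and not overwrite:
        raise FileExistsError(f"{target} already exists; pass overwrite=True")
    return target
```

With only `file_path` given, `/home/user/documents/report.pdf` maps to `/home/user/documents/report.md`.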
get_supported_formats
Returns a list of supported file formats.
check_file_exists
Checks if a file exists and returns information about it.
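The exact fields returned by check_file_exists are not listed here. As an illustration of the kind of information such a check can provide, the sketch below uses only the standard library; `describe_file` and its dictionary keys are hypothetical and may not match the server's response.

```python
from pathlib import Path


def describe_file(file_path: str) -> dict:
    """Report whether a path is an existing regular file, plus basic metadata."""
    path = Path(file_path)
    if not path.is_file():
        # Missing paths and directories are both reported as "not a file".
        return {"exists": False, "path": str(path)}
    return {
        "exists": True,
        "path": str(path.resolve()),
        "size_bytes": path.stat().st_size,
        "extension": path.suffix.lower(),
    }
```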
Usage Examples
Converting PDF with Cyrillic
Convert file C:\Documents\Report 2024.pdf to Markdown
Result will be saved to C:\Documents\Report 2024.md
Converting with overwrite
Convert file /home/user/report.docx with overwrite existing
Converting to specified location
Convert presentation.pptx and save result to /tmp/output.md
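Behind prompts like these, the MCP client sends a `tools/call` request to the server as a JSON-RPC 2.0 message. The sketch below shows what such a request might look like for the last example. The argument names come from the convert_to_markdown parameters above; `tools_call_request` is a hypothetical helper and the request id is arbitrary.

```python
import json


def tools_call_request(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # ensure_ascii=False keeps Cyrillic paths readable in the payload.
    return json.dumps(message, ensure_ascii=False)


if __name__ == "__main__":
    print(tools_call_request(1, "convert_to_markdown", {
        "file_path": "presentation.pptx",
        "output_path": "/tmp/output.md",
    }))
```

In normal use RooCode builds this message itself; the sketch is only meant to show how the natural-language request maps onto tool parameters.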
Large File Handling
The server is designed to work with files of any size:
- File is converted via MarkItDown
- Result is saved to disk next to the original file
- Only information about path and size is returned to LLM context
Because the Markdown itself never enters the conversation, this makes it possible to work with files up to 100x larger than the LLM context limit.
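The pattern described above can be sketched in a few lines. This is an assumption-laden illustration rather than the server's implementation: `convert_and_save` and the keys of its return value are hypothetical, and the converter is passed in as a function so the example can be tried without MarkItDown installed. The default converter uses MarkItDown's `convert(...)` call and the `text_content` attribute of its result.

```python
from pathlib import Path
from typing import Callable


def markitdown_convert(path: str) -> str:
    """Convert a file with MarkItDown and return the Markdown text."""
    from markitdown import MarkItDown  # lazy import; requires the markitdown package
    return MarkItDown().convert(path).text_content


def convert_and_save(source: str,
                     convert: Callable[[str], str] = markitdown_convert) -> dict:
    """Write the converted Markdown next to the source and return a small summary."""
    target = Path(source).with_suffix(".md")
    markdown = convert(source)
    target.write_text(markdown, encoding="utf-8")
    # Only metadata goes back to the caller; the Markdown itself stays on disk.
    return {"output_path": str(target), "size_bytes": target.stat().st_size}
```

The returned dictionary stays a few dozen bytes long regardless of how large the converted document is, which is what keeps the LLM context small.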
Logging
Server logs are saved to:
- Linux: ~/.flexberry-markitdown-mcp/server.log
- Windows: C:\Users\<user>\.flexberry-markitdown-mcp\server.log
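Both locations are the same path relative to the user's home directory, so one helper can find the log on either platform. The sketch below is a convenience for troubleshooting, not part of the package: `log_file_path` and `tail` are hypothetical names, and reading the log as UTF-8 is an assumption.

```python
from collections import deque
from pathlib import Path
from typing import List, Optional


def log_file_path(home: Optional[Path] = None) -> Path:
    """Return <home>/.flexberry-markitdown-mcp/server.log."""
    return (home or Path.home()) / ".flexberry-markitdown-mcp" / "server.log"


def tail(path: Path, lines: int = 20) -> List[str]:
    """Return the last `lines` lines of a text file (assumed to be UTF-8)."""
    with path.open(encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in deque(handle, maxlen=lines)]


if __name__ == "__main__":
    log = log_file_path()
    if log.is_file():
        print("\n".join(tail(log)))
    else:
        print(f"No log file at {log}")
```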
Troubleshooting
Error: "MarkItDown not installed"
pip install flexberry-markitdown-mcp
Error: "MCP module not found"
pip install flexberry-markitdown-mcp
Cyrillic issues in Windows
Ensure that the terminal uses UTF-8 encoding. The server automatically sets UTF-8 for stdin/stdout/stderr.
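For reference, switching the standard streams to UTF-8 from inside a Python process is typically done with `TextIOWrapper.reconfigure`. The snippet below is a sketch of that general technique and makes no claim about how the server does it; `force_utf8` is a hypothetical helper.

```python
import io
import sys


def force_utf8(stream):
    """Switch a text stream to UTF-8 if it supports reconfigure()."""
    if isinstance(stream, io.TextIOWrapper):
        # errors="replace" avoids crashes on characters that cannot be encoded/decoded.
        stream.reconfigure(encoding="utf-8", errors="replace")
    return stream


if __name__ == "__main__":
    # Do this before any data is read from stdin or written to stdout/stderr.
    for std in (sys.stdin, sys.stdout, sys.stderr):
        force_utf8(std)
```

This matters on Windows, where the default console encoding is often a legacy code page rather than UTF-8.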
OCR not working for images
Install Tesseract:
- Windows: download from https://github.com/UB-Mannheim/tesseract/wiki
- Linux: sudo apt install tesseract-ocr (Ubuntu/Debian)
For Russian language support, install the language pack:
- Windows: select Russian language during installation
- Linux: sudo apt install tesseract-ocr-rus
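After installing, it is worth checking that Tesseract is on PATH and that the Russian pack was picked up. The sketch below relies on the `tesseract --list-langs` command; `parse_languages` and `installed_languages` are hypothetical helpers, and the parsing assumes the usual output shape of one header line followed by one language code per line.

```python
import shutil
import subprocess
from typing import List


def parse_languages(list_langs_output: str) -> List[str]:
    """Extract language codes from `tesseract --list-langs` output."""
    lines = [line.strip() for line in list_langs_output.splitlines()]
    # The first line is a header ("List of available languages ..."); skip it.
    return [line for line in lines if line and not line.startswith("List of")]


def installed_languages() -> List[str]:
    """Return installed Tesseract languages, or [] if tesseract is not on PATH."""
    executable = shutil.which("tesseract")
    if executable is None:
        return []
    result = subprocess.run([executable, "--list-langs"],
                            capture_output=True, text=True, check=False)
    # Some Tesseract versions print the list to stderr, so check both streams.
    return parse_languages(result.stdout + "\n" + result.stderr)


if __name__ == "__main__":
    langs = installed_languages()
    print("tesseract languages:", langs or "tesseract not found")
    print("Russian available:", "rus" in langs)
```

An empty list means the `tesseract` executable was not found; a list without `rus` means the language pack still needs to be installed.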
Audio transcription not working
MarkItDown uses Azure Speech Services for transcription. Ensure environment variables are configured.
Development
Running tests
pip install -e ".[dev]"
pytest
Project structure
flexberry-markitdown-mcp/
├── src/
│ └── flexberry_markitdown_mcp/
│ ├── __init__.py
│ └── server.py
├── pyproject.toml
├── README_EN.md
├── install.sh
├── install.bat
├── uninstall.sh
├── uninstall.bat
└── roocode-config-examples.json
License
MIT License
Developed by Flexberry team.