Flexberry MarkItDown MCP Server

MCP server for converting files to Markdown using the MarkItDown library by Microsoft.

Features

  • 🔄 File conversion of various formats to Markdown
  • 📁 Large files - result is saved to disk, not loaded into LLM context
  • 🌍 Cyrillic support in documents and filenames
  • 💻 Cross-platform - Windows and Linux
  • 🔧 Integration with RooCode via Model Context Protocol

Supported Formats

Category                Formats
Documents               PDF, DOCX, DOC, PPTX, PPT, XLSX, XLS
Web                     HTML, HTM, XML, URL
Data                    CSV, JSON
Text                    MD, RST, TXT
Images (OCR)            PNG, JPG, JPEG, GIF, BMP, TIFF, WEBP
Audio (transcription)   MP3, WAV, M4A, OGG, FLAC
Archives                ZIP
E-books                 EPUB

⚠️ OCR for images requires Tesseract. Audio transcription requires additional system setup (see Troubleshooting).
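
As a rough illustration of how the table above maps file extensions to categories, here is a minimal Python sketch. The `FORMAT_CATEGORIES` dictionary and the `category_for` helper are hypothetical names made up for this example; they are not part of the server's API, and the real `get_supported_formats` tool may return its data in a different shape.

```python
from pathlib import PurePath
from typing import Optional

# Extension lists copied from the Supported Formats table. URL is omitted
# because it is not a file extension.
FORMAT_CATEGORIES = {
    "Documents": ["pdf", "docx", "doc", "pptx", "ppt", "xlsx", "xls"],
    "Web": ["html", "htm", "xml"],
    "Data": ["csv", "json"],
    "Text": ["md", "rst", "txt"],
    "Images (OCR)": ["png", "jpg", "jpeg", "gif", "bmp", "tiff", "webp"],
    "Audio (transcription)": ["mp3", "wav", "m4a", "ogg", "flac"],
    "Archives": ["zip"],
    "E-books": ["epub"],
}


def category_for(file_name: str) -> Optional[str]:
    """Return the table category for a file name, or None if unsupported."""
    ext = PurePath(file_name).suffix.lower().lstrip(".")
    for category, extensions in FORMAT_CATEGORIES.items():
        if ext in extensions:
            return category
    return None
```

The lookup is case-insensitive, so `report.PDF` and `report.pdf` land in the same category.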

Installation

Option 1: Install from PyPI (recommended)

# Install via pip
pip install flexberry-markitdown-mcp

# Install with development dependencies
pip install "flexberry-markitdown-mcp[dev]"

Option 2: Install from source

# Clone the repository
git clone https://github.com/Flexberry/flexberry-markitdown-mcp.git
cd flexberry-markitdown-mcp

# Create virtual environment (optional but recommended)
python -m venv .venv

# Activate virtual environment
# Linux/macOS:
source .venv/bin/activate
# Windows:
.venv\Scripts\activate

# Install dependencies
pip install -e .

Option 3: Use installation scripts

Linux/macOS:

chmod +x install.sh
./install.sh

Windows:

install.bat

RooCode Configuration

Windows Configuration

Add the following to the RooCode settings (mcp_settings.json, or via the settings interface):

{
  "mcpServers": {
    "flexberry-markitdown": {
      "command": "python",
      "args": ["-m", "flexberry_markitdown_mcp.server"]
    }
  }
}

Or with virtual environment:

{
  "mcpServers": {
    "flexberry-markitdown": {
      "command": "C:\\path\\to\\flexberry-markitdown-mcp\\.venv\\Scripts\\python.exe",
      "args": ["-m", "flexberry_markitdown_mcp.server"],
      "cwd": "C:\\path\\to\\flexberry-markitdown-mcp"
    }
  }
}

Linux Configuration

{
  "mcpServers": {
    "flexberry-markitdown": {
      "command": "python3",
      "args": ["-m", "flexberry_markitdown_mcp.server"]
    }
  }
}

Or with virtual environment:

{
  "mcpServers": {
    "flexberry-markitdown": {
      "command": "/home/user/flexberry-markitdown-mcp/.venv/bin/python",
      "args": ["-m", "flexberry_markitdown_mcp.server"],
      "cwd": "/home/user/flexberry-markitdown-mcp"
    }
  }
}
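
The virtual-environment entries above differ between Windows and Linux only in the interpreter path. The following Python sketch, run from inside the activated virtual environment, prints an entry for whichever interpreter is executing it. The `build_server_entry` helper is a hypothetical name for this example; the JSON keys are the ones used in the configurations above.

```python
import json
import sys


def build_server_entry(python_path: str, project_dir: str = "") -> dict:
    """Build an mcpServers entry shaped like the examples above."""
    entry = {
        "command": python_path,
        "args": ["-m", "flexberry_markitdown_mcp.server"],
    }
    if project_dir:
        # "cwd" appears only in the virtual-environment examples.
        entry["cwd"] = project_dir
    return {"mcpServers": {"flexberry-markitdown": entry}}


if __name__ == "__main__":
    # sys.executable is the interpreter running this script, so run it
    # with the Python that has flexberry-markitdown-mcp installed.
    print(json.dumps(build_server_entry(sys.executable), indent=2))
```

Because `json.dumps` escapes backslashes, the printed Windows paths can be pasted into mcp_settings.json without manual editing.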

Universal Configuration (via uv)

If using uv:

{
  "mcpServers": {
    "flexberry-markitdown": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/flexberry-markitdown-mcp",
        "run",
        "flexberry-markitdown-mcp"
      ]
    }
  }
}

Available Tools

convert_to_markdown

Converts a file to Markdown and saves the result next to the original file.

Parameters:

  • file_path (required) - path to the file for conversion
  • output_path (optional) - custom path for saving the result
  • overwrite (optional, default false) - overwrite existing file

Example usage in RooCode:

Convert file /home/user/documents/report.pdf to Markdown
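
Under the Model Context Protocol, a client invokes a tool by sending a JSON-RPC `tools/call` request. The sketch below builds such a request for `convert_to_markdown` using the parameter names listed above. The `build_convert_request` helper and the `request_id` value are illustrative, and the argument types (string, string, boolean) are an assumption based on the parameter descriptions.

```python
import json
from typing import Optional


def build_convert_request(
    file_path: str,
    output_path: Optional[str] = None,
    overwrite: bool = False,
    request_id: int = 1,
) -> dict:
    """Build a JSON-RPC 2.0 tools/call request for convert_to_markdown."""
    arguments = {"file_path": file_path, "overwrite": overwrite}
    if output_path is not None:
        # output_path is optional, so it is sent only when provided.
        arguments["output_path"] = output_path
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "convert_to_markdown", "arguments": arguments},
    }


print(json.dumps(build_convert_request("/home/user/documents/report.pdf"), indent=2))
```

In normal use RooCode constructs this request itself from the natural-language prompt; the sketch only shows what the parameters look like on the wire.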

get_supported_formats

Returns a list of supported file formats.

check_file_exists

Checks if a file exists and returns information about it.
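
The exact fields returned by `check_file_exists` are not documented here. As a minimal sketch of what such a check might report, the hypothetical `describe_file` helper below uses only `pathlib`; the field names are assumptions made for this example.

```python
from pathlib import Path


def describe_file(file_path: str) -> dict:
    """Report whether a path is an existing file, plus basic metadata."""
    path = Path(file_path)
    if not path.is_file():
        return {"exists": False, "path": str(path)}
    return {
        "exists": True,
        "path": str(path.resolve()),
        "size_bytes": path.stat().st_size,
        "extension": path.suffix.lower(),
    }
```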

Usage Examples

Converting PDF with Cyrillic

Convert file C:\Documents\Report 2024.pdf to Markdown

Result will be saved to C:\Documents\Report 2024.md
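
The default output location in this example keeps the directory and file name and swaps the extension for `.md`. A minimal sketch of that rule, assuming the server applies it the same way to every input format (the `default_output_path` helper is a hypothetical name):

```python
from pathlib import PurePosixPath, PureWindowsPath


def default_output_path(file_path: str, windows: bool = False) -> str:
    """Swap the source extension for .md, keeping the same directory."""
    path_cls = PureWindowsPath if windows else PurePosixPath
    return str(path_cls(file_path).with_suffix(".md"))


print(default_output_path(r"C:\Documents\Report 2024.pdf", windows=True))
print(default_output_path("/home/user/report.docx"))
```

The pure path classes are used so the example behaves the same on any operating system; spaces and non-ASCII characters in the file name pass through unchanged.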

Converting with overwrite

Convert file /home/user/report.docx and overwrite the existing result

Converting to specified location

Convert presentation.pptx and save result to /tmp/output.md

Large File Handling

The server is designed to work with files of any size:

  1. The file is converted via MarkItDown.
  2. The result is saved to disk next to the original file.
  3. Only the output path and size are returned to the LLM context.

This allows working with files that are 100x larger than the LLM context limit.
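
The three steps above can be sketched in Python as follows. This is an illustration of the pattern, not the server's actual code: `convert_and_save` and the summary field names are hypothetical, and the converter is passed in as a plain callable so the example runs without MarkItDown installed.

```python
from pathlib import Path
from typing import Callable, Optional


def convert_and_save(
    file_path: str,
    convert: Callable[[str], str],
    output_path: Optional[str] = None,
    overwrite: bool = False,
) -> dict:
    """Convert a file, write the Markdown to disk, return only a summary."""
    source = Path(file_path)
    target = Path(output_path) if output_path else source.with_suffix(".md")
    if target.exists() and not overwrite:
        raise FileExistsError(f"{target} already exists; pass overwrite=True")

    # Step 1: convert. With MarkItDown installed this could be
    #   convert = lambda p: MarkItDown().convert(p).text_content
    markdown = convert(str(source))

    # Step 2: save the full result to disk as UTF-8.
    target.write_text(markdown, encoding="utf-8")

    # Step 3: return only the path and size, never the Markdown itself.
    return {"output_path": str(target), "size_bytes": target.stat().st_size}
```

The returned dictionary stays a few dozen bytes long no matter how large the converted document is, which is what keeps the LLM context small.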

Logging

Server logs are saved to:

  • Linux: ~/.flexberry-markitdown-mcp/server.log
  • Windows: C:\Users\<user>\.flexberry-markitdown-mcp\server.log
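
Both locations are the same path relative to the user's home directory, so one snippet can read the log on either platform. The sketch below prints the last lines of the log; the `tail` helper is a hypothetical name, and the assumption that the log is UTF-8 text is not confirmed by this document (undecodable bytes are replaced rather than raising).

```python
import os
from collections import deque
from pathlib import Path

# Expands to ~/.flexberry-markitdown-mcp/server.log on Linux and to
# C:\Users\<user>\.flexberry-markitdown-mcp\server.log on Windows.
LOG_PATH = Path(os.path.expanduser("~")) / ".flexberry-markitdown-mcp" / "server.log"


def tail(path: Path, lines: int = 20) -> list:
    """Return the last `lines` lines of a text log, or [] if it is missing."""
    if not path.is_file():
        return []
    with path.open(encoding="utf-8", errors="replace") as handle:
        # deque with maxlen keeps only the final lines while streaming.
        return [line.rstrip("\n") for line in deque(handle, maxlen=lines)]


for entry in tail(LOG_PATH):
    print(entry)
```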

Troubleshooting

Error: "MarkItDown not installed"

pip install flexberry-markitdown-mcp

Error: "MCP module not found"

pip install flexberry-markitdown-mcp
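
Both errors usually mean that the interpreter RooCode launches is not the one the package was installed into. Before reinstalling, it can help to check which modules that interpreter actually sees. The sketch below is a hypothetical helper, not part of the package; run it with the same `command` that appears in the RooCode configuration.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be imported."""
    return [name for name in names if find_spec(name) is None]


# "markitdown" and "mcp" are the import names of the MarkItDown library
# and the MCP Python SDK; the third is this package's own module.
print(missing_modules(["markitdown", "mcp", "flexberry_markitdown_mcp"]))
```

An empty list means all three modules are importable from that interpreter.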

Cyrillic issues in Windows

Ensure that the terminal uses UTF-8 encoding. The server automatically sets UTF-8 for stdin/stdout/stderr.
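
For reference, this is one way a Python process can switch a text stream to UTF-8, shown on an in-memory stream that starts out in the legacy Cyrillic code page cp1251. It is a sketch of the general mechanism (`TextIOWrapper.reconfigure`, available since Python 3.7), not a copy of the server's code; `force_utf8` is a hypothetical helper name.

```python
import io


def force_utf8(stream: io.TextIOWrapper) -> io.TextIOWrapper:
    """Switch a text stream to UTF-8 in place (Python 3.7+)."""
    stream.reconfigure(encoding="utf-8", errors="replace")
    return stream


# Demonstration on an in-memory stream instead of the real sys.stdout.
buffer = io.BytesIO()
stream = force_utf8(io.TextIOWrapper(buffer, encoding="cp1251"))
stream.write("Отчёт 2024.md")
stream.flush()
print(buffer.getvalue() == "Отчёт 2024.md".encode("utf-8"))
```

Applied to the real process streams, the same call would be `force_utf8(sys.stdout)`, and likewise for `sys.stdin` and `sys.stderr`.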

OCR not working for images

Install Tesseract.

For Russian-language documents, also install the Russian language pack:

  • Windows: select Russian language during installation
  • Linux: sudo apt install tesseract-ocr-rus
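
To verify the installation, the sketch below looks for the `tesseract` executable on PATH and lists the installed language packs with `tesseract --list-langs`. The `tesseract_languages` helper is hypothetical, and how the server itself locates Tesseract is an assumption not confirmed here. The output parsing is deliberately loose because the header line differs between Tesseract versions.

```python
import shutil
import subprocess


def tesseract_languages(binary: str = "tesseract"):
    """Return installed Tesseract language codes, or None if not on PATH."""
    executable = shutil.which(binary)
    if executable is None:
        return None
    result = subprocess.run(
        [executable, "--list-langs"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Older versions print the list to stderr, newer ones to stdout.
    lines = (result.stdout + result.stderr).splitlines()
    # Skip the "List of available languages ..." header and blank lines.
    return [ln.strip() for ln in lines if ln.strip() and not ln.startswith("List of")]


langs = tesseract_languages()
if langs is None:
    print("tesseract not found on PATH")
else:
    print("Russian pack installed:", "rus" in langs)
```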

Audio transcription not working

MarkItDown uses Azure Speech Services for transcription. Ensure that the required environment variables are configured.

Development

Running tests

pip install -e ".[dev]"
pytest

Project structure

flexberry-markitdown-mcp/
├── src/
│   └── flexberry_markitdown_mcp/
│       ├── __init__.py
│       └── server.py
├── pyproject.toml
├── README_EN.md
├── install.sh
├── install.bat
├── uninstall.sh
├── uninstall.bat
└── roocode-config-examples.json

License

MIT License


Developed by the Flexberry team.
