Convert any directory of docs (DOCX, CSV, XLSX) to clean Markdown.
mdpack
Convert any directory of docs to clean Markdown, ready for RAG / LLM ingestion.
One CLI. Point it at a folder of DOCX / XLSX / CSV files and get back a mirrored tree of Markdown: every file carries frontmatter recording its source path and the converter used, and inline base64 images are stripped. No surprises.
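To illustrate what "inline base64 images stripped" can mean in practice, here is a minimal sketch that removes Markdown image tags whose target is a base64 data URI. The function name and the regular expression are assumptions made for this example, not mdpack's actual implementation.

```python
import re

# Hypothetical pattern (an assumption, not taken from mdpack's source):
# a Markdown image whose target is a base64-encoded data URI.
_DATA_URI_IMAGE = re.compile(r"!\[[^\]]*\]\(data:image/[^;)]+;base64,[^)]*\)")


def strip_inline_base64_images(markdown: str) -> str:
    """Remove Markdown images that embed base64 data; leave linked images alone."""
    return _DATA_URI_IMAGE.sub("", markdown)


sample = "Intro ![chart](data:image/png;base64,iVBORw0KGgo=) outro"
print(strip_inline_base64_images(sample))
```

Images that point at a file or URL are left untouched, since only embedded payloads bloat the text passed to an LLM.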
Install
pip install mdpack
For DOCX support, install pandoc as well:
brew install pandoc # macOS
apt install pandoc # Ubuntu / Debian
Check your setup:
mdpack doctor
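The README does not say what mdpack doctor checks internally. If you want to verify the pandoc dependency from your own Python script, a minimal sketch could look like this; check_pandoc is a hypothetical helper, not part of mdpack.

```python
import shutil


def check_pandoc() -> str:
    """Report whether a pandoc executable is visible on PATH."""
    path = shutil.which("pandoc")  # returns None when pandoc is not installed
    if path is None:
        return "pandoc: not found (DOCX conversion will be unavailable)"
    return f"pandoc: found at {path}"


print(check_pandoc())
```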
Usage
Convert a whole directory
mdpack convert ~/Desktop/reports
# Writes Markdown into ~/Desktop/reports/converted/
The input directory tree is mirrored: reports/q1/sales.xlsx becomes
reports/converted/q1/sales.md.
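The mirroring rule above can be sketched with pathlib. This is an illustration of the mapping only; mirrored_output_path is a hypothetical name, and mdpack's own code may differ.

```python
from pathlib import Path


def mirrored_output_path(src_root: Path, src_file: Path, out_root: Path) -> Path:
    """Map a source file to its Markdown counterpart under out_root."""
    # Keep the path relative to the input root so subfolders are preserved.
    relative = src_file.relative_to(src_root)
    # Swap the original extension (.xlsx, .docx, .csv) for .md.
    return (out_root / relative).with_suffix(".md")


root = Path("reports")
target = mirrored_output_path(root, root / "q1" / "sales.xlsx", root / "converted")
print(target.as_posix())
```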
Convert a single file
mdpack convert proposal.docx -o out/
Options
- `-o, --output PATH`: output directory (default: `<src>/converted` for dirs, `<src>_md/` for a single file).
- `--force`: re-convert even if the output is newer than the source.
- `--quiet`: only print errors.
Incremental by default — mdpack skips files whose output is newer than the source, so you can safely re-run it over a large folder.
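The incremental check can be pictured as a comparison of modification times. The sketch below assumes mtime is the signal used; the README only says the output must be "newer than the source". needs_conversion is a hypothetical helper, and its force argument stands in for the --force flag.

```python
import os
import tempfile
from pathlib import Path


def needs_conversion(src: Path, out: Path, force: bool = False) -> bool:
    """Return True unless an output newer than the source already exists."""
    if force or not out.exists():
        return True
    # Assumption: "newer" means a strictly later modification time.
    return out.stat().st_mtime <= src.stat().st_mtime


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "sales.csv"
    out = Path(tmp) / "sales.md"
    src.write_text("region,revenue\n")
    out.write_text("# sales\n")
    os.utime(src, (1000, 1000))  # source last modified at t=1000
    os.utime(out, (2000, 2000))  # output written later, at t=2000
    up_to_date = not needs_conversion(src, out)
    forced = needs_conversion(src, out, force=True)
    print(up_to_date, forced)
```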
Inspect supported formats
mdpack formats
Supported formats:
csv .csv
xlsx .xlsx
docx .docx
What the output looks like
Every converted file gets a YAML frontmatter block so downstream tools know where it came from:
---
title: Q1 Sales Review
source: q1/sales.xlsx
converter: xlsx
converter_version: mdpack 0.1.0
converted_at: 2026-04-16T05:30:00Z
---
# sales
## Summary
| Region | Revenue | YoY |
|---|---|---|
| APAC | 4.2M | +12% |
...
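A downstream tool can read these fields back without a YAML dependency, as long as the frontmatter stays flat. The following is a minimal sketch under that assumption; read_frontmatter is a hypothetical helper, and a real YAML parser would be the more robust choice for nested or quoted values.

```python
def read_frontmatter(markdown: str) -> dict[str, str]:
    """Parse flat `key: value` frontmatter delimited by --- lines."""
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta
        # Split on the first colon only, so timestamps keep their colons.
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}  # no closing delimiter found, so treat it as no frontmatter


doc = """---
title: Q1 Sales Review
source: q1/sales.xlsx
converter: xlsx
---
# sales
"""
meta = read_frontmatter(doc)
print(meta["source"], meta["converter"])
```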
Pair with mdrag
mdrag is a companion project — a local, offline Markdown semantic-search MCP server for Claude Code / Cursor / Cline.
Typical workflow:
# 1. Convert mixed-format docs to Markdown
mdpack convert ~/Desktop/reports
# 2. Point mdrag at the output
mdrag vault add reports ~/Desktop/reports/converted
# 3. Ask Claude Code natural-language questions against the vault
Roadmap
Next up (0.2.0): PDF (Docling, non-OCR mode), PPTX (pandoc), HTML / EPUB (pandoc). Watch mode to auto-convert on source file change is also planned.
Scanned / image-only PDFs (OCR) are intentionally out of scope — use
Docling or tesseract upstream if you need them.
Development
git clone https://github.com/andyleimc-source/mdpack
cd mdpack
python -m venv .venv
.venv/bin/pip install -e ".[dev]"
.venv/bin/pytest
License