Convert any directory of docs (DOCX, PPTX, PDF, XLSX, CSV) to clean Markdown, with a watch mode that auto-syncs on change.
mdpack
Convert any directory of docs to clean Markdown, ready for RAG / LLM ingestion.
One CLI. Point it at a folder of DOCX / PPTX / PDF / XLSX / CSV files and get back a mirrored tree of Markdown. Each output file carries frontmatter recording the source path and the converter used, and inline base64 images are stripped.
Want it to auto-sync on every save instead of running by hand? Use mdpack watch.
Install
pip install mdpack
For DOCX and PPTX, install pandoc:
brew install pandoc # macOS
apt install pandoc # Ubuntu / Debian
PDF support is optional because Docling pulls in roughly 1 GB of torch/transformers dependencies. Install it only if you need it:
pip install 'mdpack[pdf]'
Check your setup:
mdpack doctor
Usage
Convert a whole directory
mdpack convert ~/Desktop/reports
# Writes Markdown into ~/Desktop/reports/converted/
Input tree is mirrored: reports/q1/sales.xlsx → reports/converted/q1/sales.md.
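The mirroring rule above can be sketched in a few lines of Python. This is only an illustration of the path mapping, not mdpack's actual code; the function name `mirrored_output` is hypothetical.

```python
from pathlib import Path


def mirrored_output(src_root: Path, src_file: Path, out_root: Path) -> Path:
    """Map a source file to its mirrored Markdown path under out_root.

    Hypothetical helper: keeps the path relative to src_root and swaps
    the extension for .md.
    """
    relative = src_file.relative_to(src_root)  # e.g. q1/sales.xlsx
    return out_root / relative.with_suffix(".md")


src_root = Path("reports")
out = mirrored_output(src_root, src_root / "q1" / "sales.xlsx", src_root / "converted")
print(out.as_posix())  # reports/converted/q1/sales.md
```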
Convert a single file
mdpack convert proposal.docx -o out/
Options
- `-o, --output PATH`: output directory (default: `<src>/converted` for dirs).
- `--force`: re-convert even if the output is newer than the source.
- `--quiet`: only print errors.
Conversion is incremental by default: mdpack skips files whose output is newer than the source.
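One plausible way to implement that check is a modification-time comparison. The sketch below assumes mtime is the signal used (the README does not say so explicitly), and `needs_conversion` is a hypothetical name, not part of mdpack.

```python
import os
from pathlib import Path


def needs_conversion(src: Path, out: Path, force: bool = False) -> bool:
    """Return True when src should be (re)converted.

    Assumption: "newer" means a strictly later modification time.
    """
    if force or not out.exists():
        return True
    # Skip only when the output is strictly newer than the source.
    return out.stat().st_mtime <= src.stat().st_mtime
```

With `force=True` the function always returns True, which mirrors what `--force` is described as doing.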
Inspect supported formats
mdpack formats
Supported formats:
csv .csv
xlsx .xlsx
docx .docx
pptx .pptx
pdf .pdf
Watch mode
The killer feature. Instead of running mdpack convert every time you save a file,
point mdpack watch at a directory and it stays running — every create / modify / delete
/ rename is batched with a 1.5s debounce and applied to the output tree.
mdpack watch ~/Desktop/reports
# Watches ~/Desktop/reports, keeps ~/Desktop/reports/converted/ in sync. Ctrl-C to stop.
Or with a separate output directory:
mdpack watch ~/Desktop/A -o ~/Desktop/B
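To make the 1.5s debounce concrete, here is a minimal sketch of trailing-edge batching: events accumulate, and the batch is released only once no new event has arrived for the quiet period. How mdpack implements this internally is not documented here, so the class `EventBatcher` and its methods are assumptions for illustration. Timestamps are passed in explicitly so the logic can be exercised without real sleeping.

```python
from typing import Dict, List, Optional, Tuple


class EventBatcher:
    """Hypothetical trailing-edge debouncer for file events."""

    def __init__(self, quiet_seconds: float = 1.5) -> None:
        self.quiet_seconds = quiet_seconds
        self._pending: Dict[str, str] = {}
        self._last_event_at: Optional[float] = None

    def add(self, path: str, kind: str, now: float) -> None:
        # A later event for the same path replaces the earlier one.
        self._pending[path] = kind
        self._last_event_at = now

    def flush_if_quiet(self, now: float) -> List[Tuple[str, str]]:
        """Return the batch once quiet_seconds have passed since the last event."""
        if self._last_event_at is None:
            return []
        if now - self._last_event_at < self.quiet_seconds:
            return []
        batch = sorted(self._pending.items())
        self._pending.clear()
        self._last_event_at = None
        return batch
```

An editor that saves the same file several times in quick succession would then trigger a single conversion rather than one per save.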
What it does on each event:
| Source change | What happens in output |
|---|---|
| new .docx added | corresponding .md created |
| .xlsx edited | .md re-generated |
| .csv deleted | .md deleted |
| file renamed | old .md deleted, new one created |
| file inside the output dir touched | ignored (no infinite loops) |
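The table can be read as a small decision function. The sketch below is a hedged illustration only: the event names (`created`, `modified`, `deleted`, `moved`), the action names, and `plan_actions` itself are hypothetical and not taken from mdpack's source. The key point is the first check, which drops any event that originates inside the output directory, since the default output directory lives inside the watched tree.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def _inside(path: Path, root: Path) -> bool:
    """True if path is root or lies somewhere beneath it."""
    try:
        path.relative_to(root)
        return True
    except ValueError:
        return False


def _target(path: Path, src_root: Path, out_root: Path) -> Path:
    """Mirrored .md path for a source file."""
    return out_root / path.relative_to(src_root).with_suffix(".md")


def plan_actions(
    kind: str,
    path: Path,
    src_root: Path,
    out_root: Path,
    dest: Optional[Path] = None,
) -> List[Tuple[str, Path]]:
    """Translate one source event into (action, output_path) pairs."""
    if _inside(path, out_root):
        return []  # our own writes: ignoring them avoids an infinite loop
    if kind in ("created", "modified"):
        return [("convert", _target(path, src_root, out_root))]
    if kind == "deleted":
        return [("delete", _target(path, src_root, out_root))]
    if kind == "moved":
        actions = [("delete", _target(path, src_root, out_root))]
        if dest is not None and _inside(dest, src_root) and not _inside(dest, out_root):
            actions.append(("convert", _target(dest, src_root, out_root)))
        return actions
    return []
```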
On startup, watch does one incremental sync pass first, so the output is already
aligned when event handling begins. Use --no-initial-sync to skip that, or
--force-initial-sync to rebuild everything.
Keeping it running in the background
mdpack watch runs in the foreground. Pick whichever background option suits your setup:
tmux — simplest, survives terminal close:
tmux new -d -s mdpack 'mdpack watch ~/Desktop/reports'
tmux attach -t mdpack # inspect
nohup — crude but works everywhere:
nohup mdpack watch ~/Desktop/reports > ~/mdpack.log 2>&1 &
launchd (macOS) — start on login, auto-restart on crash. Save as
~/Library/LaunchAgents/com.example.mdpack.plist:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key><string>com.example.mdpack</string>
<key>ProgramArguments</key>
<array>
<string>/usr/local/bin/mdpack</string>
<string>watch</string>
<string>/Users/you/Desktop/reports</string>
</array>
<key>RunAtLoad</key><true/>
<key>KeepAlive</key><true/>
<key>StandardOutPath</key><string>/tmp/mdpack.log</string>
<key>StandardErrorPath</key><string>/tmp/mdpack.log</string>
</dict>
</plist>
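Load it: launchctl load ~/Library/LaunchAgents/com.example.mdpack.plist
If mdpack is installed somewhere other than /usr/local/bin, replace that entry in ProgramArguments with the absolute path reported by which mdpack.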
systemd user unit (Linux) — similar idea, save as ~/.config/systemd/user/mdpack.service:
[Unit]
Description=mdpack watch
[Service]
ExecStart=/usr/bin/mdpack watch %h/Desktop/reports
Restart=on-failure
[Install]
WantedBy=default.target
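Enable: systemctl --user enable --now mdpack.service
If mdpack is installed somewhere other than /usr/bin (for example ~/.local/bin), set ExecStart to the path reported by which mdpack.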
Pair with mdrag
mdrag is a companion project — a local, offline Markdown semantic-search MCP server for Claude Code / Cursor / Cline.
Fully-automatic pipeline (source changes → Markdown updated → index updated):
# Terminal 1: keep Markdown in sync with the source dir
mdpack watch ~/Desktop/reports -o ~/Desktop/reports-md
# Terminal 2 (or launched by Claude Code): serve the vault + auto-reindex
mdrag vault add reports ~/Desktop/reports-md # one-time registration
mdrag serve # watches the vault, re-indexes on change
Now edit any .docx / .pptx / .xlsx under ~/Desktop/reports/ — within ~3 seconds,
the matching .md is rewritten, mdrag notices and re-embeds it, and your next search from
Claude Code sees the updated content. No manual steps. The two tools are
loosely coupled: neither knows about the other, and the only thing they share is the
intermediate Markdown directory.
What the output looks like
Every converted file gets a YAML frontmatter block so downstream tools know where it came from:
---
title: Q1 Sales Review
source: q1/sales.xlsx
converter: xlsx
converter_version: mdpack 0.2.0
converted_at: 2026-04-16T05:30:00Z
---
# sales
## Summary
| Region | Revenue | YoY |
|---|---|---|
| APAC | 4.2M | +12% |
...
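A downstream tool can read that frontmatter back with very little code. The sketch below assumes the block contains only flat `key: value` lines, as in the example above; it is not a general YAML parser, and `split_frontmatter` is a hypothetical helper rather than something mdpack ships.

```python
from typing import Dict, Tuple


def split_frontmatter(text: str) -> Tuple[Dict[str, str], str]:
    """Split a converted file into (metadata, body).

    Assumes flat "key: value" lines between two "---" delimiters.
    Returns ({}, text) when no complete frontmatter block is found.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta: Dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return meta, "\n".join(lines[i + 1:])
        # Split on the first colon only, so timestamps keep their colons.
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}, text  # closing delimiter missing


doc = "---\nsource: q1/sales.xlsx\nconverter: xlsx\n---\n# sales\n"
meta, body = split_frontmatter(doc)
print(meta["source"], meta["converter"])  # q1/sales.xlsx xlsx
```

For anything beyond flat keys, a real YAML library would be the safer choice.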
Roadmap
Next up (0.3.0): HTML and EPUB input (via pandoc), and ready-to-use background scripts
(possibly an mdpack install-service command that writes the plist / systemd unit for you).
Scanned / image-only PDFs (OCR) remain intentionally out of scope. If you need them,
run Docling with its OCR pipeline upstream, or use tesseract.
Development
git clone https://github.com/andyleimc-source/mdpack
cd mdpack
python -m venv .venv
.venv/bin/pip install -e ".[dev]"
.venv/bin/pytest
License