Convert any directory of docs (DOCX, CSV, XLSX) to clean Markdown.

mdpack

Convert any directory of docs to clean Markdown, ready for RAG / LLM ingestion.

One CLI. Point it at a folder of DOCX / XLSX / CSV files and get back a mirrored tree of Markdown: each output file carries frontmatter recording its source path and the converter used, and inline base64 images are stripped.

Install

pip install mdpack

For DOCX support, install pandoc as well:

brew install pandoc          # macOS
apt install pandoc           # Ubuntu / Debian

Check your setup:

mdpack doctor
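
If you want to check the pandoc prerequisite from your own script before calling mdpack, something along these lines may help. It is only a sketch using the standard library, not how mdpack doctor is implemented; the function name find_pandoc is made up for this example.

```python
import shutil
from typing import Optional


def find_pandoc() -> Optional[str]:
    """Return the full path of the pandoc executable, or None if it is not on PATH."""
    return shutil.which("pandoc")


if __name__ == "__main__":
    location = find_pandoc()
    if location is None:
        print("pandoc not found: DOCX conversion will not be available")
    else:
        print(f"pandoc found at {location}")
```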

Usage

Convert a whole directory

mdpack convert ~/Desktop/reports
# Writes Markdown into ~/Desktop/reports/converted/

The input directory tree is mirrored: reports/q1/sales.xlsx becomes reports/converted/q1/sales.md.
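
The mirroring rule above can be sketched in a few lines of Python. This illustrates the path mapping only and is not mdpack's own code; mirrored_output_path is a hypothetical helper name.

```python
from pathlib import Path


def mirrored_output_path(src_file: Path, src_root: Path, out_root: Path) -> Path:
    """Map a source file to its Markdown path under out_root, keeping subfolders."""
    relative = src_file.relative_to(src_root)       # e.g. q1/sales.xlsx
    return out_root / relative.with_suffix(".md")   # e.g. <out_root>/q1/sales.md


if __name__ == "__main__":
    root = Path("reports")
    target = mirrored_output_path(root / "q1" / "sales.xlsx", root, root / "converted")
    print(target.as_posix())
```

Only the final extension is swapped for .md, so the folder layout under the output directory matches the input tree.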

Convert a single file

mdpack convert proposal.docx -o out/

Options

  • -o, --output PATH — output directory (default: <src>/converted for dirs, <src>_md/ for a single file).
  • --force — re-convert even if the output is newer than the source.
  • --quiet — only print errors.

Incremental by default — mdpack skips files whose output is newer than the source, so you can safely re-run it over a large folder.
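
One way to picture the skip rule is a modification-time comparison, as in the sketch below. This assumes the check is based on file mtimes, which the description suggests but does not spell out; needs_conversion is a hypothetical name, and the force argument mirrors the --force flag.

```python
from pathlib import Path


def needs_conversion(src: Path, out: Path, force: bool = False) -> bool:
    """Return True when src should be converted or re-converted."""
    if force or not out.exists():
        return True
    # Skip only when the existing output is strictly newer than the source.
    return out.stat().st_mtime <= src.stat().st_mtime
```

With a rule like this, editing a source file after its last conversion makes it eligible again on the next run, while untouched files are skipped.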

Inspect supported formats

mdpack formats
Supported formats:
  csv    .csv
  xlsx   .xlsx
  docx   .docx
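
To estimate which files in a folder would be picked up, the format list above can be mirrored as a small lookup table. This is a sketch, not mdpack's dispatch code: the names CONVERTER_BY_SUFFIX and converter_for are invented here, and treating extensions case-insensitively is an assumption.

```python
from pathlib import Path
from typing import Optional

# Mirrors the output of `mdpack formats` for version 0.1.0.
CONVERTER_BY_SUFFIX = {".csv": "csv", ".xlsx": "xlsx", ".docx": "docx"}


def converter_for(path: Path) -> Optional[str]:
    """Return the converter name for a file, or None if the format is unsupported."""
    # Assumption: extensions are matched case-insensitively.
    return CONVERTER_BY_SUFFIX.get(path.suffix.lower())


if __name__ == "__main__":
    for name in ("q1/sales.xlsx", "notes.docx", "slides.pptx"):
        print(name, "->", converter_for(Path(name)))
```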

What the output looks like

Every converted file gets a YAML frontmatter block so downstream tools know where it came from:

---
title: Q1 Sales Review
source: q1/sales.xlsx
converter: xlsx
converter_version: mdpack 0.1.0
converted_at: 2026-04-16T05:30:00Z
---

# sales

## Summary
| Region | Revenue | YoY |
|---|---|---|
| APAC | 4.2M | +12% |
...
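
A downstream tool can read these fields back without much machinery. The sketch below handles only flat key: value frontmatter like the sample above; a real YAML parser is the safer choice for anything more complex. The function name split_frontmatter is hypothetical.

```python
from typing import Dict, Tuple


def split_frontmatter(text: str) -> Tuple[Dict[str, str], str]:
    """Split a converted document into (frontmatter fields, Markdown body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    # Find the closing delimiter; give up if there is none.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        return {}, text
    meta: Dict[str, str] = {}
    for line in lines[1:end]:
        # Split on the first colon only, so timestamps keep their colons.
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return meta, body


if __name__ == "__main__":
    sample = (
        "---\n"
        "title: Q1 Sales Review\n"
        "source: q1/sales.xlsx\n"
        "converter: xlsx\n"
        "converted_at: 2026-04-16T05:30:00Z\n"
        "---\n"
        "\n"
        "# sales\n"
    )
    meta, body = split_frontmatter(sample)
    print(meta["source"], meta["converter"])
```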

Pair with mdrag

mdrag is a companion project — a local, offline Markdown semantic-search MCP server for Claude Code / Cursor / Cline.

Typical workflow:

# 1. Convert mixed-format docs to Markdown
mdpack convert ~/Desktop/reports

# 2. Point mdrag at the output
mdrag vault add reports ~/Desktop/reports/converted

# 3. Ask Claude Code natural-language questions against the vault

Roadmap

Next up (0.2.0): PDF (Docling, non-OCR mode), PPTX (pandoc), HTML / EPUB (pandoc). Watch mode to auto-convert on source file change is also planned.

Scanned / image-only PDFs (OCR) are intentionally out of scope — use Docling or tesseract upstream if you need them.

Development

git clone https://github.com/andyleimc-source/mdpack
cd mdpack
python -m venv .venv
.venv/bin/pip install -e ".[dev]"
.venv/bin/pytest

License

MIT
