Convert any directory of docs (DOCX, CSV, XLSX, PDF, PPTX, HTML) to clean Markdown.

mdpack

Convert any directory of docs to clean Markdown, ready for RAG / LLM ingestion.

🚧 WIP. 0.0.1 is a name placeholder on PyPI — real functionality lands in 0.1.0.

What this will be

A single CLI that walks a folder full of mixed-format documents (DOCX, CSV, XLSX, PDF, PPTX, HTML) and emits clean Markdown, mirroring the source directory structure. Designed to feed downstream RAG / LLM pipelines — in particular mdrag, a local semantic-search MCP server for Markdown folders.
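The "walk a folder and mirror its structure" step can be sketched roughly as follows. This is only an illustration under assumptions: `plan_outputs` and `SUPPORTED` are hypothetical names, not part of mdpack, and the suffix set simply reflects the formats planned for 0.1.0.

```python
from pathlib import Path

# Hypothetical: suffixes taken from the planned 0.1.0 formats.
SUPPORTED = {".docx", ".csv", ".xlsx"}


def plan_outputs(src_root, out_root):
    """Map each supported source file to a mirrored .md path under out_root."""
    src_root, out_root = Path(src_root), Path(out_root)
    plan = {}
    for path in sorted(src_root.rglob("*")):
        # Skip directories and unsupported formats; match suffixes case-insensitively.
        if path.is_file() and path.suffix.lower() in SUPPORTED:
            rel = path.relative_to(src_root)
            plan[path] = out_root / rel.with_suffix(".md")
    return plan
```

One design question this sketch leaves open: `report.docx` and `report.csv` in the same folder would both map to `report.md`, so a real tool would need a collision rule (for example, keeping the original suffix in the output name).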

Why a separate tool? Conversion is a messy, format-specific problem (pandoc for DOCX, Docling for PDF, openpyxl for XLSX, and so on). Keeping it out of mdrag keeps both tools focused: mdpack produces Markdown, mdrag indexes Markdown.
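To make the "pandoc for DOCX" part concrete, here is a minimal sketch of shelling out to pandoc from Python. The helper names `pandoc_command` and `convert_docx` are hypothetical, and the choice of GitHub-Flavored Markdown (`gfm`) as the output format is an assumption; the README does not say which Markdown flavor or pandoc options mdpack will use.

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_command(src, dst):
    """Build the pandoc argument list (hypothetical helper; 'gfm' is an assumed target)."""
    return ["pandoc", str(src), "--from", "docx", "--to", "gfm", "--output", str(dst)]


def convert_docx(src, dst):
    """Convert one DOCX file to Markdown by invoking the pandoc executable."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    dst = Path(dst)
    # Create the mirrored output folder before pandoc writes into it.
    dst.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(pandoc_command(src, dst), check=True)
```

Because each backend has its own external dependency and failure modes like this, isolating them in a separate tool keeps the indexing side free of format-specific code.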

Planned MVP (0.1.0)

Format   Backend    Status
DOCX     pandoc     planned
CSV      stdlib     planned
XLSX     openpyxl   planned

Later (0.2.0+): PDF (Docling, non-OCR), PPTX, HTML, EPUB.
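For the CSV entry, whose backend is listed as the standard library, a conversion could look something like the sketch below. `csv_to_markdown` is a hypothetical name, and treating the first row as the header and emitting a pipe table are assumptions about the output, not documented mdpack behavior.

```python
import csv
import io


def csv_to_markdown(text):
    """Render CSV text as a Markdown pipe table; the first row is used as the header."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ""
    width = max(len(row) for row in rows)

    def cell(value):
        # Escape pipes and flatten newlines so a cell cannot break the table.
        return value.replace("|", "\\|").replace("\n", " ").strip()

    def line(row):
        # Pad short rows so every line has the same number of columns.
        padded = row + [""] * (width - len(row))
        return "| " + " | ".join(cell(v) for v in padded) + " |"

    header, *body = rows
    separator = "| " + " | ".join(["---"] * width) + " |"
    return "\n".join([line(header), separator, *(line(r) for r in body)])
```

Using `csv.reader` rather than splitting on commas means quoted fields containing commas or newlines are handled by the standard library.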

Install (placeholder)

pip install mdpack
mdpack --version      # 0.0.1 (placeholder — no conversion yet)

Companion project

  • mdrag — give any local Markdown folder a semantic-search MCP server. Run mdpack first to convert mixed-format docs, then point mdrag at the output directory.

License

MIT

Download files

Source Distribution

mdpack-0.0.1.tar.gz (3.0 kB view details)

Uploaded Source

Built Distribution

mdpack-0.0.1-py3-none-any.whl (3.7 kB view details)

Uploaded Python 3

File details

Details for the file mdpack-0.0.1.tar.gz.

File metadata

  • Download URL: mdpack-0.0.1.tar.gz
  • Upload date:
  • Size: 3.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for mdpack-0.0.1.tar.gz
Algorithm     Hash digest
SHA256        b9a1a00eb389c98b6b598e764e48d9f70f02cbf4807892f00a73ae63790c338a
MD5           00a072dc877b05f5fd666c967c531d51
BLAKE2b-256   1979d0174e810ad17af5dc2dbcf4917f3b612fe379b923711668555f9c97f192

File details

Details for the file mdpack-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: mdpack-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 3.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for mdpack-0.0.1-py3-none-any.whl
Algorithm     Hash digest
SHA256        9eebc3d2ef85e50f326e55f269c63ddfd8e5fc4d88e7985ff43b5265913f0af9
MD5           814d1bc70d804b6c97183b7b66c6e48f
BLAKE2b-256   8c61febaf2a9ab93f119f42e53c8267a60faba0bb3670846620b8c0250c406d1
