Convert PDF, DOCX, XLSX to Markdown via a simple CLI or Python API.

markpipe

Convert PDF, DOCX, and XLSX files to clean Markdown — one command, no API keys.

Built for developers and teams who feed documents into LLMs or RAG pipelines.


Installation

pip install markpipe

Or from source:

git clone https://github.com/keremnuman/markdown-pipeline.git
cd markdown-pipeline
pip install -e .

CLI

# Single file
doc2md report.pdf

# Entire folder
doc2md ./documents --output ./output_md

# Parallel workers
doc2md ./documents --workers 8

# With config file
doc2md --config config.yaml
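
To drive the CLI from a Python script, the commands above can be assembled as an argument list for `subprocess`. This is a minimal sketch, not part of markpipe: the helper names `build_doc2md_command` and `run_doc2md` are hypothetical, and passing `--output` and `--workers` in the same call is an assumption, since the examples above show each flag only on its own.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_doc2md_command(
    source: Path,
    output: Optional[Path] = None,
    workers: Optional[int] = None,
) -> List[str]:
    """Assemble a doc2md argument list from the flags shown above."""
    cmd = ["doc2md", str(source)]
    if output is not None:
        cmd += ["--output", str(output)]
    if workers is not None:
        if workers < 1:
            raise ValueError("workers must be at least 1")
        cmd += ["--workers", str(workers)]
    return cmd


def run_doc2md(cmd: List[str]) -> None:
    """Run the assembled command; raises CalledProcessError on a non-zero exit."""
    subprocess.run(cmd, check=True)


# Build (but do not run) the folder example with both flags combined.
command = build_doc2md_command(Path("./documents"), Path("./output_md"), workers=8)
print(" ".join(command))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.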

Python API

from pathlib import Path
from doc2md import MicrosoftMarkItDownConverter, DocumentPipeline

converter = MicrosoftMarkItDownConverter()
pipeline  = DocumentPipeline(converter=converter, output_dir=Path("./output"))

pipeline.process_single(Path("report.pdf"))   # single file
pipeline.process_batch(Path("./documents"))   # batch
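
For larger jobs it can help to convert files one at a time and record which ones fail. The sketch below is not part of markpipe: `convert_each` is a hypothetical helper that relies only on the `process_single(path)` call shown above, and it assumes that a failed conversion raises an exception, which the package may or may not do.

```python
from pathlib import Path
from typing import Dict, List, Tuple

# Extensions taken from the formats the project says it converts.
SUPPORTED_SUFFIXES = {".pdf", ".docx", ".xlsx"}


def convert_each(pipeline, folder: Path) -> Tuple[List[Path], Dict[Path, str]]:
    """Call pipeline.process_single on every supported file under folder.

    Returns (converted paths, {failed path: error message}).
    """
    converted: List[Path] = []
    failed: Dict[Path, str] = {}
    for path in sorted(folder.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in SUPPORTED_SUFFIXES:
            continue
        try:
            pipeline.process_single(path)
        except Exception as exc:  # assumption: failures surface as exceptions
            failed[path] = str(exc)
        else:
            converted.append(path)
    return converted, failed
```

With the `pipeline` object created above, `converted, failed = convert_each(pipeline, Path("./documents"))` would leave the failed paths and their error messages in `failed` for a later retry or report.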

Contributing

git clone https://github.com/keremnuman/markdown-pipeline.git
cd markdown-pipeline
pip install -e ".[dev]"
pytest tests/ -v

License

MIT

Download files

Source Distribution

markpipe-0.1.2.tar.gz (6.1 kB)

Uploaded Source

Built Distribution

markpipe-0.1.2-py3-none-any.whl (6.7 kB)

Uploaded Python 3

File details

Details for the file markpipe-0.1.2.tar.gz.

File metadata

  • Download URL: markpipe-0.1.2.tar.gz
  • Upload date:
  • Size: 6.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for markpipe-0.1.2.tar.gz
Algorithm Hash digest
SHA256 fa6f78884e9f2d6b8f0f9faf7b3f1e26343a460c32dfa144e57b8ef782a6fde2
MD5 b2dff6caf4abc167c4de5b5747885948
BLAKE2b-256 0412f1c99b03c2c779975d72ca687ef3d97dd799e4ae5af07f945524a4225702

File details

Details for the file markpipe-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: markpipe-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 6.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for markpipe-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 a81c977c8abfef6351461993a6ceac1d0e9fbec6266bf4797b832de49b7721aa
MD5 76d55857d32acd40c282142abc342196
BLAKE2b-256 eff153690ca89233f86131a3e72b2f8d74d041c13b2b9b964accdc80852ed3f7
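
A downloaded file can be checked against the SHA256 digests listed above with Python's standard `hashlib` module. This is a minimal sketch: the helper names `sha256_of` and `matches_published` are illustrative and are not provided by markpipe or PyPI.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

For example, `matches_published(Path("markpipe-0.1.2.tar.gz"), "fa6f78884e9f2d6b8f0f9faf7b3f1e26343a460c32dfa144e57b8ef782a6fde2")` should return `True` for an intact copy of the source distribution, assuming the file sits in the current directory.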
