Convert PDF, DOCX, XLSX to Markdown via a simple CLI or Python API.

doc2md

Convert PDF, DOCX, and XLSX files to clean Markdown — one command, no API keys.

Built for developers and teams who feed documents into LLMs or RAG pipelines.


Installation

pip install doc2md

Or from source:

git clone https://github.com/keremnuman/markdown-pipeline.git
cd markdown-pipeline
pip install -e .

CLI

# Single file
doc2md report.pdf

# Entire folder
doc2md ./documents --output ./output_md

# Parallel workers
doc2md ./documents --workers 8

# With config file
doc2md --config config.yaml
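The CLI calls above can also be driven from a script. The following is a minimal sketch, not part of doc2md itself: `build_command` and `run_doc2md` are hypothetical helper names, and it assumes that `--output` and `--workers` can be combined in one invocation, which the examples above do not show explicitly.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def build_command(source: Path, output: Optional[Path] = None,
                  workers: Optional[int] = None) -> List[str]:
    """Assemble a doc2md argument list from the flags shown in the CLI section."""
    cmd = ["doc2md", str(source)]
    if output is not None:
        cmd += ["--output", str(output)]
    if workers is not None:
        if workers < 1:
            raise ValueError("workers must be a positive integer")
        cmd += ["--workers", str(workers)]
    return cmd


def run_doc2md(source: Path, **options) -> int:
    """Run the CLI if it is on PATH and return its exit code."""
    if shutil.which("doc2md") is None:
        raise FileNotFoundError("doc2md is not installed or not on PATH")
    # Assumption: the process exit code is the only result we inspect here.
    return subprocess.run(build_command(source, **options)).returncode
```

Building the argument list separately from running it keeps the flag handling easy to check without the tool installed, for example `build_command(Path("documents"), workers=8)`.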

Python API

from pathlib import Path
from doc2md import MicrosoftMarkItDownConverter, DocumentPipeline

converter = MicrosoftMarkItDownConverter()
pipeline  = DocumentPipeline(converter=converter, output_dir=Path("./output"))

pipeline.process_single(Path("report.pdf"))   # single file
pipeline.process_batch(Path("./documents"))   # batch
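When one bad file should not stop a whole run, it can help to convert files individually and collect the failures. The sketch below assumes only the classes and the `process_single` call shown above. `find_documents` and `convert_each` are hypothetical helpers, the recursive directory walk may differ from what `process_batch` does internally, and it assumes conversion failures surface as exceptions.

```python
from pathlib import Path
from typing import List

# Extensions taken from the formats the description lists (PDF, DOCX, XLSX).
SUPPORTED_SUFFIXES = {".pdf", ".docx", ".xlsx"}


def find_documents(root: Path) -> List[Path]:
    """Return supported files under root, sorted for a stable order."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )


def convert_each(root: Path, output_dir: Path) -> List[Path]:
    """Convert files one at a time and return those that raised an error."""
    # Imported lazily so find_documents works without doc2md installed.
    from doc2md import DocumentPipeline, MicrosoftMarkItDownConverter

    pipeline = DocumentPipeline(
        converter=MicrosoftMarkItDownConverter(), output_dir=output_dir
    )
    failed = []
    for path in find_documents(root):
        try:
            pipeline.process_single(path)
        except Exception:  # assumption: failures are reported as exceptions
            failed.append(path)
    return failed
```

Calling `convert_each(Path("./documents"), Path("./output"))` would then return the list of inputs worth inspecting by hand.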

Contributing

git clone https://github.com/keremnuman/markdown-pipeline.git
cd markdown-pipeline
pip install -e ".[dev]"
pytest tests/ -v

License

MIT

Download files

Source Distribution

markpipe-0.1.0.tar.gz (6.1 kB)

Uploaded Source

Built Distribution

markpipe-0.1.0-py3-none-any.whl (6.7 kB)

Uploaded Python 3

File details

Details for the file markpipe-0.1.0.tar.gz.

File metadata

  • Download URL: markpipe-0.1.0.tar.gz
  • Upload date:
  • Size: 6.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for markpipe-0.1.0.tar.gz
  • SHA256: 503a56242c23b2432be28e4e8ba54b9a0a6b58b19be8823bfbe9761809cee302
  • MD5: 07ad5722bb6738a389abd9a5aa0253e3
  • BLAKE2b-256: c63a1c80cf67e37c6a5b9253e261a239c765beca590c5a8e2c321581ad2ae61c

File details

Details for the file markpipe-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: markpipe-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 6.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for markpipe-0.1.0-py3-none-any.whl
  • SHA256: fbb57fc83ca2be1c5a510e4af430cd8ad70200cc40f7014f446bd604d22c88f8
  • MD5: c564028100a8ffe4f884d4da2af8195e
  • BLAKE2b-256: e977afaf111977a4c52f64be4987f50de441dbed118bb1f392ee87b3fcf93117
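
A downloaded file can be checked against the SHA256 digests listed above using only the standard library. This is a minimal sketch. `sha256_of` and `matches_published` are hypothetical helper names, and the local file path in the usage note is an assumption about where the download was saved.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 against a published hex digest, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution this would be called as `matches_published(Path("markpipe-0.1.0.tar.gz"), "503a56242c23b2432be28e4e8ba54b9a0a6b58b19be8823bfbe9761809cee302")`.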
