Convert PDF, DOCX, XLSX to Markdown via a simple CLI or Python API.
doc2md
Convert PDF, DOCX, and XLSX files to clean Markdown — one command, no API keys.
Built for developers and teams who feed documents into LLMs or RAG pipelines.
Installation
pip install doc2md
Or from source:
git clone https://github.com/keremnuman/markdown-pipeline.git
cd markdown-pipeline
pip install -e .
CLI
# Single file
doc2md report.pdf
# Entire folder
doc2md ./documents --output ./output_md
# Parallel workers
doc2md ./documents --workers 8
# With config file
doc2md --config config.yaml
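If the CLI needs to be driven from a Python script (a scheduled job, for example), one option is to shell out to it. The sketch below uses only the `--output` and `--workers` flags shown above. The helper names `build_doc2md_command` and `run_doc2md` are hypothetical, and passing both flags in a single invocation is an assumption rather than documented behavior.

```python
import subprocess
from pathlib import Path
from typing import Optional


def build_doc2md_command(
    source: Path,
    output: Optional[Path] = None,
    workers: Optional[int] = None,
) -> list[str]:
    """Build the argument list for a doc2md invocation (hypothetical helper)."""
    cmd = ["doc2md", str(source)]
    if output is not None:
        cmd += ["--output", str(output)]
    if workers is not None:
        # Assumption: --workers can be combined with --output.
        cmd += ["--workers", str(workers)]
    return cmd


def run_doc2md(
    source: Path,
    output: Optional[Path] = None,
    workers: Optional[int] = None,
) -> subprocess.CompletedProcess:
    """Run the CLI and raise CalledProcessError on a non-zero exit code."""
    cmd = build_doc2md_command(source, output, workers)
    return subprocess.run(cmd, check=True, capture_output=True, text=True)
```

Building the command as a list (rather than a single string) avoids shell quoting problems with paths that contain spaces.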
Python API
from pathlib import Path

from doc2md import MicrosoftMarkItDownConverter, DocumentPipeline

# Create a converter and a pipeline that writes to ./output
converter = MicrosoftMarkItDownConverter()
pipeline = DocumentPipeline(converter=converter, output_dir=Path("./output"))

pipeline.process_single(Path("report.pdf"))  # single file
pipeline.process_batch(Path("./documents"))  # batch
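Since the converted Markdown is meant for LLM or RAG pipelines, a common next step is splitting each file into sections. A minimal sketch, assuming the pipeline has written `.md` files under the output directory. The file layout and the helper names `split_markdown_by_heading` and `load_chunks` are assumptions for illustration and are not part of doc2md.

```python
import re
from pathlib import Path

# Matches ATX headings such as "# Title" or "### Subsection"
HEADING = re.compile(r"^#{1,6}\s")


def split_markdown_by_heading(text: str) -> list[dict]:
    """Split Markdown text into one chunk per heading section."""
    chunks: list[dict] = []
    title = ""  # text before the first heading gets an empty title
    lines: list[str] = []

    def flush() -> None:
        body = "\n".join(lines).strip()
        if title or body:
            chunks.append({"title": title, "body": body})

    for line in text.splitlines():
        if HEADING.match(line):
            flush()
            title = line.lstrip("#").strip()
            lines = []
        else:
            lines.append(line)
    flush()
    return chunks


def load_chunks(output_dir: Path) -> list[dict]:
    """Collect chunks from every .md file under output_dir (assumed layout)."""
    chunks: list[dict] = []
    for md_file in sorted(output_dir.rglob("*.md")):
        text = md_file.read_text(encoding="utf-8")
        for chunk in split_markdown_by_heading(text):
            chunk["source"] = md_file.name
            chunks.append(chunk)
    return chunks
```

Note that this simple splitter treats any line starting with `#` and a space as a heading, so a comment line inside a fenced code block would also start a new chunk. A Markdown parser would be more robust for documents that contain code.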
Contributing
git clone https://github.com/keremnuman/markdown-pipeline.git
cd markdown-pipeline
pip install -e ".[dev]"
pytest tests/ -v
License
MIT
File details
Details for the file markpipe-0.1.0.tar.gz.
File metadata
- Download URL: markpipe-0.1.0.tar.gz
- Upload date:
- Size: 6.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 503a56242c23b2432be28e4e8ba54b9a0a6b58b19be8823bfbe9761809cee302 |
| MD5 | 07ad5722bb6738a389abd9a5aa0253e3 |
| BLAKE2b-256 | c63a1c80cf67e37c6a5b9253e261a239c765beca590c5a8e2c321581ad2ae61c |
File details
Details for the file markpipe-0.1.0-py3-none-any.whl.
File metadata
- Download URL: markpipe-0.1.0-py3-none-any.whl
- Upload date:
- Size: 6.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fbb57fc83ca2be1c5a510e4af430cd8ad70200cc40f7014f446bd604d22c88f8 |
| MD5 | c564028100a8ffe4f884d4da2af8195e |
| BLAKE2b-256 | e977afaf111977a4c52f64be4987f50de441dbed118bb1f392ee87b3fcf93117 |
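To check a downloaded file against one of the SHA256 digests listed above, something like the following could be used. It is a sketch using only the standard library; the helper names `sha256_of` and `verify_sha256` are hypothetical, and the local file path is whatever location the archive was saved to.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in blocks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read fixed-size blocks until read() returns an empty bytes object
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_sha256(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with an expected hex string, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `verify_sha256(Path("markpipe-0.1.0.tar.gz"), "503a56242c23b2432be28e4e8ba54b9a0a6b58b19be8823bfbe9761809cee302")` should return `True` for an intact copy of the source distribution.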