CLI utility to convert PDFs and other supported document formats to Markdown, JSON, or HTML with the Marker API.

PDF to markdown CLI

Command-line utility for converting PDFs and other supported documents into Markdown, JSON, or HTML using the Marker API.

Why use this tool

Converts single files or entire directories
Automatically splits large PDFs into chunks and merges results
Persists request state locally so interrupted runs can recover
Rewrites and copies extracted images into deterministic output folders
Supports OCR/LLM tuning flags from the Marker API
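As a rough illustration of the chunking idea, the sketch below computes the page ranges a large PDF could be split into. It is not the tool's actual implementation: the function name `chunk_page_ranges` is hypothetical, and the only value taken from this README is the default chunk size of 25 pages.

```python
def chunk_page_ranges(total_pages: int, chunk_size: int = 25) -> list[tuple[int, int]]:
    """Return inclusive, 1-based (start, end) page ranges covering the document.

    Hypothetical helper for illustration only; the CLI's own splitting
    logic may differ.
    """
    if total_pages < 0:
        raise ValueError("total_pages must be non-negative")
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        (start, min(start + chunk_size - 1, total_pages))
        for start in range(1, total_pages + 1, chunk_size)
    ]


if __name__ == "__main__":
    # A 60-page PDF with the default chunk size yields three ranges.
    for start, end in chunk_page_ranges(60):
        print(f"pages {start}-{end}")
```

Each range would be converted separately and the results concatenated in page order, which is what "merges results" suggests.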

Supported formats

Input

PDF (.pdf)
Word (.doc, .docx, .odt)
PowerPoint (.ppt, .pptx, .odp)
Spreadsheets (.xls, .xlsx, .ods)
EPUB/HTML (.epub, .html)
Images (.png, .jpg, .jpeg, .webp, .gif, .tiff)

Output

Markdown (.md, default)
JSON (.json)
HTML (.html)
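When preparing a directory for conversion, it can help to preview which files match the input extensions listed above. The sketch below is an assumption-laden helper, not part of the package: the names `SUPPORTED_INPUT_EXTENSIONS`, `is_supported`, and `find_supported_files` are hypothetical, and whether the CLI itself recurses into subdirectories is not stated in this README.

```python
from pathlib import Path

# Extensions copied from the "Input" list in this README.
SUPPORTED_INPUT_EXTENSIONS = {
    ".pdf",
    ".doc", ".docx", ".odt",
    ".ppt", ".pptx", ".odp",
    ".xls", ".xlsx", ".ods",
    ".epub", ".html",
    ".png", ".jpg", ".jpeg", ".webp", ".gif", ".tiff",
}


def is_supported(path) -> bool:
    """Case-insensitive check of the file extension against the list."""
    return Path(path).suffix.lower() in SUPPORTED_INPUT_EXTENSIONS


def find_supported_files(directory) -> list[Path]:
    """Recursively collect supported files (recursion is an assumption)."""
    return sorted(
        p for p in Path(directory).rglob("*") if p.is_file() and is_supported(p)
    )


if __name__ == "__main__":
    for path in find_supported_files("."):
        print(path)
```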

Installation

pip install pdf-to-markdown-cli

From source:

git clone https://github.com/SokolskyNikita/pdf-to-markdown-cli.git
cd pdf-to-markdown-cli
pip install -e .

Quick start

export MARKER_PDF_KEY="your_api_key"
pdf-to-md ./examples/equations.pdf

Process a directory:

pdf-to-md ./docs

Use JSON or HTML output:

pdf-to-md ./examples/equations.pdf --json
pdf-to-md ./examples/equations.pdf --html
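The same commands can be driven from a Python script. The sketch below only shells out to the `pdf-to-md` executable with the flags documented here; the helper names `build_command` and `convert` are hypothetical, and the package is not assumed to expose any importable Python API.

```python
import os
import shutil
import subprocess


def build_command(input_path, output_format="markdown"):
    """Build the argv list for pdf-to-md using only documented flags."""
    cmd = ["pdf-to-md", str(input_path)]
    if output_format == "json":
        cmd.append("--json")
    elif output_format == "html":
        cmd.append("--html")
    elif output_format != "markdown":
        raise ValueError(f"unsupported output format: {output_format}")
    return cmd


def convert(input_path, output_format="markdown"):
    """Run the CLI, failing early if the key or the executable is missing."""
    if not os.environ.get("MARKER_PDF_KEY"):
        raise RuntimeError("MARKER_PDF_KEY is not set")
    if shutil.which("pdf-to-md") is None:
        raise RuntimeError("pdf-to-md not found on PATH; is the package installed?")
    return subprocess.run(build_command(input_path, output_format), check=True)


if __name__ == "__main__":
    convert("./examples/equations.pdf", "json")
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.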

CLI options

input: input file or directory path
--json: output JSON instead of Markdown
--html: output HTML instead of Markdown
--langs: comma-separated OCR languages (default: English)
--llm: enable LLM-enhanced processing
--strip: strip existing OCR text and redo OCR
--noimg: disable image extraction
--force: force OCR on all pages
--pages: include page delimiters
--max: enable all OCR enhancement flags (--llm --strip --force)
-mp, --max-pages: process only the first N pages
--no-chunk: disable PDF chunking
-cs, --chunk-size: PDF pages per chunk (default: 25)
-o, --output-dir: absolute output directory path
-v, --verbose: debug logging
--version: show installed package version
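Since `--max` is described as shorthand for `--llm --strip --force`, the equivalence can be written out explicitly. The sketch below is illustrative only and does not reflect how the CLI parses its arguments; `MAX_FLAGS` and `expand_max` are hypothetical names.

```python
# The three flags that --max stands for, per the option list above.
MAX_FLAGS = ("--llm", "--strip", "--force")


def expand_max(args):
    """Replace --max with the flags it enables, without duplicating them."""
    expanded = []
    for arg in args:
        candidates = MAX_FLAGS if arg == "--max" else (arg,)
        for item in candidates:
            # Skip an enhancement flag that is already present.
            if item in MAX_FLAGS and item in expanded:
                continue
            expanded.append(item)
    return expanded


if __name__ == "__main__":
    print(expand_max(["doc.pdf", "--max"]))
```

In other words, `pdf-to-md doc.pdf --max` and `pdf-to-md doc.pdf --llm --strip --force` should request the same processing.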

Development

Run tests:

python -m unittest discover -s tests -v

For contributions or questions, open a GitHub issue.

Release history

0.5.2 (this version): May 13, 2026
0.5.1: Sep 29, 2025
0.5.0: May 19, 2025
0.4.0: Apr 21, 2025
0.2.1: Apr 10, 2025
0.2.0: Apr 10, 2025

Download files

Source Distribution

pdf_to_markdown_cli-0.5.2.tar.gz (32.3 kB)

Uploaded May 13, 2026 Source

Built Distribution

pdf_to_markdown_cli-0.5.2-py3-none-any.whl (32.7 kB)

Uploaded May 13, 2026 Python 3

File details

Details for the file pdf_to_markdown_cli-0.5.2.tar.gz.

File metadata

Download URL: pdf_to_markdown_cli-0.5.2.tar.gz
Upload date: May 13, 2026
Size: 32.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.1

File hashes

Hashes for pdf_to_markdown_cli-0.5.2.tar.gz
Algorithm	Hash digest
SHA256	`f2f4fd12ae5e3dcef5ea69ac747c13a9664bec21ad80733247bb0bf75824e516`
MD5	`a65ea04fa619f9d06cec37098130624c`
BLAKE2b-256	`75035ed068ed8d71b6db74e796743cdf36cdab41fed1cb3889e90d2cc0ef3d51`

File details

Details for the file pdf_to_markdown_cli-0.5.2-py3-none-any.whl.

File metadata

Download URL: pdf_to_markdown_cli-0.5.2-py3-none-any.whl
Upload date: May 13, 2026
Size: 32.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.1

File hashes

Hashes for pdf_to_markdown_cli-0.5.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`aaeb6512503be3b81205ed8a7c95ee9d17da38863c8987494f2c3e8e5a18f773`
MD5	`18fd2f19ba57763be7c579ffb4a01336`
BLAKE2b-256	`a3142405d50620d7e300fdc37c3dbd391d2b4820c17b451b4eccc7e32359b6b1`
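A downloaded file can be checked against the SHA256 digests listed above using Python's standard `hashlib` module. This is a minimal sketch; the helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and the file path in the usage example assumes the archive sits in the current directory.

```python
import hashlib


def sha256_of_file(path, block_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's digest with a published value, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    ok = matches_published_hash(
        "pdf_to_markdown_cli-0.5.2.tar.gz",
        "f2f4fd12ae5e3dcef5ea69ac747c13a9664bec21ad80733247bb0bf75824e516",
    )
    print("hash matches" if ok else "hash mismatch")
```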
