CLI tool to convert PDF files (and other documents) to markdown using the Marker API.

PDF to Markdown CLI

Convert PDFs and other documents to Markdown using the Marker API.

Features

  • Convert PDFs, Word docs, PowerPoint, spreadsheets, EPUB, HTML, and images to Markdown/JSON/HTML
  • Automatic chunking for large documents with parallel processing
  • Progress tracking and local caching for interrupted runs
  • Full OCR customization options

Installation

From PyPI

pip install pdf-to-markdown-cli

From source

git clone https://github.com/SokolskyNikita/pdf-to-markdown-cli.git
cd pdf-to-markdown-cli
pip install -e .

Usage

# Get API key from https://www.datalab.to/marker
export MARKER_PDF_KEY=your_api_key_here

# Basic usage
pdf-to-md /path/to/file.pdf

# Common options
pdf-to-md /path/to/file.pdf --json          # JSON output
pdf-to-md /path/to/file.pdf --noimg         # Disable image extraction
pdf-to-md /path/to/file.pdf --max           # Maximum output quality (same as --llm --strip --force)
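
Because the usage above reads the API key from the MARKER_PDF_KEY environment variable, a script that wraps the CLI may want to fail early when the variable is missing. The following is a minimal sketch, not part of the tool: `require_marker_key` and `convert_pdf` are hypothetical helper names, and the sketch assumes nothing about how pdf-to-md itself reports a missing key.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of pdf-to-md): succeed only if the key is set.
require_marker_key() {
  if [ -z "${MARKER_PDF_KEY:-}" ]; then
    echo "MARKER_PDF_KEY is not set; get a key from https://www.datalab.to/marker" >&2
    return 1
  fi
}

# Hypothetical wrapper: check the key first, then pass every argument
# through to pdf-to-md unchanged.
convert_pdf() {
  require_marker_key || return 1
  pdf-to-md "$@"
}
```

With these functions sourced into a shell, `convert_pdf /path/to/file.pdf --json` would behave like the plain command but stop before calling the CLI if the key is unset.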

CLI Options

  • input: Input file or directory path
  • --json: Output in JSON format (default is markdown)
  • --langs: Comma-separated OCR languages (default: "English")
  • --llm: Use LLM for enhanced processing
  • --strip: Strip existing OCR text and redo OCR
  • --noimg: Disable image extraction
  • --force: Force OCR on all pages
  • --pages: Add page delimiters
  • --max: Enable all OCR enhancements (equivalent to --llm --strip --force)
  • -mp, --max-pages: Maximum number of pages to process from the start of the file
  • --no-chunk: Disable PDF chunking
  • -cs, --chunk-size: Set PDF chunk size in pages (default: 25)
  • -o, --output-dir: Absolute path to the output directory
  • -v, --verbose: Enable verbose (DEBUG level) logging
  • --version: Show the installed version and exit
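
To get a feel for what --chunk-size implies for a large document, the sketch below prints the page ranges that result from splitting a document into fixed-size chunks. It is illustrative only: `chunk_ranges` is a hypothetical function, the 1-based inclusive ranges are an assumption, and the tool's actual chunk boundaries may differ. Only the default of 25 pages comes from the option list above.

```shell
#!/usr/bin/env bash
# Illustrative only (not the tool's implementation): print 1-based,
# inclusive page ranges for TOTAL pages split into chunks of SIZE pages.
# SIZE defaults to 25, matching the documented --chunk-size default.
chunk_ranges() {
  local total=$1 size=${2:-25} start=1 end
  # Reject a non-positive chunk size, which would never advance.
  if [ "$size" -lt 1 ]; then
    echo "chunk size must be at least 1" >&2
    return 1
  fi
  while [ "$start" -le "$total" ]; do
    end=$(( start + size - 1 ))
    # The last chunk may be shorter than SIZE.
    if [ "$end" -gt "$total" ]; then end=$total; fi
    echo "${start}-${end}"
    start=$(( end + 1 ))
  done
}
```

For example, `chunk_ranges 60` prints three ranges (1-25, 26-50, 51-60), so a 60-page PDF at the default size would be split into three chunks under these assumptions.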

Requirements
