
DocMax — Unified Document Processing CLI. Forge your documents from your terminal.


A unified, offline-first Python CLI for PDF operations, OCR, document conversion, content extraction, image processing, and batch or watch-folder automation.


Installation

pip install DocMax

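External Dependencies

DocMax relies on these external programs, which pip does not install. Install them separately with your system package manager or the official installers:

| Tool | Purpose | Install |
|---|---|---|
| Tesseract OCR | OCR engine | Tesseract install guide |
| Ghostscript | PDF compression | ghostscript.com |
| Pandoc | Document conversion | pandoc.org |
| Poppler | PDF → image (pdf2image) | apt install poppler-utils / brew install poppler |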
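
Because these are separate programs, it can be worth confirming they are on your `PATH` before running OCR, compression, or conversion. The following is a minimal pre-flight sketch using Python's `shutil.which`; the executable names are assumptions based on each tool's usual install, not names DocMax is known to look up.

```python
import shutil
from typing import Dict, List, Optional

# Assumed executable names for each tool. Ghostscript is usually "gs" on
# Linux/macOS and "gswin64c"/"gswin32c" on Windows; "pdftoppm" is the Poppler
# renderer that pdf2image drives.
TOOLS: Dict[str, List[str]] = {
    "Tesseract OCR": ["tesseract"],
    "Ghostscript": ["gs", "gswin64c", "gswin32c"],
    "Pandoc": ["pandoc"],
    "Poppler": ["pdftoppm"],
}


def find_tools(tools: Dict[str, List[str]] = TOOLS) -> Dict[str, Optional[str]]:
    """Map each tool to the first candidate executable found on PATH, or None."""
    report: Dict[str, Optional[str]] = {}
    for name, candidates in tools.items():
        report[name] = None
        for exe in candidates:
            path = shutil.which(exe)
            if path:
                report[name] = path
                break
    return report


if __name__ == "__main__":
    for tool, path in find_tools().items():
        print(f"{tool:<14} {path or 'NOT FOUND'}")
```

Running the script prints each tool with its resolved path, or `NOT FOUND` for anything missing.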

Quick Reference

PDF Operations

# Merge PDFs
DocMax merge a.pdf b.pdf c.pdf -o merged.pdf

# Split into pages
DocMax split report.pdf

# Compress (uses Ghostscript)
DocMax compress large.pdf --preset ebook

# Rotate pages
DocMax rotate file.pdf 90

# Extract page range
DocMax pages file.pdf 1-5

# Watermark
DocMax watermark file.pdf logo.png

# Encrypt / Decrypt
DocMax encrypt file.pdf
DocMax decrypt protected.pdf
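
`compress` is documented as using Ghostscript, and `--preset ebook` shares its name with one of Ghostscript's standard `-dPDFSETTINGS` presets. Assuming DocMax's presets map onto those, a compression pass boils down to a single `pdfwrite` run like the one this sketch builds; `ghostscript_compress_cmd` and `compress` are hypothetical names, not DocMax's API.

```python
import subprocess
from typing import List

# Ghostscript's standard PDFSETTINGS presets. Assumption: DocMax's --preset
# values correspond to these names.
GS_PRESETS = ("screen", "ebook", "printer", "prepress", "default")


def ghostscript_compress_cmd(src: str, dst: str, preset: str = "ebook",
                             gs: str = "gs") -> List[str]:
    """Build a Ghostscript command that rewrites src into a smaller dst."""
    if preset not in GS_PRESETS:
        raise ValueError(f"unknown preset {preset!r}; expected one of {GS_PRESETS}")
    return [
        gs,
        "-sDEVICE=pdfwrite",                 # re-emit the document as a new PDF
        "-dCompatibilityLevel=1.4",
        f"-dPDFSETTINGS=/{preset}",          # image resolution/quality trade-off
        "-dNOPAUSE", "-dBATCH", "-dQUIET",   # run non-interactively, then exit
        f"-sOutputFile={dst}",
        src,
    ]


def compress(src: str, dst: str, preset: str = "ebook") -> None:
    # check=True raises CalledProcessError if Ghostscript exits with an error.
    subprocess.run(ghostscript_compress_cmd(src, dst, preset), check=True)
```

As a rule of thumb from the Ghostscript documentation, `/screen` downsamples images to about 72 dpi, `/ebook` to about 150 dpi, and `/printer` and `/prepress` to about 300 dpi, so `ebook` is a middle ground between file size and legibility.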

OCR

# OCR an image
DocMax ocr scan.png

# OCR a PDF
DocMax ocr scan.pdf

# Output as JSON or Markdown
DocMax ocr scan.pdf --fmt json
DocMax ocr scan.pdf --fmt md

# Multi-language OCR
DocMax ocr scan.png --lang eng+hin

# Make a scanned PDF searchable
DocMax searchable scan.pdf

# Batch OCR an entire folder
DocMax batch-ocr invoices/
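
The `eng+hin` form matches the `+`-separated syntax that the Tesseract CLI takes for `-l`, and every listed language needs its trained data installed (for Hindi, `hin.traineddata`). The sketch below builds the equivalent raw Tesseract commands for plain-text OCR and for a searchable PDF. It assumes `--lang` is passed through to Tesseract unchanged, and `tesseract_cmd` is a hypothetical helper. Tesseract reads images, not PDFs, so PDF input presumably has its pages rendered to images first, which is what Poppler is listed for.

```python
from typing import List, Optional


def tesseract_cmd(image: str, lang: str = "eng",
                  pdf_outbase: Optional[str] = None) -> List[str]:
    """Build a Tesseract command line (hypothetical helper).

    Without pdf_outbase the recognized text goes to stdout. With it,
    Tesseract's built-in "pdf" config writes <pdf_outbase>.pdf: the image
    plus an invisible text layer, i.e. a searchable PDF.
    """
    # "eng+hin" -> ["eng", "hin"]; drop empty parts such as a trailing "+".
    langs = [code for code in lang.split("+") if code]
    if not langs:
        raise ValueError("at least one language code is required")
    joined = "+".join(langs)
    if pdf_outbase is None:
        return ["tesseract", image, "stdout", "-l", joined]
    return ["tesseract", image, pdf_outbase, "-l", joined, "pdf"]
```

Running it is then a matter of, for example, `subprocess.run(tesseract_cmd("scan.png", "eng+hin"), capture_output=True, text=True, check=True).stdout`, which returns the recognized text as a string.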

Document Conversion

# Convert DOCX → PDF
DocMax convert report.docx pdf

# Convert Markdown → HTML
DocMax convert notes.md html

# Combine images into a PDF
DocMax img2pdf scans/

# Export PDF pages as images
DocMax pdf2img report.pdf --dpi 300 --fmt png
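
`--dpi` sets the rendering resolution. PDF page sizes are measured in points, and one point is 1/72 inch, so a rendered page is `points × dpi / 72` pixels on each side. A small sketch of that arithmetic (`page_pixels` is a hypothetical name):

```python
from typing import Tuple

POINTS_PER_INCH = 72  # PDF user-space unit: 1 pt = 1/72 inch


def page_pixels(width_pt: float, height_pt: float, dpi: int = 300) -> Tuple[int, int]:
    """Pixel dimensions of a page rendered at the given DPI."""
    scale = dpi / POINTS_PER_INCH
    return round(width_pt * scale), round(height_pt * scale)


# US Letter is 612 x 792 pt (8.5 x 11 in).
w, h = page_pixels(612, 792, dpi=300)
print(w, h)  # 2550 3300
# An uncompressed 8-bit RGB bitmap needs 3 bytes per pixel.
print(f"{w * h * 3 / 1e6:.1f} MB per page in memory")  # 25.2 MB per page in memory
```

Memory grows with the square of the DPI, so doubling `--dpi` roughly quadruples the size of each page image.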

Content Extraction

# Extract text
DocMax text report.pdf

# Extract embedded images
DocMax images report.pdf

# Show / save metadata
DocMax metadata report.pdf
DocMax metadata report.pdf -o meta.json

# Extract tables
DocMax tables invoice.pdf --fmt xlsx
DocMax tables invoice.pdf --fmt csv
DocMax tables invoice.pdf --fmt json
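
Once a table has been extracted as rows of cell strings, the CSV and JSON exports need only Python's standard library; XLSX needs a spreadsheet library such as openpyxl and is not shown. A minimal sketch, assuming the first row is the header. The function names and the list-of-objects JSON layout are illustrative choices, not DocMax's documented output format.

```python
import csv
import io
import json
from typing import List

Row = List[str]


def table_to_csv(rows: List[Row]) -> str:
    """Serialize rows as CSV; cells containing commas or quotes get quoted."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def table_to_json(rows: List[Row]) -> str:
    """Serialize rows as a JSON list of objects keyed by the header row."""
    if not rows:
        return "[]"
    header, *body = rows
    # zip() stops at the shorter sequence, so ragged rows lose extra cells.
    records = [dict(zip(header, row)) for row in body]
    # ensure_ascii=False keeps non-ASCII cell text readable.
    return json.dumps(records, indent=2, ensure_ascii=False)
```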

Image Processing

# Enhance (contrast + sharpness)
DocMax enhance scan.png

# Fix skewed scans
DocMax deskew scan.png

# Remove noise
DocMax denoise scan.png

# Resize
DocMax resize photo.png --width 800
DocMax resize photo.png --scale 0.5

# Full OCR preprocessing pipeline
DocMax preprocess scan.png
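
`resize` takes either a target `--width` or a `--scale` factor. Assuming a width-only resize keeps the aspect ratio (typical, but not stated above), the output dimensions work out as in this sketch; `target_size` is a hypothetical helper.

```python
from typing import Optional, Tuple


def target_size(width: int, height: int, *, new_width: Optional[int] = None,
                scale: Optional[float] = None) -> Tuple[int, int]:
    """Output size for a resize given either a target width or a scale factor.

    Assumes the aspect ratio is preserved in both cases.
    """
    if (new_width is None) == (scale is None):
        raise ValueError("pass exactly one of new_width or scale")
    ratio = new_width / width if new_width is not None else scale
    # Never shrink a side below 1 px.
    return max(1, round(width * ratio)), max(1, round(height * ratio))
```

For a 1600×1200 image, both `target_size(1600, 1200, new_width=800)` and `target_size(1600, 1200, scale=0.5)` give `(800, 600)`. With Pillow, that tuple is what `Image.resize` expects.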

Batch Processing

# Batch OCR with 8 workers
DocMax batch ./documents --ocr --workers 8

# Batch compress
DocMax batch ./pdfs --compress

# Batch convert to markdown
DocMax batch ./docs --convert md
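
`--workers 8` implies a pool of parallel workers. One common way to structure that is with `concurrent.futures`, treating each file as an independent job and recording a failure on one file instead of aborting the batch. That DocMax behaves this way is an assumption; `worker` stands in for whatever per-file operation (OCR, compression, conversion) is applied. Threads suit jobs that mostly wait on external programs such as Tesseract or Ghostscript; `ProcessPoolExecutor` is the drop-in alternative for CPU-bound Python work.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Any, Callable, Dict, Iterable, List, Set, Tuple, Union


def collect_files(folder: Union[str, Path], suffixes: Set[str]) -> List[Path]:
    """Files directly inside folder whose lowercase extension is in suffixes."""
    return sorted(p for p in Path(folder).iterdir()
                  if p.is_file() and p.suffix.lower() in suffixes)


def batch_process(items: Iterable[Any], worker: Callable[[Any], Any],
                  workers: int = 8) -> Tuple[Dict[Any, Any], Dict[Any, Exception]]:
    """Apply worker to every item concurrently; return (results, errors)."""
    results: Dict[Any, Any] = {}
    errors: Dict[Any, Exception] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(worker, item): item for item in items}
        for future in as_completed(futures):
            item = futures[future]
            try:
                results[item] = future.result()
            except Exception as exc:  # record the failure, keep the batch going
                errors[item] = exc
    return results, errors
```

For example, `batch_process(collect_files("./pdfs", {".pdf"}), compress_one, workers=8)`, where `compress_one` is a hypothetical per-file function.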

Watch Mode

# Auto-OCR new files dropped into a folder
DocMax watch ./incoming --ocr

# Automatically make new scans searchable
DocMax watch ./scans --searchable

# Automatically compress new uploads
DocMax watch ./uploads --compress
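
Watch mode is built on the watchdog library, which receives file-system events from the operating system. To show the idea without that dependency, here is a stdlib-only polling sketch; it is a deliberately simpler substitute, not how DocMax's watcher works. It snapshots the folder, then passes each newly appeared file to a callback. All names are hypothetical.

```python
import time
from pathlib import Path
from typing import Callable, List, Optional, Set


def snapshot(folder: Path) -> Set[Path]:
    """Regular files currently in folder (non-recursive)."""
    return {p for p in folder.iterdir() if p.is_file()}


def new_files(before: Set[Path], after: Set[Path]) -> List[Path]:
    """Files present in the latest snapshot but not in the previous one."""
    return sorted(after - before)


def poll(folder: Path, handle: Callable[[Path], None],
         interval: float = 2.0, cycles: Optional[int] = None) -> None:
    """Call handle(path) for each new file; run forever unless cycles is set."""
    seen = snapshot(folder)
    done = 0
    while cycles is None or done < cycles:
        time.sleep(interval)
        current = snapshot(folder)
        for path in new_files(seen, current):
            handle(path)
        seen = current
        done += 1
```

Calling `poll(Path("./incoming"), print)` would print each new file's path every couple of seconds. A production watcher also has to handle files that are still being copied when first seen, for example by waiting until their size stops changing.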

Project Structure

DocMax/
├── cli.py           ← Typer CLI entry point
├── config.py        ← Global configuration
├── utils.py         ← Shared utilities
├── pdf/
│   └── operations.py  ← merge, split, compress, rotate, pages, watermark, encrypt, decrypt
├── ocr/
│   └── engine.py      ← ocr_image, ocr_pdf, make_searchable_pdf, batch_ocr
├── convert/
│   └── converter.py   ← convert, images_to_pdf, pdf_to_images
├── extract/
│   └── extractor.py   ← extract_text, extract_images, extract_metadata, extract_tables
├── image/
│   └── processor.py   ← enhance, deskew, denoise, resize, preprocess_for_ocr
├── batch/
│   └── processor.py   ← parallel batch processing
└── watch/
    └── watcher.py     ← watchdog-based directory monitor

Supported Formats

| Category | Formats |
|---|---|
| Input documents | PDF, DOCX, ODT, MD, HTML, TXT, RST, EPUB |
| Input images | PNG, JPG/JPEG, TIFF/TIF, BMP, WebP |
| OCR output | TXT, JSON, Markdown |
| Table export | CSV, XLSX, JSON |
| Image export | PNG, JPEG, TIFF |
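
When a batch folder mixes documents and images, each file has to be routed to the right pipeline. An extension lookup built from the input rows of the table above is one simple way to do that; `classify` is a hypothetical helper, and DocMax may detect formats differently.

```python
from pathlib import Path
from typing import Optional, Union

# Extensions taken from the "Input documents" and "Input images" rows.
DOCUMENT_EXTS = {".pdf", ".docx", ".odt", ".md", ".html", ".txt", ".rst", ".epub"}
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".tiff", ".tif", ".bmp", ".webp"}


def classify(path: Union[str, Path]) -> Optional[str]:
    """Return "document", "image", or None for unsupported files."""
    ext = Path(path).suffix.lower()  # case-insensitive: SCAN.JPG -> .jpg
    if ext in DOCUMENT_EXTS:
        return "document"
    if ext in IMAGE_EXTS:
        return "image"
    return None
```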

License

MIT License — DocMax Contributors

Download files


Source Distribution

docmax-2.0.1.tar.gz (35.3 kB)

Built Distribution


docmax-2.0.1-py3-none-any.whl (43.5 kB)

File details

Details for the file docmax-2.0.1.tar.gz.

File metadata

  • Download URL: docmax-2.0.1.tar.gz
  • Size: 35.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for docmax-2.0.1.tar.gz

| Algorithm | Hash digest |
|---|---|
| SHA256 | 67662fcda70302aa006d7ec3f7ad19177adda6a8aff7548394efd6a36621d2f5 |
| MD5 | 5c38198fb7f244ebaff98f1b88dbaac6 |
| BLAKE2b-256 | bb41d09a90baa3167bd5f889001bf6c6ecef2e4e1d3c80766d3f5225bebc3624 |
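
The published digests let you confirm that a downloaded file matches what was uploaded. A minimal sketch with `hashlib` that reads the file in chunks, so large downloads are never loaded into memory at once:

```python
import hashlib
from pathlib import Path
from typing import Union


def sha256_of(path: Union[str, Path], chunk_size: int = 1 << 16) -> str:
    """Hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Union[str, Path], expected_sha256: str) -> bool:
    # hexdigest() is lowercase, so normalize the expected value before comparing.
    return sha256_of(path) == expected_sha256.strip().lower()
```

For example, `verify("docmax-2.0.1.tar.gz", "67662fcda70302aa006d7ec3f7ad19177adda6a8aff7548394efd6a36621d2f5")` returns `True` only for an intact copy of the source distribution. pip can enforce the same check during installation: put `docmax==2.0.1 --hash=sha256:<digest>` in a requirements file and run `pip install --require-hashes -r requirements.txt`; in that mode every dependency must be pinned with a hash as well.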


Provenance

The following attestation bundles were made for docmax-2.0.1.tar.gz:

Publisher: publish.yml on megabyte44/DocMax


File details

Details for the file docmax-2.0.1-py3-none-any.whl.

File metadata

  • Download URL: docmax-2.0.1-py3-none-any.whl
  • Size: 43.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for docmax-2.0.1-py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | 18d4fd6f7673316fe80432c3aed858cecf360e8d76e5452c1f5a0f358c729d8e |
| MD5 | 502cf91afbfb35c0d130544d9eeec19d |
| BLAKE2b-256 | 3420d2270008420a7c063e80b8ba528c00f6e4da31405c776a86dd19c3c6a54a |


Provenance

The following attestation bundles were made for docmax-2.0.1-py3-none-any.whl:

Publisher: publish.yml on megabyte44/DocMax

