Skip to main content

DocMax — Unified Document Processing CLI. Forge your documents from your terminal.

Project description

DocMax — Forge your documents from your terminal.

A unified, offline-first Python CLI for all your document processing needs.


Installation

pip install DocMax

External Dependencies

Tool Purpose Install
Tesseract OCR OCR engine Install guide
Ghostscript PDF compression ghostscript.com
Pandoc Document conversion pandoc.org
Poppler PDF → image (pdf2image) apt install poppler-utils / brew install poppler

Quick Reference

PDF Operations

# Merge PDFs
DocMax merge a.pdf b.pdf c.pdf -o merged.pdf

# Split into pages
DocMax split report.pdf

# Compress (uses Ghostscript)
DocMax compress large.pdf --preset ebook

# Rotate pages
DocMax rotate file.pdf 90

# Extract page range
DocMax pages file.pdf 1-5

# Watermark
DocMax watermark file.pdf logo.png

# Encrypt / Decrypt
DocMax encrypt file.pdf
DocMax decrypt protected.pdf

OCR

# OCR an image
DocMax ocr scan.png

# OCR a PDF
DocMax ocr scan.pdf

# Output as JSON or Markdown
DocMax ocr scan.pdf --fmt json
DocMax ocr scan.pdf --fmt md

# Multi-language OCR
DocMax ocr scan.png --lang eng+hin

# Make a scanned PDF searchable
DocMax searchable scan.pdf

# Batch OCR an entire folder
DocMax batch-ocr invoices/

Document Conversion

# Convert DOCX → PDF
DocMax convert report.docx pdf

# Convert Markdown → HTML
DocMax convert notes.md html

# Combine images into a PDF
DocMax img2pdf scans/

# Export PDF pages as images
DocMax pdf2img report.pdf --dpi 300 --fmt png

Content Extraction

# Extract text
DocMax text report.pdf

# Extract embedded images
DocMax images report.pdf

# Show / save metadata
DocMax metadata report.pdf
DocMax metadata report.pdf -o meta.json

# Extract tables
DocMax tables invoice.pdf --fmt xlsx
DocMax tables invoice.pdf --fmt csv
DocMax tables invoice.pdf --fmt json

Image Processing

# Enhance (contrast + sharpness)
DocMax enhance scan.png

# Fix skewed scans
DocMax deskew scan.png

# Remove noise
DocMax denoise scan.png

# Resize
DocMax resize photo.png --width 800
DocMax resize photo.png --scale 0.5

# Full OCR preprocessing pipeline
DocMax preprocess scan.png

Batch Processing

# Batch OCR with 8 workers
DocMax batch ./documents --ocr --workers 8

# Batch compress
DocMax batch ./pdfs --compress

# Batch convert to markdown
DocMax batch ./docs --convert md

Watch Mode

# Auto-OCR new files dropped into a folder
DocMax watch ./incoming --ocr

# Auto-make-searchable
DocMax watch ./scans --searchable

# Auto-compress
DocMax watch ./uploads --compress

Project Structure

DocMax/
├── cli.py           ← Typer CLI entry point
├── config.py        ← Global configuration
├── utils.py         ← Shared utilities
├── pdf/
│   └── operations.py  ← merge, split, compress, rotate, pages, watermark, encrypt, decrypt
├── ocr/
│   └── engine.py      ← ocr_image, ocr_pdf, make_searchable_pdf, batch_ocr
├── convert/
│   └── converter.py   ← convert, images_to_pdf, pdf_to_images
├── extract/
│   └── extractor.py   ← extract_text, extract_images, extract_metadata, extract_tables
├── image/
│   └── processor.py   ← enhance, deskew, denoise, resize, preprocess_for_ocr
├── batch/
│   └── processor.py   ← parallel batch processing
└── watch/
    └── watcher.py     ← watchdog-based directory monitor

Supported Formats

Category Formats
Input documents PDF, DOCX, ODT, MD, HTML, TXT, RST, EPUB
Input images PNG, JPG/JPEG, TIFF/TIF, BMP, WebP
OCR output TXT, JSON, Markdown
Table export CSV, XLSX, JSON
Image export PNG, JPEG, TIFF

License

MIT License — DocMax Contributors

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

docmax-1.1.0.tar.gz (32.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

docmax-1.1.0-py3-none-any.whl (44.3 kB view details)

Uploaded Python 3

File details

Details for the file docmax-1.1.0.tar.gz.

File metadata

  • Download URL: docmax-1.1.0.tar.gz
  • Upload date:
  • Size: 32.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for docmax-1.1.0.tar.gz
Algorithm Hash digest
SHA256 6a09f0240d231aa344e2412ac2eeefefe1c11484d111ffd0f9026e74e042a977
MD5 320d06e9995c8150e507c20956b9b031
BLAKE2b-256 32b2f67f019b2851698ce2e6816950e689699cf1ea5eabedbd005d921c6e0908

See more details on using hashes here.

Provenance

The following attestation bundles were made for docmax-1.1.0.tar.gz:

Publisher: publish.yml on megabyte44/DocMax

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file docmax-1.1.0-py3-none-any.whl.

File metadata

  • Download URL: docmax-1.1.0-py3-none-any.whl
  • Upload date:
  • Size: 44.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for docmax-1.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 a75904a3731f87d528b5c835a9e73ea1bf34858451e213457489a9e74eadddad
MD5 73f4de7db214d4dec578a487192ba85a
BLAKE2b-256 643c90f00e702146fe148b2fe1f9df336c4f5a99111c953d460f10df98410307

See more details on using hashes here.

Provenance

The following attestation bundles were made for docmax-1.1.0-py3-none-any.whl:

Publisher: publish.yml on megabyte44/DocMax

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page