
DocMax — Unified Document Processing CLI. Forge your documents from your terminal.

A unified, offline-first Python CLI for all your document processing needs.


Installation

pip install DocMax

External Dependencies

Tool            Purpose                    Install
Tesseract OCR   OCR engine                 Install guide
Ghostscript     PDF compression            ghostscript.com
Pandoc          Document conversion        pandoc.org
Poppler         PDF → image (pdf2image)    apt install poppler-utils / brew install poppler
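
Because pip does not install these tools, it can be useful to confirm they are on PATH before running DocMax. The following is a minimal sketch, not part of DocMax: the helper name check_external_tools is hypothetical, and the executable names (tesseract, gs, pandoc, pdftoppm) are assumptions based on how each tool is usually packaged. On Windows, Ghostscript is typically installed as gswin64c rather than gs.

```python
import shutil

# Executable names are assumptions based on each tool's usual packaging;
# DocMax itself may locate them differently.
EXTERNAL_TOOLS = {
    "Tesseract OCR": "tesseract",
    "Ghostscript": "gs",      # typically "gswin64c" on Windows
    "Pandoc": "pandoc",
    "Poppler": "pdftoppm",    # shipped in poppler-utils
}


def check_external_tools(tools=EXTERNAL_TOOLS):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(exe) for name, exe in tools.items()}


if __name__ == "__main__":
    for name, path in check_external_tools().items():
        status = f"found at {path}" if path else "MISSING"
        print(f"{name:14} {status}")
```

Any entry reported as MISSING points to the corresponding row of the table above for installation.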

Quick Reference

PDF Operations

# Merge PDFs
DocMax merge a.pdf b.pdf c.pdf -o merged.pdf

# Split into pages
DocMax split report.pdf

# Compress (uses Ghostscript)
DocMax compress large.pdf --preset ebook

# Rotate pages
DocMax rotate file.pdf 90

# Extract page range
DocMax pages file.pdf 1-5

# Watermark
DocMax watermark file.pdf logo.png

# Encrypt / Decrypt
DocMax encrypt file.pdf
DocMax decrypt protected.pdf
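
The compress command above delegates to Ghostscript, and ebook is one of Ghostscript's standard -dPDFSETTINGS values. The sketch below shows what an equivalent direct Ghostscript invocation could look like. It is an assumption about the general technique, not DocMax's actual implementation, and ghostscript_compress_cmd is a hypothetical helper.

```python
# Values accepted by Ghostscript's -dPDFSETTINGS switch for the pdfwrite device.
GS_PRESETS = ("screen", "ebook", "printer", "prepress", "default")


def ghostscript_compress_cmd(src, dst, preset="ebook"):
    """Build a Ghostscript command line that rewrites src as a compressed dst.

    Hypothetical helper: DocMax may assemble its command differently.
    """
    if preset not in GS_PRESETS:
        raise ValueError(f"unknown preset {preset!r}; expected one of {GS_PRESETS}")
    return [
        "gs",                          # "gswin64c" on most Windows installs
        "-sDEVICE=pdfwrite",           # write a new PDF
        f"-dPDFSETTINGS=/{preset}",    # quality/size trade-off
        "-dNOPAUSE", "-dBATCH", "-dQUIET",
        f"-sOutputFile={dst}",
        src,
    ]
```

The returned list can be passed to subprocess.run(cmd, check=True) once Ghostscript is installed. Building the command as a list rather than a string avoids shell quoting problems with file names that contain spaces.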

OCR

# OCR an image
DocMax ocr scan.png

# OCR a PDF
DocMax ocr scan.pdf

# Output as JSON or Markdown
DocMax ocr scan.pdf --fmt json
DocMax ocr scan.pdf --fmt md

# Multi-language OCR
DocMax ocr scan.png --lang eng+hin

# Make a scanned PDF searchable
DocMax searchable scan.pdf

# Batch OCR an entire folder
DocMax batch-ocr invoices/

Document Conversion

# Convert DOCX → PDF
DocMax convert report.docx pdf

# Convert Markdown → HTML
DocMax convert notes.md html

# Combine images into a PDF
DocMax img2pdf scans/

# Export PDF pages as images
DocMax pdf2img report.pdf --dpi 300 --fmt png

Content Extraction

# Extract text
DocMax text report.pdf

# Extract embedded images
DocMax images report.pdf

# Show / save metadata
DocMax metadata report.pdf
DocMax metadata report.pdf -o meta.json

# Extract tables
DocMax tables invoice.pdf --fmt xlsx
DocMax tables invoice.pdf --fmt csv
DocMax tables invoice.pdf --fmt json

Image Processing

# Enhance (contrast + sharpness)
DocMax enhance scan.png

# Fix skewed scans
DocMax deskew scan.png

# Remove noise
DocMax denoise scan.png

# Resize
DocMax resize photo.png --width 800
DocMax resize photo.png --scale 0.5
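
The two resize modes above reduce to simple arithmetic on the image dimensions. The sketch below assumes that --width keeps the aspect ratio, which this README does not state; target_size is a hypothetical helper, not a DocMax function.

```python
def target_size(width, height, *, new_width=None, scale=None):
    """Return the (width, height) produced by a fixed-width or scaled resize.

    Assumption: a fixed-width resize preserves the aspect ratio.
    """
    if (new_width is None) == (scale is None):
        raise ValueError("pass exactly one of new_width or scale")
    if new_width is not None:
        scale = new_width / width
    if scale <= 0:
        raise ValueError("scale must be positive")
    # Round to whole pixels and never let a dimension collapse to zero.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For a 1600x1200 image, both target_size(1600, 1200, new_width=800) and target_size(1600, 1200, scale=0.5) give (800, 600).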

# Full OCR preprocessing pipeline
DocMax preprocess scan.png

Batch Processing

# Batch OCR with 8 workers
DocMax batch ./documents --ocr --workers 8

# Batch compress
DocMax batch ./pdfs --compress

# Batch convert to markdown
DocMax batch ./docs --convert md
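
The --workers option suggests a worker pool that processes files in parallel. The sketch below illustrates that pattern with the standard library. Whether DocMax uses threads or processes is not stated here, so the thread pool is an assumption, and run_batch and its parameters are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def run_batch(folder, task, pattern="*.pdf", workers=8):
    """Apply task(path) to every matching file in folder using a worker pool.

    Returns (results, errors), both keyed by path, so that one failing
    file does not abort the rest of the batch.
    """
    files = sorted(Path(folder).glob(pattern))
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(task, f): f for f in files}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # record the failure and keep going
                errors[path] = exc
    return results, errors
```

Threads are a reasonable fit when each task mostly waits on an external program such as Tesseract or Ghostscript; CPU-bound pure-Python work would usually call for ProcessPoolExecutor instead.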

Watch Mode

# Auto-OCR new files dropped into a folder
DocMax watch ./incoming --ocr

# Auto-make-searchable
DocMax watch ./scans --searchable

# Auto-compress
DocMax watch ./uploads --compress
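
Watch mode reacts to files that appear after the watcher starts. DocMax's watcher is watchdog-based; the sketch below deliberately swaps in plain polling from the standard library to show the same idea without extra dependencies. The names new_files and watch are hypothetical.

```python
import time
from pathlib import Path


def new_files(folder, seen):
    """Return files in folder that are not yet in `seen`, and record them."""
    current = {p for p in Path(folder).iterdir() if p.is_file()}
    fresh = sorted(current - seen)
    seen.update(fresh)
    return fresh


def watch(folder, handle, interval=1.0, max_polls=None):
    """Call handle(path) for each file that appears after watching starts.

    Polling stand-in for an event-based watcher; max_polls=None runs forever.
    """
    seen = set()
    new_files(folder, seen)  # files already present are not processed
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in new_files(folder, seen):
            handle(path)
        time.sleep(interval)
        polls += 1
```

A production watcher would also need to wait until a file has finished being written before handing it to OCR or compression; this sketch does not attempt that.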

Project Structure

DocMax/
├── cli.py           ← Typer CLI entry point
├── config.py        ← Global configuration
├── utils.py         ← Shared utilities
├── pdf/
│   └── operations.py  ← merge, split, compress, rotate, pages, watermark, encrypt, decrypt
├── ocr/
│   └── engine.py      ← ocr_image, ocr_pdf, make_searchable_pdf, batch_ocr
├── convert/
│   └── converter.py   ← convert, images_to_pdf, pdf_to_images
├── extract/
│   └── extractor.py   ← extract_text, extract_images, extract_metadata, extract_tables
├── image/
│   └── processor.py   ← enhance, deskew, denoise, resize, preprocess_for_ocr
├── batch/
│   └── processor.py   ← parallel batch processing
└── watch/
    └── watcher.py     ← watchdog-based directory monitor

Supported Formats

Category          Formats
Input documents   PDF, DOCX, ODT, MD, HTML, TXT, RST, EPUB
Input images      PNG, JPG/JPEG, TIFF/TIF, BMP, WebP
OCR output        TXT, JSON, Markdown
Table export      CSV, XLSX, JSON
Image export      PNG, JPEG, TIFF
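
When scripting around DocMax, it can help to filter files by the input formats listed above before invoking a command. The sketch below encodes the two input rows of the table as extension sets; input_kind is a hypothetical helper, and the mapping from format names to file extensions is an assumption.

```python
from pathlib import Path

# Extensions derived from the Supported Formats table (assumed mapping).
DOCUMENT_EXTS = {".pdf", ".docx", ".odt", ".md", ".html", ".txt", ".rst", ".epub"}
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".tiff", ".tif", ".bmp", ".webp"}


def input_kind(path):
    """Classify a path as 'document', 'image', or None if unsupported."""
    ext = Path(path).suffix.lower()
    if ext in DOCUMENT_EXTS:
        return "document"
    if ext in IMAGE_EXTS:
        return "image"
    return None
```

The comparison is case-insensitive, so scan.TIF and scan.tif are treated the same way.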

License

MIT License — DocMax Contributors
