DocMax — Unified Document Processing CLI. Forge your documents from your terminal.
A unified, offline-first Python CLI for PDF operations, OCR, document conversion, content extraction, image processing, batch jobs, and folder watching.
Installation
pip install DocMax
External Dependencies
| Tool | Purpose | Install |
|---|---|---|
| Tesseract OCR | OCR engine | Install guide |
| Ghostscript | PDF compression | ghostscript.com |
| Pandoc | Document conversion | pandoc.org |
| Poppler | PDF → image (pdf2image) | apt install poppler-utils / brew install poppler |
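Before running commands that rely on these tools, it can help to confirm they are on your `PATH`. The snippet below is a minimal sketch, not part of DocMax: the executable names (`tesseract`, `gs`, `pandoc`, `pdftoppm`) are assumptions based on how each tool is usually packaged on Linux and macOS, and `find_external_tools` is a hypothetical helper.

```python
import shutil

# Executable names are assumptions based on each tool's usual packaging;
# DocMax itself may locate them differently (on Windows, Ghostscript is
# typically installed as gswin64c rather than gs).
EXTERNAL_TOOLS = {
    "Tesseract OCR": "tesseract",
    "Ghostscript": "gs",
    "Pandoc": "pandoc",
    "Poppler": "pdftoppm",
}


def find_external_tools(tools=EXTERNAL_TOOLS):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(exe) for name, exe in tools.items()}


if __name__ == "__main__":
    for name, path in find_external_tools().items():
        status = f"found at {path}" if path else "MISSING"
        print(f"{name:15} {status}")
```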
Quick Reference
PDF Operations
# Merge PDFs
DocMax merge a.pdf b.pdf c.pdf -o merged.pdf
# Split into pages
DocMax split report.pdf
# Compress (uses Ghostscript)
DocMax compress large.pdf --preset ebook
# Rotate pages
DocMax rotate file.pdf 90
# Extract page range
DocMax pages file.pdf 1-5
# Watermark
DocMax watermark file.pdf logo.png
# Encrypt / Decrypt
DocMax encrypt file.pdf
DocMax decrypt protected.pdf
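For context on what a preset such as `ebook` means: Ghostscript exposes named quality levels through its `-dPDFSETTINGS` switch. The sketch below shows how such a command could be assembled. It assumes the `--preset` value maps directly onto `-dPDFSETTINGS`, which this README does not confirm, and `ghostscript_compress_args` is a hypothetical helper rather than a DocMax function.

```python
# Quality presets accepted by Ghostscript's -dPDFSETTINGS switch.
GS_PRESETS = ("screen", "ebook", "printer", "prepress", "default")


def ghostscript_compress_args(src, dst, preset="ebook"):
    """Build (but do not run) a Ghostscript command that rewrites src to dst."""
    if preset not in GS_PRESETS:
        raise ValueError(f"unknown preset {preset!r}; expected one of {GS_PRESETS}")
    return [
        "gs",                        # assumed executable name (Linux/macOS)
        "-sDEVICE=pdfwrite",         # write a new PDF
        f"-dPDFSETTINGS=/{preset}",  # quality/size trade-off
        "-dNOPAUSE",
        "-dBATCH",
        "-dQUIET",
        f"-sOutputFile={dst}",
        str(src),
    ]
```

The returned list can be passed to `subprocess.run(args, check=True)` once Ghostscript is installed.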
OCR
# OCR an image
DocMax ocr scan.png
# OCR a PDF
DocMax ocr scan.pdf
# Output as JSON or Markdown
DocMax ocr scan.pdf --fmt json
DocMax ocr scan.pdf --fmt md
# Multi-language OCR
DocMax ocr scan.png --lang eng+hin
# Make a scanned PDF searchable
DocMax searchable scan.pdf
# Batch OCR an entire folder
DocMax batch-ocr invoices/
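The `eng+hin` syntax follows Tesseract's own convention of joining language codes with `+`. As a rough illustration, the sketch below builds the equivalent direct Tesseract invocation. Whether DocMax shells out to the `tesseract` binary or goes through a Python wrapper is not stated here, so treat `tesseract_args` as a hypothetical helper.

```python
def tesseract_args(image, languages=("eng",)):
    """Build (but do not run) a Tesseract command that prints text to stdout."""
    if not languages:
        raise ValueError("at least one language code is required")
    # Tesseract accepts several languages joined with '+', e.g. eng+hin.
    # Each language needs its traineddata file installed.
    return ["tesseract", str(image), "stdout", "-l", "+".join(languages)]
```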
Document Conversion
# Convert DOCX → PDF
DocMax convert report.docx pdf
# Convert Markdown → HTML
DocMax convert notes.md html
# Combine images into a PDF
DocMax img2pdf scans/
# Export PDF pages as images
DocMax pdf2img report.pdf --dpi 300 --fmt png
Content Extraction
# Extract text
DocMax text report.pdf
# Extract embedded images
DocMax images report.pdf
# Show / save metadata
DocMax metadata report.pdf
DocMax metadata report.pdf -o meta.json
# Extract tables
DocMax tables invoice.pdf --fmt xlsx
DocMax tables invoice.pdf --fmt csv
DocMax tables invoice.pdf --fmt json
Image Processing
# Enhance (contrast + sharpness)
DocMax enhance scan.png
# Fix skewed scans
DocMax deskew scan.png
# Remove noise
DocMax denoise scan.png
# Resize
DocMax resize photo.png --width 800
DocMax resize photo.png --scale 0.5
# Full OCR preprocessing pipeline
DocMax preprocess scan.png
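The two resize forms (`--width` and `--scale`) presumably reduce to the same arithmetic. The sketch below works through it under the assumption that `--width` preserves the aspect ratio, which this README does not state. `target_size` is a hypothetical helper, not a DocMax function.

```python
def target_size(width, height, new_width=None, scale=None):
    """Return (width, height) after resizing, keeping the aspect ratio."""
    if (new_width is None) == (scale is None):
        raise ValueError("pass exactly one of new_width or scale")
    if scale is None:
        # Derive the scale factor from the requested width.
        scale = new_width / width
    if scale <= 0:
        raise ValueError("resulting scale must be positive")
    # Never return a zero-pixel dimension.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For a 1600×1200 image, `target_size(1600, 1200, new_width=800)` and `target_size(1600, 1200, scale=0.5)` both give `(800, 600)`.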
Batch Processing
# Batch OCR with 8 workers
DocMax batch ./documents --ocr --workers 8
# Batch compress
DocMax batch ./pdfs --compress
# Batch convert to markdown
DocMax batch ./docs --convert md
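To give a sense of what `--workers` controls, here is a minimal sketch of parallel batch processing with a worker pool. It is not DocMax's implementation: `run_batch` and its parameters are hypothetical, and a thread pool is assumed because the heavy lifting (OCR, compression) usually happens in external processes.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def run_batch(folder, task, pattern="*", workers=4):
    """Apply task(path) to every matching file; return (results, errors) dicts."""
    files = sorted(p for p in Path(folder).rglob(pattern) if p.is_file())
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(task, path): path for path in files}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # one bad file should not stop the batch
                errors[path] = exc
    return results, errors
```

If the per-file work were CPU-bound pure Python, `ProcessPoolExecutor` would be the more suitable pool.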
Watch Mode
# Auto-OCR new files dropped into a folder
DocMax watch ./incoming --ocr
# Auto-make-searchable
DocMax watch ./scans --searchable
# Auto-compress
DocMax watch ./uploads --compress
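Conceptually, watch mode means "run an action once for every file that appears in a folder". DocMax's watcher is described as watchdog-based; the sketch below deliberately swaps in simple standard-library polling to illustrate the idea without extra dependencies. `poll_new_files` and `watch` are hypothetical names.

```python
import time
from pathlib import Path


def poll_new_files(folder, seen):
    """Return files in folder not yet in seen, and record them in seen."""
    current = {p for p in Path(folder).iterdir() if p.is_file()}
    new = sorted(current - seen)
    seen.update(new)
    return new


def watch(folder, handler, interval=1.0, max_polls=None):
    """Call handler(path) once for each file that appears after watching starts."""
    seen = set()
    poll_new_files(folder, seen)  # files already present do not count as new
    polls = 0
    while max_polls is None or polls < max_polls:
        time.sleep(interval)
        for path in poll_new_files(folder, seen):
            handler(path)
        polls += 1
```

A polling loop like this can pick up a file while it is still being written, so a real handler may want to wait until the file size stops changing.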
Project Structure
DocMax/
├── cli.py ← Typer CLI entry point
├── config.py ← Global configuration
├── utils.py ← Shared utilities
├── pdf/
│   └── operations.py ← merge, split, compress, rotate, pages, watermark, encrypt, decrypt
├── ocr/
│   └── engine.py ← ocr_image, ocr_pdf, make_searchable_pdf, batch_ocr
├── convert/
│   └── converter.py ← convert, images_to_pdf, pdf_to_images
├── extract/
│   └── extractor.py ← extract_text, extract_images, extract_metadata, extract_tables
├── image/
│   └── processor.py ← enhance, deskew, denoise, resize, preprocess_for_ocr
├── batch/
│   └── processor.py ← parallel batch processing
└── watch/
    └── watcher.py ← watchdog-based directory monitor
Supported Formats
| Category | Formats |
|---|---|
| Input documents | PDF, DOCX, ODT, MD, HTML, TXT, RST, EPUB |
| Input images | PNG, JPG/JPEG, TIFF/TIF, BMP, WebP |
| OCR output | TXT, JSON, Markdown |
| Table export | CSV, XLSX, JSON |
| Image export | PNG, JPEG, TIFF |
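When scripting around the CLI, it can be handy to pre-filter files by the input formats in the table above. The following is an illustrative sketch only: the extension sets are transcribed from that table, while `classify` is a hypothetical helper and not how DocMax necessarily detects formats.

```python
from pathlib import Path

# Extensions transcribed from the Supported Formats table.
INPUT_DOCUMENTS = {".pdf", ".docx", ".odt", ".md", ".html", ".txt", ".rst", ".epub"}
INPUT_IMAGES = {".png", ".jpg", ".jpeg", ".tiff", ".tif", ".bmp", ".webp"}


def classify(path):
    """Return 'document', 'image', or None for an unsupported extension."""
    suffix = Path(path).suffix.lower()  # case-insensitive match
    if suffix in INPUT_DOCUMENTS:
        return "document"
    if suffix in INPUT_IMAGES:
        return "image"
    return None
```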
License
MIT License — DocMax Contributors