DocMax — Unified Document Processing CLI. Forge your documents from your terminal.

◆ DocMax

Forge your documents from the terminal.

DocMax is an all-in-one, offline-first document processing CLI published on PyPI.
Merge PDFs, run OCR, convert formats, batch-process folders, and more — all from a single, beautiful terminal interface.

✨ Feature Gallery

Main Menu: launch with `docmax` for arrow-key navigation and guided workflows for every tool.
PDF Tools: 8 PDF operations, all interactive and guided.
Compression Result: Ghostscript compression with real before/after sizes shown.
OCR Workflow: 4-step preprocessing pipeline before Tesseract OCR.
Batch OCR: multi-threaded batch processing with a Rich progress bar.
System Doctor: `docmax doctor` checks all external tools and shows paths and install status.

🚀 Features

📄 PDF Tools

Feature	Command
Merge multiple PDFs	`docmax merge a.pdf b.pdf -o out.pdf`
Split into pages	`docmax split report.pdf`
Compress (Ghostscript)	`docmax compress large.pdf --preset ebook`
Rotate pages	`docmax rotate file.pdf 90`
Extract page range	`docmax pages file.pdf 1-5`
Overlay watermark	`docmax watermark file.pdf logo.png`
Encrypt with password	`docmax encrypt file.pdf`
Decrypt	`docmax decrypt protected.pdf`

🔍 OCR

Feature	Command
OCR an image	`docmax ocr scan.png`
OCR a PDF	`docmax ocr scan.pdf`
Output as JSON or Markdown	`docmax ocr scan.pdf --fmt json`
Multi-language OCR	`docmax ocr scan.png --lang eng+hin`
Make scanned PDF searchable	`docmax searchable scan.pdf`
Batch OCR a folder	`docmax batch-ocr invoices/`
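
When OCR is driven from a script, the command line can be assembled programmatically. The sketch below builds the argument list for `docmax ocr` using only the `--lang` and `--fmt` flags shown in the table above. The helper name `build_ocr_command` is hypothetical, and joining language codes with `+` simply follows the `eng+hin` example.

```python
from typing import List, Optional, Sequence


def build_ocr_command(
    source: str,
    langs: Sequence[str] = (),
    fmt: Optional[str] = None,
) -> List[str]:
    """Assemble an argv list for `docmax ocr` (hypothetical helper)."""
    cmd = ["docmax", "ocr", source]
    if langs:
        # Language codes are joined with "+", as in `--lang eng+hin`.
        cmd += ["--lang", "+".join(langs)]
    if fmt:
        cmd += ["--fmt", fmt]
    return cmd


if __name__ == "__main__":
    print(build_ocr_command("scan.png", langs=["eng", "hin"]))
    print(build_ocr_command("scan.pdf", fmt="json"))
```

The returned list can be handed to `subprocess.run` once DocMax is installed; building a list rather than a shell string avoids quoting problems with file names that contain spaces.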

🔄 Document Conversion

Feature	Command
Markdown → PDF	`docmax convert notes.md pdf`
Markdown → DOCX	`docmax convert notes.md docx`
DOCX → PDF	`docmax convert report.docx pdf`
DOCX → Markdown	`docmax convert report.docx md`
Images → PDF	`docmax img2pdf scans/`
PDF → Images	`docmax pdf2img report.pdf --dpi 300 --fmt png`

📂 Content Extraction

Feature	Command
Extract text	`docmax text report.pdf`
Extract embedded images	`docmax images report.pdf`
Show / save metadata	`docmax metadata report.pdf -o meta.json`
Extract tables (CSV/XLSX/JSON)	`docmax tables invoice.pdf --fmt xlsx`

🖼 Image Processing

Feature	Command
Enhance (contrast + sharpness)	`docmax enhance scan.png`
Fix skewed scans (deskew)	`docmax deskew scan.png`
Remove noise	`docmax denoise scan.png`
Resize	`docmax resize photo.png --width 800`
Full OCR preprocessing pipeline	`docmax preprocess scan.png`

Interactive image tools (resize, crop, rotate, flip, convert format, watermark, remove background) are also available in the TUI.

⚡ Batch Processing & Watch Mode

Feature	Command
Batch OCR with workers	`docmax batch ./docs --ocr --workers 8`
Batch compress PDFs	`docmax batch ./pdfs --compress`
Batch convert to Markdown	`docmax batch ./docs --convert md`
Auto-OCR watched folder	`docmax watch ./incoming --ocr`
Auto-compress watched folder	`docmax watch ./uploads --compress`
Auto-make-searchable	`docmax watch ./scans --searchable`
Auto-preprocess images	`docmax watch ./images --preprocess`
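
The `--workers` option suggests a worker-pool pattern. As a rough illustration of that pattern, and not DocMax's actual implementation, the sketch below fans a per-file handler out over a thread pool with the standard library. The names `process_folder` and `handler` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Callable, Dict


def process_folder(
    folder: Path,
    handler: Callable[[Path], str],
    workers: int = 4,
    pattern: str = "*.pdf",
) -> Dict[str, str]:
    """Run `handler` on every matching file using a thread pool.

    Returns a mapping of file name to the handler's result, or to an
    error message if the handler raised for that file.
    """
    files = sorted(folder.glob(pattern))
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(handler, f): f for f in files}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path.name] = future.result()
            except Exception as exc:  # one bad file should not stop the batch
                results[path.name] = f"error: {exc}"
    return results
```

Recording failures per file, rather than aborting on the first exception, lets a long batch finish and report which inputs need attention.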

⚙️ Setup & Diagnostics

docmax setup    # Auto-install external dependencies (Tesseract, Ghostscript, Pandoc, Poppler)
docmax doctor   # Check which tools are installed and configured

📦 Installation

Core (always works)

pip install docmax

Optional extras

Install only what you need:

# OCR support (Tesseract + pdf2image)
pip install "docmax[ocr]"

# Advanced image processing (OpenCV, rembg background removal)
pip install "docmax[image]"

# Table extraction from PDFs (pdfplumber, pandas, openpyxl)
pip install "docmax[tables]"

# Everything
pip install "docmax[full]"
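
To see which extras are usable in the current environment, one option is to probe for the underlying modules. This is a minimal sketch: the import names are assumptions inferred from the libraries named in the comments above (`pytesseract` in particular is a guess for the Tesseract binding), and the authoritative dependency lists are in DocMax's packaging metadata.

```python
from importlib.util import find_spec
from typing import Dict, List

# Import names assumed for each extra; the real dependency lists live in
# DocMax's packaging metadata and may differ.
EXTRA_MODULES: Dict[str, List[str]] = {
    "ocr": ["pytesseract", "pdf2image"],
    "image": ["cv2", "rembg"],
    "tables": ["pdfplumber", "pandas", "openpyxl"],
}


def missing_modules(extra: str) -> List[str]:
    """Return the modules for `extra` that cannot be imported."""
    return [m for m in EXTRA_MODULES[extra] if find_spec(m) is None]


if __name__ == "__main__":
    for extra in EXTRA_MODULES:
        missing = missing_modules(extra)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"docmax[{extra}]: {status}")
```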

External dependencies

Some features require system tools. Run `docmax setup` to auto-install them, or install them manually. The table below lists each tool, its purpose, and whether auto-install is supported.

Tool	Purpose	Auto-install
Tesseract OCR	OCR engine	✅
Ghostscript	PDF compression	✅
Pandoc	Document conversion	✅
Poppler	PDF → image rendering	✅

# Install & verify in two steps
docmax setup
docmax doctor
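
Conceptually, a check like `docmax doctor` comes down to finding each tool on `PATH`. The sketch below shows that idea with `shutil.which`; it is not DocMax's own code, and the executable names are assumptions (the usual names on Linux and macOS, with `pdftoppm` standing in for Poppler).

```python
import shutil
from typing import Dict, Optional

# Executable names are assumptions: they are the usual names on Linux and
# macOS (Ghostscript is typically `gswin64c` on Windows).
TOOLS: Dict[str, str] = {
    "Tesseract OCR": "tesseract",
    "Ghostscript": "gs",
    "Pandoc": "pandoc",
    "Poppler": "pdftoppm",
}


def locate_tools(tools: Dict[str, str] = TOOLS) -> Dict[str, Optional[str]]:
    """Map each tool name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(exe) for name, exe in tools.items()}


if __name__ == "__main__":
    for name, path in locate_tools().items():
        print(f"{name:<14} {path or 'not found'}")
```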

🖥 Usage

Interactive TUI

Launch the full interactive terminal UI with no arguments:

docmax

Navigate with arrow keys, select with Enter. Every tool section has its own guided workflow.

Command-Line Interface

DocMax also works as a traditional CLI — every feature is a subcommand:

# Show all commands
docmax --help

# Show version
docmax --version
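
Because every feature is a subcommand, DocMax can also be called from a Python script. The wrapper below is a hedged sketch: `run_docmax` is a hypothetical helper, and it relies only on the `docmax` executable and the `--version` flag shown above.

```python
import shutil
import subprocess
from typing import List, Optional


def run_docmax(args: List[str], executable: str = "docmax") -> Optional[str]:
    """Run a DocMax subcommand and return its stdout.

    Returns None when the executable is not on PATH, so callers can
    degrade gracefully instead of crashing.
    """
    if shutil.which(executable) is None:
        return None
    completed = subprocess.run(
        [executable, *args],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout


if __name__ == "__main__":
    version = run_docmax(["--version"])
    print(version.strip() if version else "docmax is not installed")
```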

📸 Screenshots

Main Menu

Animated: the full interactive menu on launch.

PDF Tools

Merge, split, compress, rotate, watermark, encrypt and decrypt — all guided.

OCR Workflow

Animated: selecting a scanned PDF, choosing output format, and getting searchable text.

Compression Results

Before and after file sizes, shown side by side, following Ghostscript compression.

Batch Processing

Animated: batch OCR across a folder with a live progress bar.

System Doctor

docmax doctor showing installed tool status and paths.

Adding screenshots: Place images under docs/images/ in the repository root.
Recommended filenames: banner.png, tui-main-menu.gif, pdf-tools-menu.png, ocr-demo.gif, compress-result.png, batch-ocr.gif, doctor-output.png, pdf-tools.png, ocr-tools.png, image-tools.png, conversion-tools.png, batch-tools.png, settings.png.
GIFs can be recorded with Terminalizer or VHS.

🗂 Project Structure

docmax/
├── cli.py                  ← Typer CLI entry point & dict-driven dispatch
├── menu.py                 ← All menu definitions (*_MENU dicts + menu functions)
├── config.py               ← Global defaults (DPI, presets, paths)
├── config_manager.py       ← Persistent config (~/.docmax/config.json)
├── banner.py               ← Rich ASCII banner
├── theme.py                ← Rich colour theme
├── loading.py              ← Spinner / Loader context manager
├── help.py                 ← Install-extras help panel
│
├── operations.py           ← PDF & image operations (merge, split, compress…)
├── engine.py               ← OCR engine (image OCR, PDF OCR, searchable PDF)
├── processor.py            ← Image preprocessing pipeline (enhance, deskew…)
├── converter.py            ← Document conversion (Pandoc, img2pdf, pdf2image)
├── extractor.py            ← Content extraction (text, images, metadata, tables)
├── batch.py                ← Parallel batch processing
├── watcher.py              ← Watchdog-based directory monitor
├── dependencies.py         ← Dependency checks & doctor
├── setup.py                ← Cross-platform dependency installer
└── utils.py                ← Shared utilities (abort, info, success…)

workflows/
├── __init__.py
├── common.py               ← Shared UI helpers (file picker, success/fail screens)
├── pdf.py                  ← All PDF tool workflows
├── ocr_tools.py            ← All OCR tool workflows
├── convert.py              ← All conversion workflows
├── extract.py              ← All extraction workflows
├── image.py                ← All image processing workflows
├── batch.py                ← Batch processing workflows
├── automation.py           ← Watch-folder automation workflows
└── settings.py             ← Settings, doctor, setup workflows
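
The "dict-driven dispatch" noted for `cli.py` and the `*_MENU` dicts in `menu.py` point to a table-lookup design. The sketch below illustrates the general pattern only; `PDF_MENU`, `dispatch`, and the two workflow functions are hypothetical stand-ins, not DocMax's real definitions.

```python
from typing import Callable, Dict


def merge_workflow() -> str:
    """Placeholder for a guided merge workflow."""
    return "merge selected"


def split_workflow() -> str:
    """Placeholder for a guided split workflow."""
    return "split selected"


# Hypothetical menu table: label shown to the user -> handler to call.
PDF_MENU: Dict[str, Callable[[], str]] = {
    "Merge PDFs": merge_workflow,
    "Split PDF": split_workflow,
}


def dispatch(menu: Dict[str, Callable[[], str]], choice: str) -> str:
    """Look up `choice` in `menu` and call its handler."""
    handler = menu.get(choice)
    if handler is None:
        raise KeyError(f"unknown menu choice: {choice!r}")
    return handler()


if __name__ == "__main__":
    print(dispatch(PDF_MENU, "Merge PDFs"))
```

With this layout, adding a tool means adding one dictionary entry instead of extending an if/elif chain.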

🤝 Contributing

Contributions, bug reports, and feature requests are welcome!

1. Fork the repository.
2. Create a feature branch: `git checkout -b feat/my-feature`
3. Make your changes and add tests where appropriate.
4. Open a pull request.

Please keep PRs focused and describe what problem they solve.

📜 License

DocMax is released under the MIT License.
© Punith Naidu and DocMax Contributors.

Made with ♥ and Rich, Typer, and Questionary.

Release history

Version	Date
2.0.4 (this version)	Jun 9, 2026
2.0.3	Jun 8, 2026
2.0.1	Jun 6, 2026
1.1.1	Jun 5, 2026
1.1.0	Jun 4, 2026
1.0.5	Jun 4, 2026
