DForge — Unified Document Processing CLI. Forge your documents from your terminal.
A unified, offline-first Python CLI for all your document processing needs.
Installation
pip install dforge-cli
External Dependencies
| Tool | Purpose | Install |
|---|---|---|
| Tesseract OCR | OCR engine | Install guide |
| Ghostscript | PDF compression | ghostscript.com |
| Pandoc | Document conversion | pandoc.org |
| Poppler | PDF → image (pdf2image) | apt install poppler-utils / brew install poppler |
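Because these tools are installed separately from the Python package, it can help to confirm they are on PATH before running OCR or conversion commands. The following is a minimal sketch, not part of DForge: the executable names (`tesseract`, `gs`, `pandoc`, `pdftoppm`) are the usual ones on Linux and macOS and are assumptions here, and `missing_tools` is a hypothetical helper.

```python
import shutil

# Executable names are assumptions: the usual binary names on Linux/macOS,
# not names confirmed by DForge. On Windows, Ghostscript is typically
# installed as gswin64c instead of gs.
EXTERNAL_TOOLS = {
    "Tesseract OCR": "tesseract",
    "Ghostscript": "gs",
    "Pandoc": "pandoc",
    "Poppler": "pdftoppm",
}


def missing_tools(tools=EXTERNAL_TOOLS, which=shutil.which):
    """Return the display names of tools whose executable is not on PATH."""
    return [name for name, exe in tools.items() if which(exe) is None]


if __name__ == "__main__":
    for name in missing_tools():
        print(f"missing: {name}")
```

The `which` parameter exists only so the lookup can be swapped out in tests; in normal use the default `shutil.which` is enough.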
Quick Reference
PDF Operations
# Merge PDFs
dforge merge a.pdf b.pdf c.pdf -o merged.pdf
# Split into pages
dforge split report.pdf
# Compress (uses Ghostscript)
dforge compress large.pdf --preset ebook
# Rotate pages
dforge rotate file.pdf 90
# Extract page range
dforge pages file.pdf 1-5
# Watermark
dforge watermark file.pdf logo.png
# Encrypt / Decrypt
dforge encrypt file.pdf
dforge decrypt protected.pdf
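The `compress` command above relies on Ghostscript, and `ebook` is one of Ghostscript's own `-dPDFSETTINGS` presets. As a rough illustration of what such a call can look like, here is a sketch that builds the Ghostscript command line directly. It is an assumption about the general approach, not DForge's actual implementation; `gs_compress_args` and `compress` are hypothetical names.

```python
import subprocess

# Ghostscript's documented -dPDFSETTINGS presets for the pdfwrite device.
GS_PRESETS = ("screen", "ebook", "printer", "prepress", "default")


def gs_compress_args(src, dst, preset="ebook"):
    """Build a Ghostscript argv that rewrites src into dst at the given preset."""
    if preset not in GS_PRESETS:
        raise ValueError(f"unknown preset: {preset!r}")
    return [
        "gs",
        "-sDEVICE=pdfwrite",
        f"-dPDFSETTINGS=/{preset}",
        "-dNOPAUSE",
        "-dBATCH",
        "-dQUIET",
        f"-sOutputFile={dst}",
        src,
    ]


def compress(src, dst, preset="ebook"):
    """Run Ghostscript; raises CalledProcessError on a non-zero exit."""
    subprocess.run(gs_compress_args(src, dst, preset), check=True)
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.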
OCR
# OCR an image
dforge ocr scan.png
# OCR a PDF
dforge ocr scan.pdf
# Output as JSON or Markdown
dforge ocr scan.pdf --fmt json
dforge ocr scan.pdf --fmt md
# Multi-language OCR
dforge ocr scan.png --lang eng+hin
# Make a scanned PDF searchable
dforge searchable scan.pdf
# Batch OCR an entire folder
dforge batch-ocr invoices/
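The `--lang eng+hin` form matches Tesseract's own convention of joining language codes with `+`. The sketch below shows how such a value could be validated and passed to the `tesseract` executable through its `-l` option. Whether DForge calls the binary directly or goes through a Python wrapper is not stated in this README, so treat `parse_langs` and `tesseract_args` as hypothetical helpers.

```python
def parse_langs(spec):
    """Split a Tesseract-style language spec such as 'eng+hin' into codes."""
    codes = [code.strip() for code in spec.split("+")]
    if not all(codes):
        raise ValueError(f"empty language code in {spec!r}")
    return codes


def tesseract_args(image, lang="eng"):
    """Build an argv that OCRs image and writes the text to standard output."""
    return ["tesseract", image, "stdout", "-l", "+".join(parse_langs(lang))]
```

Each language code also needs its trained data file installed for Tesseract to accept it.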
Document Conversion
# Convert DOCX → PDF
dforge convert report.docx pdf
# Convert Markdown → HTML
dforge convert notes.md html
# Combine images into a PDF
dforge img2pdf scans/
# Export PDF pages as images
dforge pdf2img report.pdf --dpi 300 --fmt png
Content Extraction
# Extract text
dforge text report.pdf
# Extract embedded images
dforge images report.pdf
# Show / save metadata
dforge metadata report.pdf
dforge metadata report.pdf -o meta.json
# Extract tables
dforge tables invoice.pdf --fmt xlsx
dforge tables invoice.pdf --fmt csv
dforge tables invoice.pdf --fmt json
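To make the difference between the table export formats concrete, here is a small sketch that serializes an already-extracted table to CSV or JSON using only the standard library. It assumes the table arrives as a list of rows with the header first, which is an illustrative choice rather than DForge's documented data model. XLSX is left out because it needs a third-party library, and `export_table` is a hypothetical name.

```python
import csv
import io
import json


def export_table(rows, fmt):
    """Serialize a table (header row first) to CSV or JSON text."""
    if fmt == "csv":
        buf = io.StringIO()
        csv.writer(buf, lineterminator="\n").writerows(rows)
        return buf.getvalue()
    if fmt == "json":
        header, *body = rows
        # One object per data row, keyed by the header cells.
        return json.dumps([dict(zip(header, row)) for row in body], indent=2)
    raise ValueError(f"unsupported format: {fmt!r}")
```

CSV keeps the header as the first line, while the JSON form repeats the header as keys in every record.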
Image Processing
# Enhance (contrast + sharpness)
dforge enhance scan.png
# Fix skewed scans
dforge deskew scan.png
# Remove noise
dforge denoise scan.png
# Resize
dforge resize photo.png --width 800
dforge resize photo.png --scale 0.5
# Full OCR preprocessing pipeline
dforge preprocess scan.png
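The two `resize` forms above, `--width 800` and `--scale 0.5`, both reduce to the same arithmetic if the aspect ratio is preserved. The sketch below works through that calculation. That DForge keeps the aspect ratio is an assumption, and `target_size` is a hypothetical helper.

```python
def target_size(width, height, *, new_width=None, scale=None):
    """Return (w, h) after resizing, preserving the aspect ratio."""
    if (new_width is None) == (scale is None):
        raise ValueError("pass exactly one of new_width or scale")
    if scale is None:
        # A target width implies a scale factor for both dimensions.
        scale = new_width / width
    if scale <= 0:
        raise ValueError("scale must be positive")
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For a 1600 x 1200 image, `new_width=800` and `scale=0.5` both give 800 x 600.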
Batch Processing
# Batch OCR with 8 workers
dforge batch ./documents --ocr --workers 8
# Batch compress
dforge batch ./pdfs --compress
# Batch convert to markdown
dforge batch ./docs --convert md
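The `--workers 8` option suggests a worker pool that processes files in parallel. One common way to structure this in Python is sketched below with `concurrent.futures`. The README does not say whether DForge uses threads or processes, so the thread pool here is an assumption, and `run_batch` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def run_batch(folder, task, pattern="*.pdf", workers=8):
    """Apply task(path) to every matching file.

    Returns two dicts keyed by path: successful results and raised errors.
    """
    files = sorted(Path(folder).glob(pattern))
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(task, path): path for path in files}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # one bad file should not stop the batch
                errors[path] = exc
    return results, errors
```

Threads fit tasks that mostly wait on external programs such as Tesseract or Ghostscript; CPU-bound work done in pure Python would usually call for `ProcessPoolExecutor` instead.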
Watch Mode
# Auto-OCR new files dropped into a folder
dforge watch ./incoming --ocr
# Auto-make-searchable
dforge watch ./scans --searchable
# Auto-compress
dforge watch ./uploads --compress
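Watch mode reacts to files that appear in a folder. DForge's watcher is described as watchdog-based; the sketch below deliberately swaps in plain polling from the standard library to show the core idea without a third-party dependency. It is not the watchdog API, and `new_files` and `watch` are hypothetical names.

```python
import time
from pathlib import Path


def new_files(folder, seen):
    """Return files in folder not yet in seen, and record them as seen."""
    current = {p for p in Path(folder).iterdir() if p.is_file()}
    fresh = sorted(current - seen)
    seen.update(current)
    return fresh


def watch(folder, handler, interval=1.0):
    """Poll folder forever, calling handler(path) once per new file."""
    seen = set()
    new_files(folder, seen)  # ignore files that already exist at startup
    while True:
        for path in new_files(folder, seen):
            handler(path)
        time.sleep(interval)
```

A production watcher would also need to wait until a file has finished being written before handing it to OCR or compression, which this sketch does not attempt.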
Project Structure
dforge/
├── cli.py                ← Typer CLI entry point
├── config.py             ← Global configuration
├── utils.py              ← Shared utilities
├── pdf/
│   └── operations.py     ← merge, split, compress, rotate, pages, watermark, encrypt, decrypt
├── ocr/
│   └── engine.py         ← ocr_image, ocr_pdf, make_searchable_pdf, batch_ocr
├── convert/
│   └── converter.py      ← convert, images_to_pdf, pdf_to_images
├── extract/
│   └── extractor.py      ← extract_text, extract_images, extract_metadata, extract_tables
├── image/
│   └── processor.py      ← enhance, deskew, denoise, resize, preprocess_for_ocr
├── batch/
│   └── processor.py      ← parallel batch processing
└── watch/
    └── watcher.py        ← watchdog-based directory monitor
Supported Formats
| Category | Formats |
|---|---|
| Input documents | PDF, DOCX, ODT, MD, HTML, TXT, RST, EPUB |
| Input images | PNG, JPG/JPEG, TIFF/TIF, BMP, WebP |
| OCR output | TXT, JSON, Markdown |
| Table export | CSV, XLSX, JSON |
| Image export | PNG, JPEG, TIFF |
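When scripting around the CLI, it can be useful to decide up front whether a file is a supported document or image. The sketch below classifies a path by its suffix using the formats in the table above. The mapping from format names to file extensions is an assumption, and `classify` is a hypothetical helper rather than a DForge function.

```python
from pathlib import Path

# Extensions derived from the Supported Formats table; the exact suffixes
# DForge accepts are an assumption.
DOCUMENT_EXTS = {".pdf", ".docx", ".odt", ".md", ".html", ".txt", ".rst", ".epub"}
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".tiff", ".tif", ".bmp", ".webp"}


def classify(path):
    """Return 'document', 'image', or None for an unsupported suffix."""
    ext = Path(path).suffix.lower()
    if ext in DOCUMENT_EXTS:
        return "document"
    if ext in IMAGE_EXTS:
        return "image"
    return None
```

Matching is case-insensitive, so `Report.PDF` and `report.pdf` are treated the same way.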
License
MIT License — DForge Contributors
File details
Details for the file dforge_cli-1.0.4.tar.gz.
File metadata
- Download URL: dforge_cli-1.0.4.tar.gz
- Upload date:
- Size: 28.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1200b3ef5f67f16793b82b22e12e3e3373c8acaa168bf25affadd016d15be68e |
| MD5 | d34c97ba23eb692ccf1b925b8a3d2e31 |
| BLAKE2b-256 | b3a6beffda5cfcb06e20989c1669c40bfa279f2e4c957491f39f9f83930b83dc |
Provenance
The following attestation bundles were made for dforge_cli-1.0.4.tar.gz:
Publisher: publish.yml on megabyte44/DFORGE
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: dforge_cli-1.0.4.tar.gz
- Subject digest: 1200b3ef5f67f16793b82b22e12e3e3373c8acaa168bf25affadd016d15be68e
- Sigstore transparency entry: 1710048713
- Sigstore integration time:
- Permalink: megabyte44/DFORGE@dd58f3ce2e562828314ce2d4153420655067b906
- Branch / Tag: refs/tags/v1.0.4
- Owner: https://github.com/megabyte44
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@dd58f3ce2e562828314ce2d4153420655067b906
- Trigger Event: release
File details
Details for the file dforge_cli-1.0.4-py3-none-any.whl.
File metadata
- Download URL: dforge_cli-1.0.4-py3-none-any.whl
- Upload date:
- Size: 39.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6dd0e90da69cd4b912987176b2a38b6213d0ceee8cf92934a257a02137a93bc6 |
| MD5 | 41e5bcdfbcd4ffaf2a745915bdcdd751 |
| BLAKE2b-256 | 722729e34fcf7f1bb3b3d7732ae3bcf2c14c2813caf79e0288abe8434b3c96f1 |
Provenance
The following attestation bundles were made for dforge_cli-1.0.4-py3-none-any.whl:
Publisher: publish.yml on megabyte44/DFORGE
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: dforge_cli-1.0.4-py3-none-any.whl
- Subject digest: 6dd0e90da69cd4b912987176b2a38b6213d0ceee8cf92934a257a02137a93bc6
- Sigstore transparency entry: 1710048754
- Sigstore integration time:
- Permalink: megabyte44/DFORGE@dd58f3ce2e562828314ce2d4153420655067b906
- Branch / Tag: refs/tags/v1.0.4
- Owner: https://github.com/megabyte44
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@dd58f3ce2e562828314ce2d4153420655067b906
- Trigger Event: release