DForge — Unified Document Processing CLI. Forge your documents from your terminal.

A unified, offline-first Python CLI for PDF manipulation, OCR, document conversion, content extraction, and image preprocessing, with batch and watch modes for whole folders.


Installation

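pip install dforge-cli

The package is published on PyPI as dforge-cli (see the release files below) and provides the dforge command used throughout this page.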

External Dependencies

These system tools are not installed by pip; install them separately:

  • Tesseract OCR: OCR engine (see the Tesseract install guide)
  • Ghostscript: PDF compression (ghostscript.com)
  • Pandoc: document conversion (pandoc.org)
  • Poppler: PDF → image rendering, used by pdf2image (apt install poppler-utils / brew install poppler)

Quick Reference

PDF Operations

# Merge PDFs
dforge merge a.pdf b.pdf c.pdf -o merged.pdf

# Split into pages
dforge split report.pdf

# Compress (uses Ghostscript)
dforge compress large.pdf --preset ebook

# Rotate pages
dforge rotate file.pdf 90

# Extract page range
dforge pages file.pdf 1-5

# Watermark
dforge watermark file.pdf logo.png

# Encrypt / Decrypt
dforge encrypt file.pdf
dforge decrypt protected.pdf
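The compress command above uses Ghostscript. A minimal sketch of a comparable Ghostscript invocation follows, assuming that `--preset` maps onto Ghostscript's standard `-dPDFSETTINGS` presets (`ebook` is one of them, but DForge's actual flags are not documented here). `build_gs_command` and `compress_pdf` are hypothetical helpers, not part of DForge's API, and the binary name `gs` is an assumption (Windows builds are usually named `gswin64c`).

```python
import shutil
import subprocess
from pathlib import Path

# Ghostscript's built-in pdfwrite presets, roughly ordered from smallest output
# ("screen") to highest quality ("prepress"); "default" is general purpose.
GS_PRESETS = ("screen", "ebook", "printer", "prepress", "default")


def build_gs_command(src, dst, preset="ebook", gs="gs"):
    """Build the Ghostscript argv that re-writes `src` into `dst` (hypothetical helper)."""
    if preset not in GS_PRESETS:
        raise ValueError(f"unknown preset {preset!r}; expected one of {GS_PRESETS}")
    return [
        gs,
        "-sDEVICE=pdfwrite",                 # re-render the input as a new PDF
        "-dCompatibilityLevel=1.4",
        f"-dPDFSETTINGS=/{preset}",          # image resolution/quality preset
        "-dNOPAUSE", "-dQUIET", "-dBATCH",   # run non-interactively and exit
        f"-sOutputFile={dst}",
        str(src),
    ]


def compress_pdf(src, dst, preset="ebook"):
    """Run Ghostscript and return the output size in bytes (assumes `gs` is on PATH)."""
    gs = shutil.which("gs")
    if gs is None:
        raise RuntimeError("Ghostscript not found on PATH (on Windows it is usually gswin64c)")
    subprocess.run(build_gs_command(src, dst, preset, gs), check=True)
    return Path(dst).stat().st_size
```

For example, `compress_pdf("large.pdf", "large-ebook.pdf")`. Comparing the returned size with the input's is a cheap sanity check, because re-rendering an already well-optimized PDF can make it larger.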

OCR

# OCR an image
dforge ocr scan.png

# OCR a PDF
dforge ocr scan.pdf

# Output as JSON or Markdown
dforge ocr scan.pdf --fmt json
dforge ocr scan.pdf --fmt md

# Multi-language OCR
dforge ocr scan.png --lang eng+hin

# Make a scanned PDF searchable
dforge searchable scan.pdf

# Batch OCR an entire folder
dforge batch-ocr invoices/
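The `--lang eng+hin` syntax is Tesseract's own: several language codes joined with `+`. A minimal sketch, assuming DForge passes the value straight through to Tesseract via pytesseract (an assumption; the docs only name Tesseract as the engine). `tesseract_lang` and `ocr_image` are hypothetical helpers.

```python
import re

# Tesseract language codes look like "eng", "hin", or "chi_sim".
# (Script models such as "script/Latin" are not covered by this simple check.)
_LANG_CODE = re.compile(r"^[A-Za-z]{3}(_[A-Za-z]+)*$")


def tesseract_lang(*codes):
    """Join language codes into Tesseract's "+"-separated form, e.g. eng+hin."""
    if not codes:
        raise ValueError("at least one language code is required")
    for code in codes:
        if not _LANG_CODE.match(code):
            raise ValueError(f"not a Tesseract language code: {code!r}")
    return "+".join(codes)


def ocr_image(path, *codes):
    """OCR one image via pytesseract (requires pytesseract, Pillow, and Tesseract)."""
    # Imported lazily so tesseract_lang() works without these packages installed.
    import pytesseract
    from PIL import Image

    langs = codes or ("eng",)
    with Image.open(path) as img:
        return pytesseract.image_to_string(img, lang=tesseract_lang(*langs))
```

Each code must have its trained data installed; `tesseract --list-langs` prints the languages available on your machine.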

Document Conversion

# Convert DOCX → PDF
dforge convert report.docx pdf

# Convert Markdown → HTML
dforge convert notes.md html

# Combine images into a PDF
dforge img2pdf scans/

# Export PDF pages as images
dforge pdf2img report.pdf --dpi 300 --fmt png
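`--dpi` controls how many pixels each inch of the page becomes. Because PDF page sizes are expressed in points (1 point = 1/72 inch), the exported image size can be estimated before running pdf2img. A small worked sketch; `page_pixels` is a hypothetical helper, and Poppler's own rounding may differ by a pixel.

```python
# PDF page sizes are measured in points; 1 point = 1/72 inch.
POINTS_PER_INCH = 72


def page_pixels(width_pt, height_pt, dpi):
    """Approximate pixel size of a page rendered at `dpi` (rounded to whole pixels)."""
    scale = dpi / POINTS_PER_INCH
    return round(width_pt * scale), round(height_pt * scale)


if __name__ == "__main__":
    # US Letter is 612 x 792 pt; A4 is about 595 x 842 pt.
    for name, (w, h) in {"Letter": (612, 792), "A4": (595, 842)}.items():
        for dpi in (150, 300):
            px_w, px_h = page_pixels(w, h, dpi)
            print(f"{name} @ {dpi} dpi: {px_w} x {px_h} px")
```

At 300 dpi a US Letter page comes out at 2550 x 3300 pixels, four times the pixel count of 150 dpi (1275 x 1650), so raising the DPI grows file sizes and memory use quickly.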

Content Extraction

# Extract text
dforge text report.pdf

# Extract embedded images
dforge images report.pdf

# Show / save metadata
dforge metadata report.pdf
dforge metadata report.pdf -o meta.json

# Extract tables
dforge tables invoice.pdf --fmt xlsx
dforge tables invoice.pdf --fmt csv
dforge tables invoice.pdf --fmt json
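The `--fmt` options carry the same extracted rows in different shapes. A stdlib sketch of the CSV and JSON forms, assuming a table arrives as a list of rows whose first row is the header; the JSON layout (one object per row, keyed by column name) is an assumption, not DForge's documented schema. XLSX output requires a spreadsheet library (for example openpyxl) and is left out here. Both functions are hypothetical helpers.

```python
import csv
import io
import json


def table_to_csv(rows):
    """Serialize a table (list of rows, first row = header) as CSV text."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def table_to_json(rows):
    """Serialize as a JSON list of objects keyed by the header row (assumed layout)."""
    header, *body = rows
    return json.dumps([dict(zip(header, row)) for row in body], indent=2)


if __name__ == "__main__":
    # Illustrative rows only.
    invoice = [["Item", "Qty", "Price"], ["Widget", "2", "9.99"], ["Gadget", "1", "24.50"]]
    print(table_to_csv(invoice))
    print(table_to_json(invoice))
```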

Image Processing

# Enhance (contrast + sharpness)
dforge enhance scan.png

# Fix skewed scans
dforge deskew scan.png

# Remove noise
dforge denoise scan.png

# Resize
dforge resize photo.png --width 800
dforge resize photo.png --scale 0.5

# Full OCR preprocessing pipeline
dforge preprocess scan.png
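`--width` and `--scale` are two ways of stating the same resize. A sketch of the size calculation, assuming `--width` keeps the aspect ratio (the examples above do not say so explicitly); `target_size` is a hypothetical helper.

```python
def target_size(width, height, new_width=None, scale=None):
    """Output (width, height) for a --width or --scale style resize.

    Hypothetical helper. Assumes exactly one option is given and that a
    width-only resize keeps the original aspect ratio.
    """
    if (new_width is None) == (scale is None):
        raise ValueError("pass exactly one of new_width or scale")
    factor = new_width / width if new_width is not None else scale
    if factor <= 0:
        raise ValueError("new_width and scale must be positive")
    # Round to whole pixels, never going below 1 x 1.
    return max(1, round(width * factor)), max(1, round(height * factor))


if __name__ == "__main__":
    print(target_size(1600, 1200, new_width=800))  # --width 800 → (800, 600)
    print(target_size(1600, 1200, scale=0.5))      # --scale 0.5 → (800, 600)
```

The returned tuple is in the (width, height) order that Pillow's `Image.resize` expects.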

Batch Processing

# Batch OCR with 8 workers
dforge batch ./documents --ocr --workers 8

# Batch compress
dforge batch ./pdfs --compress

# Batch convert to markdown
dforge batch ./docs --convert md
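`--workers` sets how many files are processed at once. A minimal sketch of that fan-out pattern with the standard library's `ThreadPoolExecutor`; DForge's own batch module may use processes or a different layout, and `batch_process` is a hypothetical helper that scans only the top level of the folder.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def batch_process(folder, task, workers=8, suffixes=(".pdf",)):
    """Apply `task` to every matching file in `folder` using a worker pool.

    Returns (results, errors): two dicts keyed by file path, so one failing
    file does not stop the rest. Only the top level of `folder` is scanned.
    """
    wanted = {s.lower() for s in suffixes}
    files = sorted(p for p in Path(folder).iterdir() if p.is_file() and p.suffix.lower() in wanted)
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(task, path): path for path in files}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # record the failure, keep processing
                errors[path] = exc
    return results, errors
```

Threads are a reasonable fit when the heavy lifting happens in external programs such as Tesseract or Ghostscript, since waiting on a subprocess releases the GIL; CPU-bound pure-Python work would benefit more from `ProcessPoolExecutor`.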

Watch Mode

# Auto-OCR new files dropped into a folder
dforge watch ./incoming --ocr

# Make new scans searchable automatically
dforge watch ./scans --searchable

# Auto-compress
dforge watch ./uploads --compress
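Watch mode reacts to files as they appear. DForge's watcher is watchdog-based (see Project Structure); the sketch below swaps in simple standard-library polling to show the same idea without extra dependencies. `poll_new_files` and `watch` are hypothetical helpers, and skipping files already present at start-up is an assumption of this sketch.

```python
import time
from pathlib import Path

WATCHED_SUFFIXES = (".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff")


def poll_new_files(folder, seen, suffixes=WATCHED_SUFFIXES):
    """Return matching files in `folder` that are not in `seen`, and record them."""
    wanted = {s.lower() for s in suffixes}
    current = {p for p in Path(folder).iterdir() if p.is_file() and p.suffix.lower() in wanted}
    new = sorted(current - seen)
    seen.update(new)
    return new


def watch(folder, handle, interval=1.0):
    """Call handle(path) for each file that appears; loops until interrupted."""
    # Files already present at start-up are skipped (an assumption of this sketch).
    seen = set(Path(folder).iterdir())
    while True:
        # A production watcher should also wait until a new file stops growing,
        # so half-copied files are not processed.
        for path in poll_new_files(folder, seen):
            handle(path)
        time.sleep(interval)
```

For example, `watch("./incoming", print)` prints each new file's path. Polling adds up to `interval` seconds of latency and rescans the folder on every tick; watchdog receives file-system events from the operating system instead.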

Project Structure

dforge/
├── cli.py           ← Typer CLI entry point
├── config.py        ← Global configuration
├── utils.py         ← Shared utilities
├── pdf/
│   └── operations.py  ← merge, split, compress, rotate, pages, watermark, encrypt, decrypt
├── ocr/
│   └── engine.py      ← ocr_image, ocr_pdf, make_searchable_pdf, batch_ocr
├── convert/
│   └── converter.py   ← convert, images_to_pdf, pdf_to_images
├── extract/
│   └── extractor.py   ← extract_text, extract_images, extract_metadata, extract_tables
├── image/
│   └── processor.py   ← enhance, deskew, denoise, resize, preprocess_for_ocr
├── batch/
│   └── processor.py   ← parallel batch processing
└── watch/
    └── watcher.py     ← watchdog-based directory monitor

Supported Formats

  • Input documents: PDF, DOCX, ODT, MD, HTML, TXT, RST, EPUB
  • Input images: PNG, JPG/JPEG, TIFF/TIF, BMP, WebP
  • OCR output: TXT, JSON, Markdown
  • Table export: CSV, XLSX, JSON
  • Image export: PNG, JPEG, TIFF
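A quick pre-flight check can tell which files fall inside the input formats listed above before a long batch run. A sketch that mirrors the two input rows using file extensions only; `classify_input` is a hypothetical helper, not part of DForge.

```python
from pathlib import Path

# Extensions from the Supported Formats list (lower-case, with leading dot).
DOCUMENT_INPUTS = {".pdf", ".docx", ".odt", ".md", ".html", ".txt", ".rst", ".epub"}
IMAGE_INPUTS = {".png", ".jpg", ".jpeg", ".tiff", ".tif", ".bmp", ".webp"}


def classify_input(path):
    """Return "document", "image", or None for an unlisted input file."""
    suffix = Path(path).suffix.lower()
    if suffix in DOCUMENT_INPUTS:
        return "document"
    if suffix in IMAGE_INPUTS:
        return "image"
    return None
```

For example, `[p for p in Path("inbox").iterdir() if classify_input(p)]` keeps only files whose extensions appear in the list.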

License

MIT License — DForge Contributors


Download files

Source Distribution

dforge_cli-1.0.1.tar.gz (27.8 kB)

Built Distribution

dforge_cli-1.0.1-py3-none-any.whl (39.0 kB)

File details

Details for the file dforge_cli-1.0.1.tar.gz.

File metadata

  • Download URL: dforge_cli-1.0.1.tar.gz
  • Size: 27.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for dforge_cli-1.0.1.tar.gz
  • SHA256: 9633715e3f7ab04effc8463875e1066307973b86231343a83921d8cea4aae2b2
  • MD5: ea3a6feb24a258187c69daf30a2c8c65
  • BLAKE2b-256: 1457d85065c0ba5eb7a60950a7a996d93f8e0e7ceecbc0c1111dfc97da6fd3fc

Provenance

The following attestation bundles were made for dforge_cli-1.0.1.tar.gz:

Publisher: publish.yml on megabyte44/DFORGE

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file dforge_cli-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: dforge_cli-1.0.1-py3-none-any.whl
  • Size: 39.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for dforge_cli-1.0.1-py3-none-any.whl
  • SHA256: fae3291e3dad0032d7c841d85eeffa9643018418af2010810f36b120a772a603
  • MD5: b2bbb1c47a9506f30fdb7d3f020332c2
  • BLAKE2b-256: a87f2c104063b835f13b4dbd511876e1a33c42d94a5561db807a7d2b383ddc15

Provenance

The following attestation bundles were made for dforge_cli-1.0.1-py3-none-any.whl:

Publisher: publish.yml on megabyte44/DFORGE
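The SHA256 digests listed above let you check a downloaded release file before installing it. A standard-library sketch; `sha256_of` and `verify` are illustrative helper names, and the digests are copied from this page.

```python
import hashlib

# SHA256 digests published above for the 1.0.1 release files.
PUBLISHED_SHA256 = {
    "dforge_cli-1.0.1.tar.gz": "9633715e3f7ab04effc8463875e1066307973b86231343a83921d8cea4aae2b2",
    "dforge_cli-1.0.1-py3-none-any.whl": "fae3291e3dad0032d7c841d85eeffa9643018418af2010810f36b120a772a603",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected):
    """True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()
```

For example, `verify("dforge_cli-1.0.1-py3-none-any.whl", PUBLISHED_SHA256["dforge_cli-1.0.1-py3-none-any.whl"])`. pip can enforce the same check at install time: put `dforge-cli==1.0.1 --hash=sha256:<digest>` in a requirements file and run `pip install --require-hashes -r requirements.txt`; hash-checking mode then requires a hash for every dependency as well.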
