DForge — Unified Document Processing CLI. Forge your documents from your terminal.

A unified, offline-first Python CLI for all your document processing needs.


Installation

pip install dforge-cli

External Dependencies

Tool           Purpose                   Install
Tesseract OCR  OCR engine                Install guide
Ghostscript    PDF compression           ghostscript.com
Pandoc         Document conversion       pandoc.org
Poppler        PDF → image (pdf2image)   apt install poppler-utils / brew install poppler
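
Before running OCR, compression, or conversion commands, it can help to confirm that the external tools above are on your PATH. The following is a minimal sketch, not part of DForge: the executable names (`tesseract`, `gs` or `gswin64c`, `pandoc`, `pdftoppm`) are the ones these tools usually install, and `find_tools` is a hypothetical helper name.

```python
import shutil

# Executable names commonly shipped by each external tool.
# These are assumptions about a typical install, not names taken from DForge.
TOOL_BINARIES = {
    "Tesseract OCR": ["tesseract"],
    "Ghostscript": ["gs", "gswin64c", "gswin32c"],  # Windows builds use gswin*
    "Pandoc": ["pandoc"],
    "Poppler": ["pdftoppm"],  # one of the Poppler utilities pdf2image relies on
}


def find_tools(binaries=TOOL_BINARIES):
    """Return {tool name: path of the first executable found, or None}."""
    found = {}
    for tool, candidates in binaries.items():
        paths = (shutil.which(name) for name in candidates)
        found[tool] = next((path for path in paths if path), None)
    return found


if __name__ == "__main__":
    for tool, path in find_tools().items():
        print(f"{tool:<14} {path or 'NOT FOUND'}")
```

A tool reported as NOT FOUND only means no matching executable was found on PATH; it may still be installed under a different name or location.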

Quick Reference

PDF Operations

# Merge PDFs
dforge merge a.pdf b.pdf c.pdf -o merged.pdf

# Split into pages
dforge split report.pdf

# Compress (uses Ghostscript)
dforge compress large.pdf --preset ebook

# Rotate pages
dforge rotate file.pdf 90

# Extract page range
dforge pages file.pdf 1-5

# Watermark
dforge watermark file.pdf logo.png

# Encrypt / Decrypt
dforge encrypt file.pdf
dforge decrypt protected.pdf
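
The compress command above is described as using Ghostscript, and `ebook` is also the name of one of Ghostscript's own `-dPDFSETTINGS` presets. The sketch below builds an equivalent plain Ghostscript command line, assuming (unconfirmed by this README) that the DForge preset maps directly onto that option. `ghostscript_compress_args` is a hypothetical helper; it only builds the argument list and does not run anything.

```python
# Preset names accepted by Ghostscript's -dPDFSETTINGS option.
GS_PRESETS = ("screen", "ebook", "printer", "prepress", "default")


def ghostscript_compress_args(src, dst, preset="ebook", gs_binary="gs"):
    """Build (but do not run) a Ghostscript command that rewrites src into dst."""
    if preset not in GS_PRESETS:
        raise ValueError(f"unknown preset {preset!r}; expected one of {GS_PRESETS}")
    return [
        gs_binary,
        "-sDEVICE=pdfwrite",         # write a new PDF
        f"-dPDFSETTINGS=/{preset}",  # quality/size trade-off
        "-dNOPAUSE",
        "-dBATCH",
        "-dQUIET",
        f"-sOutputFile={dst}",
        src,
    ]


if __name__ == "__main__":
    print(" ".join(ghostscript_compress_args("large.pdf", "large.small.pdf")))
```

To execute the command, pass the list to `subprocess.run(args, check=True)`. The output file name here is illustrative; this README does not say where DForge writes its compressed output.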

OCR

# OCR an image
dforge ocr scan.png

# OCR a PDF
dforge ocr scan.pdf

# Output as JSON or Markdown
dforge ocr scan.pdf --fmt json
dforge ocr scan.pdf --fmt md

# Multi-language OCR
dforge ocr scan.png --lang eng+hin

# Make a scanned PDF searchable
dforge searchable scan.pdf

# Batch OCR an entire folder
dforge batch-ocr invoices/
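
The `eng+hin` value in the multi-language example follows Tesseract's own convention of joining language codes with `+`. As a rough illustration of what such a request looks like at the Tesseract level, the sketch below builds a direct `tesseract` command line. How DForge actually invokes Tesseract is not stated in this README, and `tesseract_args` is a hypothetical helper.

```python
def tesseract_args(image, languages, output_base="stdout"):
    """Build a `tesseract IMAGE OUTPUTBASE -l LANGS` command line.

    languages is a list of Tesseract language codes such as ["eng", "hin"];
    each one needs its traineddata file installed.
    """
    if not languages:
        raise ValueError("at least one language code is required")
    return ["tesseract", str(image), output_base, "-l", "+".join(languages)]


if __name__ == "__main__":
    print(" ".join(tesseract_args("scan.png", ["eng", "hin"])))
```

Running `tesseract --list-langs` shows which language codes are available on a given machine.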

Document Conversion

# Convert DOCX → PDF
dforge convert report.docx pdf

# Convert Markdown → HTML
dforge convert notes.md html

# Combine images into a PDF
dforge img2pdf scans/

# Export PDF pages as images
dforge pdf2img report.pdf --dpi 300 --fmt png
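
The `--dpi` value determines the pixel size of each exported page. PDF page dimensions are expressed in points (72 per inch), so the expected size can be estimated as in the sketch below. `raster_size` is a hypothetical helper, and the actual output may differ by a pixel depending on how the renderer rounds.

```python
POINTS_PER_INCH = 72  # PDF user-space unit


def raster_size(width_pt, height_pt, dpi):
    """Estimate the pixel dimensions of a page rendered at the given DPI."""
    scale = dpi / POINTS_PER_INCH
    return round(width_pt * scale), round(height_pt * scale)


if __name__ == "__main__":
    # US Letter is 612 x 792 points (8.5 x 11 inches).
    print(raster_size(612, 792, 300))  # (2550, 3300)
```

Doubling the DPI quadruples the pixel count, which is worth keeping in mind when exporting long documents.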

Content Extraction

# Extract text
dforge text report.pdf

# Extract embedded images
dforge images report.pdf

# Show / save metadata
dforge metadata report.pdf
dforge metadata report.pdf -o meta.json

# Extract tables
dforge tables invoice.pdf --fmt xlsx
dforge tables invoice.pdf --fmt csv
dforge tables invoice.pdf --fmt json

Image Processing

# Enhance (contrast + sharpness)
dforge enhance scan.png

# Fix skewed scans
dforge deskew scan.png

# Remove noise
dforge denoise scan.png

# Resize
dforge resize photo.png --width 800
dforge resize photo.png --scale 0.5

# Full OCR preprocessing pipeline
dforge preprocess scan.png
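
The two resize forms above specify the target differently: `--width` gives an absolute width, `--scale` a multiplier. The sketch below shows how both reduce to the same calculation, assuming the aspect ratio is preserved when only a width is given (this README does not say so explicitly). `target_size` is a hypothetical helper.

```python
def target_size(width, height, *, new_width=None, scale=None):
    """Return (width, height) after resizing by absolute width or by scale."""
    if (new_width is None) == (scale is None):
        raise ValueError("pass exactly one of new_width or scale")
    if scale is None:
        # Derive the scale factor from the requested width.
        scale = new_width / width
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    print(target_size(1600, 1200, new_width=800))  # (800, 600)
    print(target_size(1600, 1200, scale=0.5))      # (800, 600)
```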

Batch Processing

# Batch OCR with 8 workers
dforge batch ./documents --ocr --workers 8

# Batch compress
dforge batch ./pdfs --compress

# Batch convert to markdown
dforge batch ./docs --convert md
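
The `--workers` option suggests that batch jobs fan out over a pool of workers. A minimal sketch of that pattern with the standard library follows. It uses threads, which suit tasks that mostly wait on external programs; whether DForge uses threads or processes is not stated here. `run_batch` and its return shape are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_batch(paths, task, workers=8):
    """Apply task(path) to every path in parallel.

    Returns (results, errors): two dicts keyed by path, so one failing
    file does not abort the rest of the batch.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(task, path): path for path in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # record the failure and keep going
                errors[path] = exc
    return results, errors


if __name__ == "__main__":
    done, failed = run_batch(["a.pdf", "b.pdf"], task=len, workers=2)
    print(done, failed)
```

Collecting errors per file instead of raising on the first failure is a design choice that fits large folders, where a single corrupt document should not stop the run.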

Watch Mode

# Auto-OCR new files dropped into a folder
dforge watch ./incoming --ocr

# Auto-make-searchable
dforge watch ./scans --searchable

# Auto-compress
dforge watch ./uploads --compress
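
Watch mode reacts to files that appear in a folder. According to the project structure, DForge builds this on watchdog; the sketch below deliberately swaps in plain standard-library polling to show the underlying idea without a third-party dependency. `poll_new_files` and `watch` are hypothetical names, and `handler` stands in for whatever action (OCR, compress, and so on) should run per file.

```python
import time
from pathlib import Path


def poll_new_files(directory, seen):
    """Return files in directory not in `seen`, and add them to `seen`."""
    current = {p for p in Path(directory).iterdir() if p.is_file()}
    fresh = sorted(current - seen)
    seen |= current
    return fresh


def watch(directory, handler, interval=1.0, max_polls=None):
    """Call handler(path) for each file that appears after watching starts."""
    # Files already present are treated as seen and are not processed.
    seen = {p for p in Path(directory).iterdir() if p.is_file()}
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in poll_new_files(directory, seen):
            handler(path)
        time.sleep(interval)
        polls += 1
```

Polling is simpler but slower to react than an event-based monitor, and it can pick up a file while it is still being written; a production watcher would need to handle that case.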

Project Structure

dforge/
├── cli.py           ← Typer CLI entry point
├── config.py        ← Global configuration
├── utils.py         ← Shared utilities
├── pdf/
│   └── operations.py  ← merge, split, compress, rotate, pages, watermark, encrypt, decrypt
├── ocr/
│   └── engine.py      ← ocr_image, ocr_pdf, make_searchable_pdf, batch_ocr
├── convert/
│   └── converter.py   ← convert, images_to_pdf, pdf_to_images
├── extract/
│   └── extractor.py   ← extract_text, extract_images, extract_metadata, extract_tables
├── image/
│   └── processor.py   ← enhance, deskew, denoise, resize, preprocess_for_ocr
├── batch/
│   └── processor.py   ← parallel batch processing
└── watch/
    └── watcher.py     ← watchdog-based directory monitor

Supported Formats

Category          Formats
Input documents   PDF, DOCX, ODT, MD, HTML, TXT, RST, EPUB
Input images      PNG, JPG/JPEG, TIFF/TIF, BMP, WebP
OCR output        TXT, JSON, Markdown
Table export      CSV, XLSX, JSON
Image export      PNG, JPEG, TIFF

License

MIT License — DForge Contributors
