DForge — Unified Document Processing CLI. Forge your documents from your terminal.

A unified, offline-first Python CLI for all your document processing needs.


Installation

pip install dforge-cli

External Dependencies

Tool            Purpose                    Install
Tesseract OCR   OCR engine                 Install guide
Ghostscript     PDF compression            ghostscript.com
Pandoc          Document conversion        pandoc.org
Poppler         PDF → image (pdf2image)    apt install poppler-utils / brew install poppler
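
Before running OCR, compression, or conversion commands, it can help to confirm that the external tools above are on your PATH. The following is a minimal sketch, not part of DForge: the executable names (`tesseract`, `gs` or `gswin64c`/`gswin32c`, `pandoc`, `pdftoppm`) are assumptions about typical installs, and `find_tools` is a hypothetical helper.

```python
import shutil

# Assumed executable names for each dependency (typical installs; not taken from DForge).
TOOLS = {
    "Tesseract OCR": ["tesseract"],
    "Ghostscript": ["gs", "gswin64c", "gswin32c"],  # Windows builds use gswin*c
    "Pandoc": ["pandoc"],
    "Poppler": ["pdftoppm"],  # one of the poppler-utils binaries
}


def find_tools(which=shutil.which):
    """Return {tool name: resolved path, or None if no candidate is on PATH}."""
    found = {}
    for name, candidates in TOOLS.items():
        # Take the first candidate executable that resolves to a path.
        found[name] = next((p for p in map(which, candidates) if p), None)
    return found


if __name__ == "__main__":
    for name, path in find_tools().items():
        status = f"found at {path}" if path else "MISSING"
        print(f"{name:14} {status}")
```

Any tool reported as MISSING can be installed using the sources listed in the table.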

Quick Reference

PDF Operations

# Merge PDFs
dforge merge a.pdf b.pdf c.pdf -o merged.pdf

# Split into pages
dforge split report.pdf

# Compress (uses Ghostscript)
dforge compress large.pdf --preset ebook

# Rotate pages
dforge rotate file.pdf 90

# Extract page range
dforge pages file.pdf 1-5

# Watermark
dforge watermark file.pdf logo.png

# Encrypt / Decrypt
dforge encrypt file.pdf
dforge decrypt protected.pdf

OCR

# OCR an image
dforge ocr scan.png

# OCR a PDF
dforge ocr scan.pdf

# Output as JSON or Markdown
dforge ocr scan.pdf --fmt json
dforge ocr scan.pdf --fmt md

# Multi-language OCR
dforge ocr scan.png --lang eng+hin

# Make a scanned PDF searchable
dforge searchable scan.pdf

# Batch OCR an entire folder
dforge batch-ocr invoices/

Document Conversion

# Convert DOCX → PDF
dforge convert report.docx pdf

# Convert Markdown → HTML
dforge convert notes.md html

# Combine images into a PDF
dforge img2pdf scans/

# Export PDF pages as images
dforge pdf2img report.pdf --dpi 300 --fmt png

Content Extraction

# Extract text
dforge text report.pdf

# Extract embedded images
dforge images report.pdf

# Show / save metadata
dforge metadata report.pdf
dforge metadata report.pdf -o meta.json

# Extract tables
dforge tables invoice.pdf --fmt xlsx
dforge tables invoice.pdf --fmt csv
dforge tables invoice.pdf --fmt json

Image Processing

# Enhance (contrast + sharpness)
dforge enhance scan.png

# Fix skewed scans
dforge deskew scan.png

# Remove noise
dforge denoise scan.png

# Resize
dforge resize photo.png --width 800
dforge resize photo.png --scale 0.5

# Full OCR preprocessing pipeline
dforge preprocess scan.png

Batch Processing

# Batch OCR with 8 workers
dforge batch ./documents --ocr --workers 8

# Batch compress
dforge batch ./pdfs --compress

# Batch convert to Markdown
dforge batch ./docs --convert md
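
To illustrate what a worker pool like `--workers 8` does conceptually, here is a rough sketch that fans the single-file `dforge ocr <file>` command out over a thread pool from Python. It is not DForge's own implementation: `run_ocr` and `batch_ocr` are hypothetical helpers, and the `runner` parameter exists only so the subprocess call can be swapped out.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_ocr(path, runner=subprocess.run):
    """Run `dforge ocr <path>` and return (path, exit code)."""
    result = runner(["dforge", "ocr", str(path)], capture_output=True, text=True)
    return path, result.returncode


def batch_ocr(files, workers=8, runner=subprocess.run):
    """OCR each file with up to `workers` commands running at once.

    Results come back in the same order as `files`.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: run_ocr(p, runner), files))
```

Threads are sufficient here because each worker only waits on an external process. Calling `batch_ocr(["a.pdf", "b.pdf"])` would return a list of `(path, exit code)` pairs, where a nonzero code marks a file that failed.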

Watch Mode

# Auto-OCR new files dropped into a folder
dforge watch ./incoming --ocr

# Auto-make-searchable
dforge watch ./scans --searchable

# Auto-compress
dforge watch ./uploads --compress

Project Structure

dforge/
├── cli.py           ← Typer CLI entry point
├── config.py        ← Global configuration
├── utils.py         ← Shared utilities
├── pdf/
│   └── operations.py  ← merge, split, compress, rotate, pages, watermark, encrypt, decrypt
├── ocr/
│   └── engine.py      ← ocr_image, ocr_pdf, make_searchable_pdf, batch_ocr
├── convert/
│   └── converter.py   ← convert, images_to_pdf, pdf_to_images
├── extract/
│   └── extractor.py   ← extract_text, extract_images, extract_metadata, extract_tables
├── image/
│   └── processor.py   ← enhance, deskew, denoise, resize, preprocess_for_ocr
├── batch/
│   └── processor.py   ← parallel batch processing
└── watch/
    └── watcher.py     ← watchdog-based directory monitor

Supported Formats

Category          Formats
Input documents   PDF, DOCX, ODT, MD, HTML, TXT, RST, EPUB
Input images      PNG, JPG/JPEG, TIFF/TIF, BMP, WebP
OCR output        TXT, JSON, Markdown
Table export      CSV, XLSX, JSON
Image export      PNG, JPEG, TIFF
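
When scripting around DForge, it may be useful to sort files into the input categories listed above before choosing a command. The sketch below maps file extensions to those categories. The extension sets are derived from the table, while `classify` is a hypothetical helper and does not reflect how DForge itself detects formats.

```python
from pathlib import Path

# Extensions derived from the "Input documents" and "Input images" rows above.
DOCUMENT_EXTS = {".pdf", ".docx", ".odt", ".md", ".html", ".txt", ".rst", ".epub"}
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".tiff", ".tif", ".bmp", ".webp"}


def classify(path):
    """Return 'document', 'image', or 'unsupported' based on the file extension."""
    ext = Path(path).suffix.lower()  # case-insensitive match on the last suffix
    if ext in DOCUMENT_EXTS:
        return "document"
    if ext in IMAGE_EXTS:
        return "image"
    return "unsupported"
```

For example, `classify("scan.PNG")` yields `"image"` and `classify("archive.zip")` yields `"unsupported"`.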

License

MIT License — DForge Contributors
