py-reform: PDF & Image Dewarping Library

A Python library for dewarping/straightening/reformatting document images and PDFs.

Features

  • Dewarp/straighten single images
  • Process entire PDFs or selected pages
  • Return PIL images for further processing
  • Save results as images or PDFs
  • Progress tracking with tqdm
  • Flexible error handling
  • Automatic EXIF orientation handling
  • Multiple dewarping models

Installation

pip install py-reform

Quick Start

Process a Single Image

from py_reform import straighten

# Process a single image
straight_image = straighten("curved_page.jpg")
straight_image.save("straight_page.jpg")
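To run the same call over a folder of images, a small loop is enough. The sketch below assumes only what the snippet above shows: that straighten takes a file path string and returns an object with a save method. The helper names, the list of suffixes, and the "_straight" output naming are hypothetical choices for illustration, not part of py-reform.

```python
from pathlib import Path

# File extensions treated as images in this sketch; adjust as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def find_images(in_dir):
    """Return the image files directly inside in_dir, sorted by path."""
    return sorted(
        p for p in Path(in_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def output_path_for(src, out_dir):
    """Build an output path such as out/page1_straight.jpg (hypothetical naming)."""
    src = Path(src)
    return Path(out_dir) / f"{src.stem}_straight{src.suffix}"


def straighten_folder(in_dir, out_dir, straighten_fn=None):
    """Straighten every image in in_dir and save the results into out_dir."""
    if straighten_fn is None:
        # Imported lazily so the helpers above work without py-reform installed.
        from py_reform import straighten as straighten_fn
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in find_images(in_dir):
        # Pass a plain string path, as in the Quick Start example.
        result = straighten_fn(str(src))
        dst = output_path_for(src, out_dir)
        result.save(dst)
        written.append(dst)
    return written
```

Calling straighten_folder("scans", "scans_straight") would then write one straightened file per input image. The straighten_fn argument exists only so the loop can be tried with a stand-in function.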

Process a PDF

from py_reform import straighten, save_pdf

# Process a PDF (all pages)
straight_pages = straighten("document.pdf")

# Save processed pages as a new PDF
save_pdf(straight_pages, "straight_document.pdf")

Process Specific PDF Pages

# Process specific PDF pages
straight_pages = straighten("document.pdf", pages=[0, 2, 5])
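The example passes 0 as a page, which suggests that pages takes zero-based indices; treat that as an assumption and check it against your own documents. If you think in 1-based page numbers, as PDF viewers display them, a small converter can help. The parse_page_spec and straighten_pages helpers below are hypothetical and not part of py-reform.

```python
def parse_page_spec(spec):
    """Turn a 1-based page spec such as "1-3,6" into sorted 0-based indices."""
    indices = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
        else:
            start = end = int(part)
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        # Page n (1-based) becomes index n - 1 (assumed 0-based).
        indices.update(range(start - 1, end))
    return sorted(indices)


def straighten_pages(pdf_path, spec):
    """Straighten only the pages named in a 1-based spec string."""
    # Imported lazily so parse_page_spec can be used on its own.
    from py_reform import straighten
    return straighten(pdf_path, pages=parse_page_spec(spec))
```

With this, straighten_pages("document.pdf", "1, 3, 6") requests the same pages as pages=[0, 2, 5] above.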

Choose a Different Dewarping Model

By default, straighten uses the UVDoc model, which handles a wide range of warped or curved pages. If the image only needs to be rotated, use the deskew model instead.

# Use the rotation-based deskew model
straight_image = straighten("document.jpg", model="deskew")

# Use the UVDoc model with custom parameters
straight_image = straighten("document.jpg", model="uvdoc", device="cpu")

# Configure deskew model parameters
straight_image = straighten("document.jpg", model="deskew", max_angle=15.0, num_peaks=30)
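If you switch between the two models often, one option is to keep the keyword arguments in a dictionary and unpack them into the call. The preset names ("warped", "rotated") and the helper functions below are hypothetical; the parameter names and values are copied from the examples above.

```python
# Keyword arguments for straighten, grouped by the kind of problem to fix.
MODEL_PRESETS = {
    "warped": {"model": "uvdoc", "device": "cpu"},
    "rotated": {"model": "deskew", "max_angle": 15.0, "num_peaks": 30},
}


def options_for(kind):
    """Return a fresh copy of the keyword arguments for a preset."""
    try:
        return dict(MODEL_PRESETS[kind])
    except KeyError:
        raise ValueError(f"unknown preset: {kind!r}") from None


def straighten_as(path, kind):
    """Call straighten with the options of the chosen preset."""
    # Imported lazily so the presets can be inspected without py-reform.
    from py_reform import straighten
    return straighten(path, **options_for(kind))
```

For example, straighten_as("document.jpg", "rotated") is equivalent to the last call above.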

Create Before/After Comparisons

from py_reform import straighten
from py_reform.utils import create_comparison

straight_image = straighten("curved_page.jpg")

# Create a side-by-side comparison of the original file and the result
comparison = create_comparison(["curved_page.jpg", straight_image])
comparison.save("comparison.jpg")

Error Handling

from py_reform import straighten

# Default: stop on the first error
result = straighten("document.pdf", errors="raise")

# Skip errors and log a warning
result = straighten("document.pdf", errors="ignore")

# Use the original on error, with a warning
result = straighten("document.pdf", errors="warn")
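To make the three modes concrete, here is a generic sketch of how such a policy might behave over a list of pages. It is not py-reform's implementation. In particular, it assumes that "ignore" drops the failed page from the result, which the comments above do not spell out. process_pages and its arguments are hypothetical names.

```python
import warnings


def process_pages(pages, fn, errors="raise"):
    """Apply fn to each page, handling failures according to the errors mode."""
    if errors not in ("raise", "ignore", "warn"):
        raise ValueError(f"unknown errors mode: {errors!r}")
    results = []
    for index, page in enumerate(pages):
        try:
            results.append(fn(page))
        except Exception as exc:
            if errors == "raise":
                raise  # stop at the first failure
            warnings.warn(f"page {index} failed: {exc}")
            if errors == "warn":
                results.append(page)  # keep the unprocessed original
            # errors == "ignore": assumed here to drop the failed page
    return results
```

Under these assumptions, "warn" always returns as many pages as it was given, while "ignore" may return fewer.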

Working with Image Orientation

The library automatically handles EXIF orientation data in JPEG files, ensuring that images are correctly oriented before processing. You can also use these utilities directly:

from py_reform.utils import open_image, auto_rotate_image
import PIL.Image

# Open an image with automatic orientation correction
img = open_image("photo.jpg")

# Or correct orientation of an already opened image
img = PIL.Image.open("photo.jpg")
img = auto_rotate_image(img)
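For background, the EXIF Orientation tag stores a value from 1 to 8 that says how the stored pixels must be rotated or mirrored to appear upright. The sketch below lists the pure-rotation values and shows the equivalent correction in plain Pillow using ImageOps.exif_transpose. Whether py-reform's utilities use that function internally is not stated here, so treat this as an illustration of the idea only; upright_angle and open_upright are hypothetical names.

```python
# EXIF Orientation values that are pure rotations, mapped to the
# counter-clockwise angle (in degrees) that brings the image upright.
# Values 2, 4, 5, and 7 also involve mirroring and are not listed.
ROTATION_FOR_ORIENTATION = {1: 0, 3: 180, 6: 270, 8: 90}


def upright_angle(orientation):
    """Return the counter-clockwise rotation for a tag value, or None."""
    return ROTATION_FOR_ORIENTATION.get(orientation)


def open_upright(path):
    """Open an image and apply its EXIF orientation using Pillow alone."""
    # Imported lazily so the lookup table works without Pillow installed.
    from PIL import Image, ImageOps
    img = Image.open(path)
    # exif_transpose handles all eight orientation values, mirrors included.
    return ImageOps.exif_transpose(img)
```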

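Available Models

  • uvdoc (default): neural dewarping based on UVDoc, for warped or curved pages. Accepts a device parameter, e.g. device="cpu".
  • deskew: rotation-only correction. Accepts max_angle and num_peaks.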

Examples

See examples/examples.py

Citation

The UVDoc model is based on original work by Floor Verhoeven, Tanguy Magne, and Olga Sorkine-Hornung. If you use py-reform with the UVDoc model, please consider citing their work:

@inproceedings{UVDoc,
title={{UVDoc}: Neural Grid-based Document Unwarping},
author={Floor Verhoeven and Tanguy Magne and Olga Sorkine-Hornung},
booktitle = {SIGGRAPH ASIA, Technical Papers},
year = {2023},
url={https://doi.org/10.1145/3610548.3618174}
}

Original UVDoc repository: https://github.com/tanguymagne/UVDoc/

Anything else??

I'm pretty sure I wrote about two lines of code for this, the rest was all Cursor and Claude 3.7 Sonnet. My job was mostly making demands around pathlib and ditching OpenCV.
