A GUI tool for deskewing scanned PDF documents using PyQt6 and OpenCV

PDF Deskew Tool

Chinese documentation

Overview

PDF Deskew Tool is a graphical user interface (GUI) application, with a companion command-line tool, that straightens skewed pages in scanned PDF documents. It uses PyMuPDF, OpenCV, and the deskew library to process each page of a PDF and produce a corrected copy that is easier to read. The interface supports Chinese and English, theme switching, drag-and-drop file selection, and detailed progress feedback.

Features

  • Multi-language Support: Supports both Chinese and English interfaces with easy language switching.
  • Drag-and-Drop File Selection: Simply drag and drop your PDF files for easy selection.
  • Batch Processing: Process multiple PDF files simultaneously to improve work efficiency.
  • Real-time Progress Feedback: Display progress bars and percentages to track processing status.
  • Theme Switching: Offers multiple interface themes for personalized appearance.
  • Customizable Settings:
    • DPI Configuration: Customize rendering DPI to meet different quality requirements.
    • Background Color Selection: Choose or customize background colors to optimize correction results.
    • Image Enhancement: Remove watermarks, enhance contrast, denoise, and sharpen images.
  • Logging: Records important information and errors during processing for debugging and user feedback.
  • Intuitive Interface: User-friendly design with icons and tooltips for enhanced usability.
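
The background color matters because rotating a rectangular page leaves wedge-shaped areas at the corners that have no source pixels and must be filled with something. As a rough illustration only (this is not taken from the tool's source, and the function name `rotated_canvas_size` is made up here), the bounding box of a page rotated by a given angle can be computed like this:

```python
import math


def rotated_canvas_size(width: float, height: float, angle_deg: float) -> tuple[float, float]:
    """Return the bounding-box size of a width x height page rotated by angle_deg."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = abs(math.cos(theta)), abs(math.sin(theta))
    return width * cos_t + height * sin_t, width * sin_t + height * cos_t


if __name__ == "__main__":
    # A US Letter page rendered at 300 DPI (2550 x 3300 px), skewed by 2 degrees.
    new_w, new_h = rotated_canvas_size(2550, 3300, 2.0)
    print(f"{new_w:.0f} x {new_h:.0f} px")
```

Whether the tool grows the canvas to this size or keeps the original page size is not stated here. In both cases some area ends up uncovered, which is presumably where the selected background color is used.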

Installation

Recommended: Using uv

uv tool install pdf-deskew

This will automatically create two executable commands: pdf-deskew (GUI) and pdf-deskew-cli (CLI).

Alternative: Using pip

pip install pdf-deskew

From Source (Development)

git clone https://github.com/tinnci/pdf_deskew.git
cd pdf_deskew

# Create virtual environment
uv venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install in development mode
uv pip install -e .

Dependencies

Installing the package automatically pulls in the following dependencies:

  • PyQt6 (>=6.7.1): GUI framework
  • PyMuPDF (>=1.24.13): PDF processing
  • OpenCV (>=4.10.0.84): Image processing
  • Pillow (>=11.0.0): Image manipulation
  • numpy (>=2.1.2): Numerical computing
  • deskew (>=1.5.1): Skew detection
  • qt-material (>=2.14): Theme support
  • tqdm (>=4.66.6): Progress bars
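
To give a sense of how these libraries can fit together, here is a minimal sketch of a render, detect, rotate, and save loop. It is an assumption about the general approach, not the project's actual code: `deskew_pdf` and `FILL` are hypothetical names, the canvas is kept at its original size, and none of the enhancement options are modeled.

```python
from pathlib import Path

# Fill colors for the corners exposed by rotation (RGB order).
FILL = {"white": (255, 255, 255), "black": (0, 0, 0)}


def deskew_pdf(src: Path, dst: Path, dpi: int = 300, background: str = "white") -> None:
    """Hypothetical helper: render, straighten, and re-save every page of src."""
    # Third-party imports are kept local so that importing this module
    # does not require the imaging stack to be installed.
    import cv2
    import fitz  # PyMuPDF
    import numpy as np
    from deskew import determine_skew

    fill = FILL[background]
    with fitz.open(src) as doc, fitz.open() as out:
        for page in doc:
            pix = page.get_pixmap(dpi=dpi)  # RGB without alpha by default
            img = np.frombuffer(pix.samples, dtype=np.uint8)
            img = img.reshape(pix.height, pix.width, pix.n)
            angle = determine_skew(cv2.cvtColor(img, cv2.COLOR_RGB2GRAY))
            if angle:  # None (nothing detected) and 0.0 both mean "leave as is"
                center = (pix.width / 2, pix.height / 2)
                matrix = cv2.getRotationMatrix2D(center, angle, 1.0)
                img = cv2.warpAffine(
                    img, matrix, (pix.width, pix.height), borderValue=fill
                )
            # OpenCV encodes in BGR order, so convert before writing the PNG.
            ok, png = cv2.imencode(".png", cv2.cvtColor(img, cv2.COLOR_RGB2BGR))
            if not ok:
                raise RuntimeError(f"could not encode page {page.number}")
            new_page = out.new_page(width=page.rect.width, height=page.rect.height)
            new_page.insert_image(new_page.rect, stream=png.tobytes())
        out.save(dst)
```

A call such as `deskew_pdf(Path("scan.pdf"), Path("scan_deskewed.pdf"))` would write an image-only PDF, so any text layer in the original is lost in this sketch.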

Usage

GUI Application

Start the application:

pdf-deskew

Interface Guide:

  1. File Selection:

    • Input PDF: Click "Browse" button or drag-and-drop a PDF file
    • Output PDF: Specify save location (default: input_filename_deskewed.pdf)
  2. Processing Options:

    • Use Recommended Settings: DPI=300, white background
    • Custom Settings: Adjust DPI, background color, watermark removal, image enhancement
    • Image Processing:
      • Remove watermarks (Inpainting)
      • Enhance images (contrast, denoising, sharpening)
      • Convert to grayscale
  3. Language & Theme:

    • Switch between English and Chinese
    • Choose from multiple interface themes
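
When choosing a DPI value, it helps to know how large the rendered page images become. PDF page sizes are measured in points (72 per inch), so the pixel dimensions scale linearly with DPI. The helper below is a hypothetical illustration of that arithmetic, not part of the tool:

```python
POINTS_PER_INCH = 72  # PDF user-space units per inch


def render_size(width_pt: float, height_pt: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a page of the given size (in points) rendered at dpi."""
    scale = dpi / POINTS_PER_INCH
    return round(width_pt * scale), round(height_pt * scale)


if __name__ == "__main__":
    # US Letter is 612 x 792 points (8.5 x 11 inches).
    for dpi in (150, 300, 600):
        print(dpi, render_size(612, 792, dpi))
```

Doubling the DPI doubles each dimension and therefore quadruples the pixel count, which is worth keeping in mind for memory use and processing time on long documents.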

Command-Line Tool

View help:

pdf-deskew-cli --help

Basic usage:

# Simple conversion
pdf-deskew-cli input.pdf

# Specify output
pdf-deskew-cli input.pdf -o output.pdf

# Custom DPI
pdf-deskew-cli input.pdf -d 600

# With enhancements
pdf-deskew-cli input.pdf --enhance --remove-watermark

# Change background
pdf-deskew-cli input.pdf --bg-color black
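
The examples above handle one file at a time. To process a whole folder, one option is a small wrapper script that calls the CLI once per file. The sketch below uses only the flags documented here. The function names and the `scans` folder are placeholders, and it assumes `pdf-deskew-cli` is on the PATH:

```python
import shlex
import subprocess
from pathlib import Path


def build_command(pdf: Path, dpi: int = 300, enhance: bool = False) -> list[str]:
    """Assemble a pdf-deskew-cli invocation using only the documented flags."""
    cmd = ["pdf-deskew-cli", str(pdf), "-d", str(dpi)]
    if enhance:
        cmd.append("--enhance")
    return cmd


def deskew_folder(folder: Path, dpi: int = 300) -> None:
    """Run the CLI once per PDF in folder; stops at the first failure."""
    for pdf in sorted(folder.glob("*.pdf")):
        if pdf.stem.endswith("_deskewed"):
            continue  # skip outputs left over from an earlier run
        cmd = build_command(pdf, dpi)
        print("running:", shlex.join(cmd))  # shell-quoted form, for display only
        subprocess.run(cmd, check=True)  # list form: no shell quoting needed


if __name__ == "__main__":
    deskew_folder(Path("scans"))  # "scans" is a placeholder folder name
```

Passing the arguments as a list means paths with spaces need no extra quoting; `shlex.join` is used only to print a form that could be pasted into a shell.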

Command-line Arguments:

  • input: Input PDF file path (required)
  • -o, --output: Output file path (default: the input file name with _deskewed appended, e.g. input_deskewed.pdf)
  • -d, --dpi: Rendering DPI, range 72-1200 (default: 300)
  • --bg-color: Background color, white or black (default: white)
  • --enhance: Enable image enhancement
  • --remove-watermark: Enable watermark removal
  • -v, --version: Show version number
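
For readers who want to mirror this interface in their own scripts, the argument list maps naturally onto `argparse`. The following is a sketch of such a parser under stated assumptions, not the tool's real implementation: `dpi_value`, `build_parser`, and `parse` are hypothetical names, the version string is a placeholder, and writing the default output next to the input file is a guess.

```python
import argparse
from pathlib import Path


def dpi_value(text: str) -> int:
    """argparse type: an integer within the documented 72-1200 range."""
    value = int(text)
    if not 72 <= value <= 1200:
        raise argparse.ArgumentTypeError("DPI must be between 72 and 1200")
    return value


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="pdf-deskew-cli")
    parser.add_argument("input", type=Path, help="input PDF file path")
    parser.add_argument("-o", "--output", type=Path, help="output file path")
    parser.add_argument("-d", "--dpi", type=dpi_value, default=300)
    parser.add_argument("--bg-color", choices=("white", "black"), default="white")
    parser.add_argument("--enhance", action="store_true")
    parser.add_argument("--remove-watermark", action="store_true")
    parser.add_argument("-v", "--version", action="version", version="%(prog)s (sketch)")
    return parser


def parse(argv: list[str]) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    if args.output is None:
        # Assumed default: input.pdf -> input_deskewed.pdf in the same folder.
        args.output = args.input.with_name(f"{args.input.stem}_deskewed.pdf")
    return args


if __name__ == "__main__":
    print(parse(["scan.pdf", "-d", "600", "--bg-color", "black"]))
```

Validating the DPI in a custom `type` function lets argparse report out-of-range values with its usual usage message instead of failing later during rendering.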

System Requirements

  • Operating System: Windows, macOS, or Linux
  • Python: 3.12 or higher
  • Optional: uv package manager (recommended)

Notes

  • Special Characters in Paths: If a file path contains spaces or other special characters, wrap it in quotes on the command line (for example, "my scan.pdf").
  • Temporary Files: The application creates a temporary folder for intermediate images, which is automatically cleaned up after processing.
  • Logging: Processing logs are recorded in pdf_deskew.log for debugging purposes.
  • Theme Switching: Theme changes take effect immediately without requiring application restart.
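
The temporary-folder and logging behavior described in the notes follows a common Python pattern. A minimal sketch of that pattern, assuming nothing about the tool's internals (the function name, logger name, and file names other than pdf_deskew.log are invented for illustration):

```python
import logging
import tempfile
from pathlib import Path


def process_with_scratch_dir(log_path: Path = Path("pdf_deskew.log")) -> Path:
    """Create a scratch folder, log what happens, and return the removed folder's path."""
    logger = logging.getLogger("pdf_deskew_sketch")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(log_path, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    try:
        # The directory and everything in it is deleted when the block exits,
        # even if processing raises.
        with tempfile.TemporaryDirectory(prefix="pdf_deskew_") as tmp:
            scratch = Path(tmp)
            (scratch / "page_0001.png").write_bytes(b"")  # stand-in for a rendered page
            logger.info("wrote intermediate images to %s", scratch)
        logger.info("temporary folder removed")
        return scratch
    finally:
        logger.removeHandler(handler)
        handler.close()
```

Calling `process_with_scratch_dir()` with no arguments appends two lines to pdf_deskew.log in the current directory and leaves no temporary files behind.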

Development

To contribute to this project:

  1. Clone the Repository:

    git clone https://github.com/tinnci/pdf_deskew.git
    cd pdf_deskew
    
  2. Set Up Environment:

    uv venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
    uv pip install -e .
    
  3. Run Tests:

    pytest
    
  4. Submit Changes:

    git add .
    git commit -m "Description of changes"
    git push origin your-branch
    

License

This project is licensed under the MIT License. You are free to use and modify it.

Support

For issues or questions, see the project repository at https://github.com/tinnci/pdf_deskew.


Thank you for using PDF Deskew Tool! If you find it useful, please give us a ⭐ on GitHub and share it with others who might benefit.
