Automatically deskew and turn upright every page of a PDF (jdeskew + Tesseract OSD).

Auto-Rotate

License: BSD-3-Clause

Automatically deskew and turn upright every page of a PDF — CLI, Python library, and a cross-platform desktop app.

Each page is rasterized, snapped to its correct cardinal orientation (0/90/180/270°) with Tesseract's orientation-and-script detection, then fine-skew corrected with jdeskew (FFT + radial projection), and reassembled into a new PDF. Optionally, a searchable text layer is added with OCRmyPDF.

PDF page
  → rasterize @ DPI (pypdfium2)
  → cardinal orientation snap   (Tesseract OSD, lossless np.rot90)   [--orient, default on]
  → fine-skew correction ±15°   (jdeskew, white-filled corners)
  → binarize to black-on-white  (Sauvola local threshold)            [--clean, default off]
  → rebuild PDF (img2pdf)
  → searchable text layer       (OCRmyPDF)                           [--ocr, default off]
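
The pipeline calls the cardinal snap lossless because a quarter-turn only moves pixels to new positions and never interpolates between them. The sketch below illustrates that idea in pure Python on a nested list; it is not the project's code, and `rotate_cw` and `rotate_quarter_turns` are hypothetical names. `np.rot90` applies the same kind of index permutation to arrays (counterclockwise for positive `k`).

```python
from typing import List

Grid = List[List[int]]


def rotate_cw(grid: Grid) -> Grid:
    """Rotate a 2-D grid 90 degrees clockwise by permuting indices only."""
    # Reversing the rows and transposing moves every value to its new
    # position without altering it, so no resampling is involved.
    return [list(row) for row in zip(*grid[::-1])]


def rotate_quarter_turns(grid: Grid, turns: int) -> Grid:
    """Apply `turns` clockwise quarter-turns (any integer, taken mod 4)."""
    result = [list(row) for row in grid]
    for _ in range(turns % 4):
        result = rotate_cw(result)
    return result


page = [[1, 2, 3],
        [4, 5, 6]]
sideways = rotate_quarter_turns(page, 1)
restored = rotate_quarter_turns(sideways, 3)
```

Four quarter-turns in total bring back the original values exactly, which is the property the fine-skew stage cannot offer because arbitrary angles require interpolation.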

Install

Desktop app (no Python needed) — download the installer for your OS from the latest release: Windows .msi, macOS .dmg (Apple Silicon, macOS 14+), or Linux .deb/.rpm/Flatpak. (Until the app is code-signed, macOS needs a right-click → Open on first launch, and Windows SmartScreen needs "Run anyway".)

CLI / library via pipx or pip:

pipx install auto-rotate            # the `auto-rotate` command
pipx install "auto-rotate[ocr]"     # + the `--ocr` capability
pip install auto-rotate             # as a library

From source with uv:

uv sync                  # core
uv sync --all-extras     # core + gui + ocr + dev tools

System binaries

Feature             Requires
--orient (default)  the tesseract binary on PATH
--ocr               ocrmypdf (the [ocr] extra) + tesseract
--no-orient         nothing (pure Python, no system binaries)

On macOS: brew install tesseract ocrmypdf.
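
Before running a batch it can help to check which of these binaries are actually present. The following is a minimal sketch using only the standard library; `REQUIRED_BINARIES` and `missing_binaries` are hypothetical names, not part of auto-rotate. Looking for an `ocrmypdf` executable on PATH is only a proxy, since the [ocr] extra installs a Python package that may be importable without its command being on PATH.

```python
import shutil
from typing import Callable, List, Optional

# Feature -> external executables it relies on, mirroring the table above.
REQUIRED_BINARIES = {
    "orient": ("tesseract",),
    "ocr": ("ocrmypdf", "tesseract"),
    "no-orient": (),
}


def missing_binaries(
    feature: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the executables for `feature` that cannot be found on PATH."""
    return [name for name in REQUIRED_BINARIES[feature] if which(name) is None]


for feature in REQUIRED_BINARIES:
    missing = missing_binaries(feature)
    status = "ok" if not missing else "missing " + ", ".join(missing)
    print(f"{feature}: {status}")
```

The `which` parameter exists only so the lookup can be replaced in tests.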

Usage

auto-rotate input.pdf output.pdf                  # cardinal orientation + deskew (image-only)
auto-rotate input.pdf output.pdf --no-orient      # deskew only, no Tesseract needed
auto-rotate input.pdf output.pdf --clean          # + pure black-on-white (great for scans)
auto-rotate input.pdf output.pdf --ocr            # + searchable text layer
auto-rotate input.pdf output.pdf --dpi 400 -v     # higher render DPI, log per-page corrections

Options:

positional:            input, output
--dpi N                rasterization DPI (default 300)
--orient/--no-orient   cardinal-orientation detection (default: on)
--angle-max DEG        max fine-skew angle searched (default 15.0)
--clean                binarize to pure black-on-white (flattens scan background)
--ocr                  add a searchable text layer (needs ocrmypdf)
-v, --verbose          log the correction applied to each page
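
To make the defaults and the paired `--orient/--no-orient` switch concrete, here is a sketch of an `argparse` parser that accepts the same options. It is an illustration built from the option list above, not the project's actual parser; `build_parser` is a hypothetical name, and `argparse.BooleanOptionalAction` needs Python 3.9 or later.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented auto-rotate options."""
    parser = argparse.ArgumentParser(prog="auto-rotate")
    parser.add_argument("input", help="source PDF")
    parser.add_argument("output", help="destination PDF")
    parser.add_argument("--dpi", type=int, default=300,
                        help="rasterization DPI")
    # BooleanOptionalAction generates both --orient and --no-orient.
    parser.add_argument("--orient", action=argparse.BooleanOptionalAction,
                        default=True, help="cardinal-orientation detection")
    parser.add_argument("--angle-max", type=float, default=15.0,
                        help="max fine-skew angle searched, in degrees")
    parser.add_argument("--clean", action="store_true",
                        help="binarize to pure black-on-white")
    parser.add_argument("--ocr", action="store_true",
                        help="add a searchable text layer")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="log the correction applied to each page")
    return parser


args = build_parser().parse_args(["input.pdf", "output.pdf", "--no-orient"])
print(args.dpi, args.orient, args.angle_max)
```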

Desktop app

auto-rotate-gui     # if installed via pip/pipx with the [gui] extra

Add one or more PDFs, toggle orient / clean / OCR, pick a DPI, and Run; each file is written next to its source as <name> - upright.pdf, with per-page progress in the log. Orientation and OCR controls are enabled only when Tesseract / OCRmyPDF are detected.
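
The output naming rule described above can be reproduced with `pathlib` when scripting around the app. This is a sketch of the documented `<name> - upright.pdf` convention; `upright_path` is a hypothetical helper, not a function exported by auto-rotate.

```python
from pathlib import Path


def upright_path(source: Path) -> Path:
    """Return the sibling path '<name> - upright.pdf' for a source PDF."""
    # with_name keeps the parent directory, so the result sits next to the source.
    return source.with_name(f"{source.stem} - upright.pdf")


print(upright_path(Path("scans/invoice.pdf")))
```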

Library

from pathlib import Path
from auto_rotate import deskew_pdf

results = deskew_pdf(Path("scan.pdf"), Path("upright.pdf"), dpi=300, orient=True)
for r in results:
    print(f"page {r.index + 1}: cardinal {r.cardinal}° + skew {r.skew:.2f}°")
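
The per-page results lend themselves to simple post-processing, for example listing the pages that were actually changed. The sketch below assumes only the three attributes used in the example above (`index`, `cardinal`, `skew`); `PageResult` is a hypothetical stand-in so the snippet runs without auto-rotate installed, and `corrected_pages` and its `min_skew` threshold are illustrative choices.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class PageResult:
    """Hypothetical stand-in with the attributes shown in the example above."""
    index: int      # zero-based page index
    cardinal: int   # cardinal rotation applied, in degrees
    skew: float     # fine-skew correction applied, in degrees


def corrected_pages(results: Iterable[PageResult], min_skew: float = 0.1) -> List[int]:
    """Return 1-based numbers of pages that were rotated or noticeably deskewed."""
    return [
        r.index + 1
        for r in results
        if r.cardinal != 0 or abs(r.skew) >= min_skew
    ]


sample = [PageResult(0, 0, 0.02), PageResult(1, 90, 0.0), PageResult(2, 0, -1.4)]
print(corrected_pages(sample))
```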

Notes & limitations

  • The output is rasterized. Without --ocr it is image-only (no selectable text); --ocr restores a searchable layer via OCR.
  • jdeskew only resolves tilt within ±angle-max degrees (set with --angle-max, default 15°). Gross rotation (sideways or upside-down pages) is handled by the orientation stage, so keep --orient on for arbitrary inputs.
  • Tesseract OSD needs a reasonable amount of text to lock onto orientation; on blank or text-sparse pages it falls back to leaving the page as-is (0°).
  • --clean uses a local (Sauvola) threshold, which keeps thin faint strokes (e.g. music staff lines) connected far better than a global threshold. It still cannot recover ink that is essentially absent from the scan — very faint lines may stay broken; raising the render --dpi is the best lever there.
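
The fall-back-to-0° behaviour for text-sparse pages can be sketched as a small parser over Tesseract's OSD report, which contains lines such as `Rotate: 90` and `Orientation confidence: 2.35`. This is an illustration of the idea rather than the project's implementation; `rotation_from_osd` and the `min_confidence` threshold are hypothetical.

```python
import re

CARDINALS = (0, 90, 180, 270)


def rotation_from_osd(osd_text: str, min_confidence: float = 0.0) -> int:
    """Extract the cardinal rotation from an OSD report, or 0 if unusable."""
    rotate = re.search(r"^Rotate:\s*(\d+)", osd_text, re.MULTILINE)
    confidence = re.search(
        r"^Orientation confidence:\s*(\d+(?:\.\d+)?)", osd_text, re.MULTILINE
    )
    if rotate is None:
        return 0  # no usable report: leave the page as-is
    angle = int(rotate.group(1))
    if angle not in CARDINALS:
        return 0  # unexpected value: leave the page as-is
    if confidence is not None and float(confidence.group(1)) < min_confidence:
        return 0  # too uncertain to act on: leave the page as-is
    return angle


report = """Page number: 0
Orientation in degrees: 270
Rotate: 90
Orientation confidence: 2.35
Script: Latin
Script confidence: 1.11
"""
print(rotation_from_osd(report))
```

Returning 0 on every failure path keeps a blank or ambiguous page untouched instead of risking a wrong quarter-turn.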

Development

uv run ruff check . && uv run ruff format --check .
uv run pyright
uv run pytest                  # unit suite (system binaries mocked)
uv run pytest -m integration   # end-to-end tests against real tesseract/ocrmypdf
