Automatically deskew and turn upright every page of a PDF (jdeskew + Tesseract OSD).

Auto-Rotate

License: BSD-3-Clause

Automatically deskew and turn upright every page of a PDF — CLI, Python library, and a cross-platform desktop app.

Each page is rasterized, snapped to its correct cardinal orientation (0/90/180/270°) with Tesseract's orientation-and-script detection, then fine-skew corrected with jdeskew (FFT + radial projection), and reassembled into a new PDF. Optionally, a searchable text layer is added with OCRmyPDF.

PDF page
  → rasterize @ DPI (pypdfium2)
  → cardinal orientation snap   (Tesseract OSD, lossless np.rot90)   [--orient, default on]
  → fine-skew correction ±15°   (jdeskew, white-filled corners)
  → binarize to black-on-white  (Sauvola local threshold)            [--clean, default off]
  → rebuild PDF (img2pdf)
  → searchable text layer       (OCRmyPDF)                           [--ocr, default off]
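The orientation snap can be pictured as reading Tesseract's OSD report and falling back to 0° when the report is missing or weak. The sketch below is not the project's code: it assumes the report contains the `Rotate:` and `Orientation confidence:` lines that Tesseract prints in its `--psm 0` mode, and the function name, the sample numbers, and the `min_confidence` threshold are hypothetical.

```python
import re

# Sample text in the shape of a Tesseract OSD report.
# The numbers are made up for illustration.
SAMPLE_OSD = """\
Page number: 0
Orientation in degrees: 270
Rotate: 90
Orientation confidence: 5.49
Script: Latin
Script confidence: 2.50
"""


def cardinal_from_osd(osd_text: str, min_confidence: float = 1.0) -> int:
    """Return the suggested rotation (0, 90, 180 or 270), or 0 when unsure."""
    rotate = re.search(r"^Rotate:\s*(\d+)", osd_text, flags=re.MULTILINE)
    conf = re.search(
        r"^Orientation confidence:\s*(\d+(?:\.\d+)?)", osd_text, flags=re.MULTILINE
    )
    if rotate is None or conf is None:
        return 0  # blank or text-sparse page: leave it as-is
    if float(conf.group(1)) < min_confidence:
        return 0  # hypothetical threshold: a weak signal is treated as "no change"
    angle = int(rotate.group(1)) % 360
    # Only cardinal angles are accepted; anything else is ignored.
    return angle if angle in (0, 90, 180, 270) else 0


print(cardinal_from_osd(SAMPLE_OSD))  # 90
print(cardinal_from_osd(""))          # 0
```

Because the snap is always a multiple of 90°, it can be applied as a pure pixel permutation (the diagram's `np.rot90`), which is why that stage is lossless while the fine-skew stage is not.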

Install

Desktop app (no Python needed) — download the installer for your OS from the latest release: Windows .msi, macOS .dmg (Apple Silicon, macOS 14+), or Linux .deb/.rpm/Flatpak.

The desktop builds are not code-signed (this is a free open-source project without Apple/Windows signing certificates), so the OS warns on first launch:

  • macOS: right-click the app → Open (or System Settings → Privacy & Security → Open Anyway). One time only. To skip the prompt entirely, install via pipx below.
  • Windows: SmartScreen → More info → Run anyway.

CLI / library / GUI via pipx or pip — no Gatekeeper/SmartScreen prompt, since nothing is downloaded as an app bundle:

pipx install auto-rotate            # the `auto-rotate` command
pipx install "auto-rotate[ocr]"     # + the `--ocr` capability
pipx install "auto-rotate[gui]"     # + the `auto-rotate-gui` desktop app
pip install auto-rotate             # as a library

From source with uv:

uv sync                  # core
uv sync --all-extras     # core + gui + ocr + dev tools

System binaries

Feature              Requires
--orient (default)   the tesseract binary on PATH
--ocr                ocrmypdf (the [ocr] extra) + tesseract
--no-orient          nothing (pure Python, no system binaries)

On macOS: brew install tesseract ocrmypdf.
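A script that wraps the tool may want to check these requirements before starting a long batch. The following is a minimal sketch using only the standard library; `missing_requirements` and its injectable `which` / `has_module` parameters are hypothetical names for illustration, not part of the auto-rotate API.

```python
import importlib.util
import shutil
from typing import Callable, Optional


def _module_available(name: str) -> bool:
    """True if a top-level module with this name can be imported."""
    return importlib.util.find_spec(name) is not None


def missing_requirements(
    orient: bool = True,
    ocr: bool = False,
    which: Callable[[str], Optional[str]] = shutil.which,
    has_module: Callable[[str], bool] = _module_available,
) -> list[str]:
    """List what is missing for the requested features (empty list = ready)."""
    missing = []
    # Both --orient and --ocr rely on the tesseract binary being on PATH.
    if (orient or ocr) and which("tesseract") is None:
        missing.append("tesseract binary")
    # --ocr additionally needs the ocrmypdf package from the [ocr] extra.
    if ocr and not has_module("ocrmypdf"):
        missing.append("ocrmypdf (the [ocr] extra)")
    return missing


if __name__ == "__main__":
    problems = missing_requirements(orient=True, ocr=True)
    print("ready" if not problems else "missing: " + ", ".join(problems))
```

Passing `which` and `has_module` as parameters keeps the check testable on machines where the binaries are not installed.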

Usage

auto-rotate input.pdf output.pdf                  # cardinal orientation + deskew (image-only)
auto-rotate input.pdf output.pdf --no-orient      # deskew only, no Tesseract needed
auto-rotate input.pdf output.pdf --clean          # + pure black-on-white (great for scans)
auto-rotate input.pdf output.pdf --ocr            # + searchable text layer
auto-rotate input.pdf output.pdf --dpi 400 -v     # higher render DPI, log per-page corrections

positional: input, output
--dpi N                 rasterization DPI (default 300)
--orient/--no-orient    cardinal-orientation detection (default: on)
--angle-max DEG         max fine-skew angle searched (default 15.0)
--clean                 binarize to pure black-on-white (flattens scan background)
--ocr                   add a searchable text layer (needs ocrmypdf)
-v, --verbose           log the correction applied to each page
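For processing a whole folder from a script, the command line can be assembled from the options listed above. This is a sketch under the assumption that the `auto-rotate` command is on PATH; `build_command` and `batch_commands` are hypothetical helper names, and only the documented flags are used.

```python
from pathlib import Path


def build_command(
    src: Path,
    dst: Path,
    *,
    dpi: int = 300,
    orient: bool = True,
    clean: bool = False,
    ocr: bool = False,
    verbose: bool = False,
) -> list[str]:
    """Build the argument list for one `auto-rotate` invocation."""
    cmd = ["auto-rotate", str(src), str(dst), "--dpi", str(dpi)]
    if not orient:
        cmd.append("--no-orient")  # orientation detection is on by default
    if clean:
        cmd.append("--clean")
    if ocr:
        cmd.append("--ocr")
    if verbose:
        cmd.append("-v")
    return cmd


def batch_commands(folder: Path, out_dir: Path, **options) -> list[list[str]]:
    """One command per PDF in `folder`, writing same-named files to `out_dir`."""
    return [
        build_command(pdf, out_dir / pdf.name, **options)
        for pdf in sorted(folder.glob("*.pdf"))
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`; building the arguments as a list avoids shell quoting problems with file names that contain spaces.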

Desktop app

auto-rotate-gui     # if installed via pip/pipx with the [gui] extra

Add one or more PDFs, toggle orient / clean / OCR, pick a DPI, and Run; each file is written next to its source as <name> - upright.pdf, with per-page progress in the log. Orientation and OCR controls are enabled only when Tesseract / OCRmyPDF are detected.
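The output naming rule described above is easy to reproduce when scripting around the app. The helper below is a hypothetical illustration of that rule, not the app's own code.

```python
from pathlib import Path


def upright_path(source: Path) -> Path:
    """Return the sibling path '<name> - upright.pdf' for a source PDF."""
    return source.with_name(f"{source.stem} - upright.pdf")


print(upright_path(Path("scans/invoice.pdf")).name)  # invoice - upright.pdf
```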

Library

from pathlib import Path
from auto_rotate import deskew_pdf

results = deskew_pdf(Path("scan.pdf"), Path("upright.pdf"), dpi=300, orient=True)
for r in results:
    print(f"page {r.index + 1}: cardinal {r.cardinal}° + skew {r.skew:.2f}°")
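The per-page results can also be used to flag pages worth a manual look. The sketch below relies only on the three fields used in the loop above (`index`, `cardinal`, `skew`); the `PageResult` dataclass is a hypothetical stand-in for whatever type `deskew_pdf` actually returns, and the 5° limit is an arbitrary example value.

```python
from dataclasses import dataclass


@dataclass
class PageResult:
    """Hypothetical stand-in with the fields shown in the example above."""
    index: int      # zero-based page index
    cardinal: int   # cardinal correction in degrees (0, 90, 180, 270)
    skew: float     # fine-skew correction in degrees


def pages_needing_review(results, skew_limit: float = 5.0) -> list[int]:
    """Return 1-based page numbers that were turned or strongly deskewed."""
    return [
        r.index + 1
        for r in results
        if r.cardinal != 0 or abs(r.skew) >= skew_limit
    ]


sample = [
    PageResult(index=0, cardinal=0, skew=0.4),
    PageResult(index=1, cardinal=180, skew=-0.2),
    PageResult(index=2, cardinal=0, skew=7.5),
]
print(pages_needing_review(sample))  # [2, 3]
```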

Notes & limitations

  • The output is rasterized. Without --ocr it is image-only (no selectable text); --ocr restores a searchable layer via OCR.
  • jdeskew only resolves tilt up to the --angle-max value in either direction (default ±15°). Gross rotation (sideways or upside-down pages) is handled by the orientation stage, so keep --orient on for arbitrary inputs.
  • Tesseract OSD needs a reasonable amount of text to lock onto orientation; on blank or text-sparse pages it falls back to leaving the page as-is (0°).
  • --clean uses a local (Sauvola) threshold, which keeps thin faint strokes (e.g. music staff lines) connected far better than a global threshold. It still cannot recover ink that is essentially absent from the scan — very faint lines may stay broken; raising the render --dpi is the best lever there.
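To see why a local threshold helps with faint strokes, here is a small pure-Python sketch of the standard Sauvola rule, T = m · (1 + k · (s / R − 1)), where m and s are the mean and standard deviation of the pixels around each position. It is an illustration only: the window size, k = 0.2, and R = 128 are common textbook defaults, not values confirmed for this project, and a real implementation would use an array library rather than nested lists.

```python
from statistics import fmean, pstdev


def sauvola_binarize(gray, window=3, k=0.2, r=128.0):
    """Binarize rows of 0..255 values: 0 = ink, 255 = paper."""
    height, width = len(gray), len(gray[0])
    half = window // 2
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            # Neighbourhood around (x, y), clipped at the image border.
            patch = [
                gray[j][i]
                for j in range(max(0, y - half), min(height, y + half + 1))
                for i in range(max(0, x - half), min(width, x + half + 1))
            ]
            mean, std = fmean(patch), pstdev(patch)
            threshold = mean * (1.0 + k * (std / r - 1.0))
            row.append(255 if gray[y][x] > threshold else 0)
        out.append(row)
    return out


def global_binarize(gray, threshold=128):
    """Single fixed threshold for the whole image, for comparison."""
    return [[255 if v > threshold else 0 for v in row] for row in gray]


# A faint stroke on bright paper, and a dark stroke on a dim (shadowed) scan.
bright = [[240, 240, 150, 240, 240]] * 3
dim = [[120, 120, 40, 120, 120]] * 3

print(sauvola_binarize(bright)[1])  # [255, 255, 0, 255, 255]
print(sauvola_binarize(dim)[1])     # [255, 255, 0, 255, 255]
print(global_binarize(bright)[1])   # [255, 255, 255, 255, 255]  stroke lost
print(global_binarize(dim)[1])      # [0, 0, 0, 0, 0]            page flooded
```

The fixed threshold loses the faint stroke on the bright page and turns the dim page solid black, while the local rule adapts to each neighbourhood and keeps the stroke in both cases.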

Development

uv run ruff check . && uv run ruff format --check .
uv run pyright
uv run pytest                  # unit suite (system binaries mocked)
uv run pytest -m integration   # end-to-end tests against real tesseract/ocrmypdf
