Automatically deskew and turn upright every page of a PDF (jdeskew + Tesseract OSD).
Auto-Rotate
Automatically deskew and turn upright every page of a PDF — CLI, Python library, and a cross-platform desktop app.
Each page is rasterized, snapped to its correct cardinal orientation
(0/90/180/270°) with Tesseract's orientation-and-script detection, then
fine-skew corrected with jdeskew
(FFT + radial projection), and reassembled into a new PDF. Optionally, a
searchable text layer is added with OCRmyPDF.
PDF page
→ rasterize @ DPI (pypdfium2)
→ cardinal orientation snap (Tesseract OSD, lossless np.rot90) [--orient, default on]
→ fine-skew correction ±15° (jdeskew, white-filled corners)
→ binarize to black-on-white (Sauvola local threshold) [--clean, default off]
→ rebuild PDF (img2pdf)
→ searchable text layer (OCRmyPDF) [--ocr, default off]
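The orientation-snap stage can be pictured with a small sketch. It assumes the plain-text report that Tesseract prints in OSD mode (`--psm 0`); the helper name `cardinal_from_osd` and the `min_confidence` cutoff are illustrative assumptions, not auto-rotate's actual code.

```python
import re

CARDINALS = (0, 90, 180, 270)


def cardinal_from_osd(osd_text: str, min_confidence: float = 1.0) -> int:
    """Return the cardinal correction (0/90/180/270) reported by Tesseract OSD.

    Falls back to 0 (leave the page as-is) when the report is missing,
    malformed, or below the assumed confidence cutoff.
    """
    rotate = re.search(r"^Rotate:\s*(\d+)", osd_text, flags=re.MULTILINE)
    conf = re.search(
        r"^Orientation confidence:\s*(\d+(?:\.\d+)?)", osd_text, flags=re.MULTILINE
    )
    if rotate is None or conf is None:
        return 0
    degrees = int(rotate.group(1)) % 360
    if degrees not in CARDINALS or float(conf.group(1)) < min_confidence:
        return 0
    return degrees


# Sample report in the layout Tesseract uses for OSD results.
sample = """Page number: 0
Orientation in degrees: 270
Rotate: 90
Orientation confidence: 7.42
Script: Latin
Script confidence: 3.10
"""

print(cardinal_from_osd(sample))  # 90
print(cardinal_from_osd(""))      # 0: blank page, nothing detected
```

A quarter-turn count derived from this value can then be applied with np.rot90, which only reorders pixels, so the snap itself loses no image data.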
Install
Desktop app (no Python needed) — download the installer for your OS from the
latest release: Windows .msi,
macOS .dmg (Apple Silicon, macOS 14+), or Linux .deb/.rpm/Flatpak. (Until the app is
code-signed, macOS needs a right-click → Open on first launch, and Windows SmartScreen needs
"Run anyway".)
CLI / library via pipx or pip:
pipx install auto-rotate # the `auto-rotate` command
pipx install "auto-rotate[ocr]" # + the `--ocr` capability
pip install auto-rotate # as a library
From source with uv:
uv sync # core
uv sync --all-extras # core + gui + ocr + dev tools
System binaries
| Feature | Requires |
|---|---|
| --orient (default) | the tesseract binary on PATH |
| --ocr | ocrmypdf (the [ocr] extra) + tesseract |
| --no-orient | nothing (pure Python, no system binaries) |
On macOS: brew install tesseract ocrmypdf.
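For scripts that wrap the CLI, the table above can be turned into a quick pre-flight check. This is a minimal sketch: the function name `missing_binaries` is hypothetical, and looking for an `ocrmypdf` executable on PATH is an assumption about how the dependency is reached, not a description of how auto-rotate detects it.

```python
import shutil
from typing import Callable, Optional


def missing_binaries(
    orient: bool = True,
    ocr: bool = False,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """List the system binaries the chosen options need but PATH lacks."""
    needed: list[str] = []
    if orient or ocr:
        needed.append("tesseract")  # both --orient and --ocr rely on it
    if ocr:
        needed.append("ocrmypdf")  # assumed to be reachable as a command
    return [name for name in needed if which(name) is None]


if __name__ == "__main__":
    # With --no-orient and no --ocr nothing is required, so the list is empty.
    print(missing_binaries(orient=False, ocr=False))
    print(missing_binaries(orient=True, ocr=True) or "all required binaries found")
```

The `which` parameter exists only so the lookup can be replaced in tests; normal callers can leave it at its default.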
Usage
auto-rotate input.pdf output.pdf # cardinal orientation + deskew (image-only)
auto-rotate input.pdf output.pdf --no-orient # deskew only, no Tesseract needed
auto-rotate input.pdf output.pdf --clean # + pure black-on-white (great for scans)
auto-rotate input.pdf output.pdf --ocr # + searchable text layer
auto-rotate input.pdf output.pdf --dpi 400 -v # higher render DPI, log per-page corrections
positional: input, output
--dpi N rasterization DPI (default 300)
--orient/--no-orient cardinal-orientation detection (default: on)
--angle-max DEG max fine-skew angle searched (default 15.0)
--clean binarize to pure black-on-white (flattens scan background)
--ocr add a searchable text layer (needs ocrmypdf)
-v, --verbose log the correction applied to each page
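The option semantics listed above, in particular the paired --orient/--no-orient switch that defaults to on, can be reproduced with the standard library. The sketch below is an illustration of those defaults only; it is not the project's own parser, which may be built with a different library.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Mirror the documented options and defaults (illustrative only)."""
    parser = argparse.ArgumentParser(prog="auto-rotate")
    parser.add_argument("input", help="source PDF")
    parser.add_argument("output", help="destination PDF")
    parser.add_argument("--dpi", type=int, default=300, metavar="N",
                        help="rasterization DPI")
    # BooleanOptionalAction (Python 3.9+) generates both --orient and --no-orient.
    parser.add_argument("--orient", action=argparse.BooleanOptionalAction,
                        default=True, help="cardinal-orientation detection")
    parser.add_argument("--angle-max", type=float, default=15.0, metavar="DEG",
                        help="max fine-skew angle searched")
    parser.add_argument("--clean", action="store_true",
                        help="binarize to pure black-on-white")
    parser.add_argument("--ocr", action="store_true",
                        help="add a searchable text layer")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="log the correction applied to each page")
    return parser


args = build_parser().parse_args(["in.pdf", "out.pdf", "--no-orient", "--dpi", "400"])
print(args.orient, args.dpi, args.angle_max)  # False 400 15.0
```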
Desktop app
auto-rotate-gui # if installed via pip/pipx with the [gui] extra
Add one or more PDFs, toggle orient / clean / OCR, pick a DPI, and Run; each file is
written next to its source as <name> - upright.pdf, with per-page progress in the log.
Orientation and OCR controls are enabled only when Tesseract / OCRmyPDF are detected.
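The output naming rule described above is easy to reproduce when scripting around the app. A minimal sketch with pathlib follows; the helper name `upright_path` is hypothetical, and how the app treats an already existing output file is not covered here.

```python
from pathlib import Path


def upright_path(source: Path) -> Path:
    """Place '<name> - upright.pdf' in the same folder as the source PDF."""
    return source.with_name(f"{source.stem} - upright.pdf")


target = upright_path(Path("scans") / "invoice.pdf")
print(target.name)    # invoice - upright.pdf
print(target.parent)  # scans
```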
Library
from pathlib import Path
from auto_rotate import deskew_pdf
results = deskew_pdf(Path("scan.pdf"), Path("upright.pdf"), dpi=300, orient=True)
for r in results:
    print(f"page {r.index + 1}: cardinal {r.cardinal}° + skew {r.skew:.2f}°")
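The per-page results can also be post-processed, for example to list only the pages that actually changed. The sketch below uses a hypothetical `PageResult` stand-in with the three attributes read in the snippet above (`index`, `cardinal`, `skew`); the real result type returned by `deskew_pdf` may carry more fields, and the `min_skew` cutoff is an arbitrary illustrative value.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PageResult:
    """Hypothetical stand-in exposing the attributes used in the example above."""
    index: int     # zero-based page index
    cardinal: int  # 0, 90, 180 or 270
    skew: float    # fine-skew correction in degrees


def corrected_pages(results: Iterable[PageResult], min_skew: float = 0.1) -> list[int]:
    """Return 1-based numbers of pages that were rotated or noticeably deskewed."""
    return [
        r.index + 1
        for r in results
        if r.cardinal != 0 or abs(r.skew) >= min_skew
    ]


demo = [PageResult(0, 0, 0.02), PageResult(1, 180, -0.40), PageResult(2, 0, 1.25)]
print(corrected_pages(demo))  # [2, 3]
```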
Notes & limitations
- The output is rasterized. Without --ocr it is image-only (no selectable text); --ocr restores a searchable layer via OCR.
- jdeskew only resolves tilt within ±--angle-max (default 15°). Gross rotation (sideways/upside-down pages) is handled by the orientation stage, so keep --orient on for arbitrary inputs.
- Tesseract OSD needs a reasonable amount of text to lock onto orientation; on blank or text-sparse pages it falls back to leaving the page as-is (0°).
- --clean uses a local (Sauvola) threshold, which keeps thin faint strokes (e.g. music staff lines) connected far better than a global threshold. It still cannot recover ink that is essentially absent from the scan; very faint lines may stay broken, and raising the render --dpi is the best lever there.
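To make the last note concrete, here is a small pure-Python sketch of the Sauvola rule for a single pixel neighbourhood, T = m · (1 + k · (s / R − 1)), where m and s are the local mean and standard deviation. The constants k = 0.2 and R = 128 are commonly used defaults for 8-bit images and are assumptions here; the parameters and window size auto-rotate actually uses are not stated in this document.

```python
from statistics import fmean, pstdev


def sauvola_threshold(window: list[float], k: float = 0.2, r: float = 128.0) -> float:
    """Sauvola local threshold T = m * (1 + k * (s / r - 1)) for one window.

    m and s are the mean and standard deviation of the grey levels in the
    window; k and r are tuning constants (assumed defaults, see above).
    """
    m = fmean(window)
    s = pstdev(window)
    return m * (1.0 + k * (s / r - 1.0))


# A 3x3 neighbourhood of bright paper (240) with one faint stroke pixel (150).
window = [240, 240, 240, 240, 150, 240, 240, 240, 240]
local_t = sauvola_threshold(window)
GLOBAL_T = 128  # an arbitrary fixed global cutoff, for comparison only

print(f"local threshold: {local_t:.1f}")
print("ink under local threshold: ", 150 < local_t)   # True
print("ink under global threshold:", 150 < GLOBAL_T)  # False
```

The stroke pixel is darker than its surroundings but lighter than the fixed cutoff, so only the local threshold classifies it as ink. A stroke that barely differs from the paper would still be missed, which matches the limitation described above.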
Development
uv run ruff check . && uv run ruff format --check .
uv run pyright
uv run pytest # unit suite (system binaries mocked)
uv run pytest -m integration # end-to-end tests against real tesseract/ocrmypdf