Skew detection and correction in scanned images.

hough - Skew detection in scanned images

Hough finds skew angles in scanned document pages, using the Hough transform.

It is oriented to batch processing, and can make use of multiple cores or an optional CUDA backend. (It can be very compute intensive!)
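
To illustrate the general idea of Hough-based skew detection, here is a minimal pure-Python sketch. It is not Hough's actual implementation: the function name `estimate_skew`, its parameters, and the synthetic input are all assumptions made for illustration, and real scans need preprocessing that is omitted here. Each point votes for every candidate line passing through it, and the angle whose accumulator bin collects the most votes is taken as the skew.

```python
import math
from collections import Counter


def estimate_skew(points, max_skew=2.0, step=0.1, rho_res=1.0):
    """Estimate the skew (degrees) of the dominant near-horizontal line.

    points: sequence of (x, y) coordinates, e.g. dark pixels (hypothetical input).
    """
    votes = Counter()
    n_steps = round(2 * max_skew / step)
    for i in range(n_steps + 1):
        skew = -max_skew + i * step
        # A line tilted by `skew` degrees has its normal at 90 + skew degrees.
        theta = math.radians(90.0 + skew)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        for x, y in points:
            # Signed distance from the origin of the line through (x, y)
            # whose normal points along theta.
            rho = x * cos_t + y * sin_t
            votes[(i, round(rho / rho_res))] += 1
    # The (angle, rho) bin with the most votes marks the dominant line.
    (best_i, _), _ = votes.most_common(1)[0]
    return -max_skew + best_i * step


# Synthetic "text baseline" tilted by 0.5 degrees.
slope = math.tan(math.radians(0.5))
baseline = [(x, 100 + x * slope) for x in range(1000)]
print(f"{estimate_skew(baseline):.2f}")
```

The sign of the result depends on the coordinate convention; image coordinates usually have the y axis pointing down, which flips it.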

Installation and usage

Installation

pipx install hough

Or, if you have a supported GPU and have installed CUDA (currently 12.x supported):

pipx install "hough[cuda]"

If you don't use pipx, other methods such as pip should work fine; just create a virtual environment first.

Usage

To get started right away, here are some examples.

Generate angles (in CSV form) for a bunch of TIFF images:

hough analyse in/*.tif

The same, but for a PDF file, and display a histogram at the end:

hough analyse --histogram Able_Attach_Sep83.pdf

The same, but show more information while running:

hough analyse --verbose --histogram Able_Attach_Sep83.pdf

The deskewing results are placed in a results.csv file created under the out/<timestamp> directory, which is created at invocation time. Here's an example:

"Input File","Page Number","Computed angle","Variance of computed angles","Image width (px)","Image height (px)"
"/home/toby/my-pages/orig/a--0000.pgm.tif",,-0.07699791151672428,0.001073874144832815,5014,6659
"/home/toby/my-pages/orig/a--0001.pgm.tif",,,,5018,6630
"/home/toby/my-pages/orig/a--0002.pgm.tif",,0.24936351676615068,0.005137031681286154,5021,6629
"/home/toby/my-pages/orig/a--0003.pgm.tif",,,,5020,6608
"/home/toby/my-pages/orig/a--0004.pgm.tif",,-0.037485115754500545,0.025945115897015238,5021,6616
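
A results file like the one above can be read with Python's standard csv module. The sketch below assumes the column names shown in the sample header, and assumes that an empty "Computed angle" field means no skew could be determined for that page. The function name `load_angles` is hypothetical and not part of Hough.

```python
import csv


def load_angles(csv_file):
    """Split rows of an open results.csv into measured and undetected pages.

    Returns (angles, undetected): angles maps (input file, page number)
    to the computed angle; undetected lists the keys with no angle.
    """
    angles, undetected = {}, []
    for row in csv.DictReader(csv_file):
        key = (row["Input File"], row["Page Number"])
        if row["Computed angle"] == "":
            # Assumption: an empty field means no skew was determined.
            undetected.append(key)
        else:
            angles[key] = float(row["Computed angle"])
    return angles, undetected
```

To use it, open the file with `newline=""` as the csv module recommends, for example `with open(path, newline="") as f: angles, undetected = load_angles(f)`, where `path` points at your own results.csv.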

The program should work on various image input formats, and with both greyscale and RGB images. Hough works best with images of at least 300 dpi.

Here's a histogram sample:

=== Skew statistics ===
0.00° - 0.10°  [57]  ████████████████████████████████████████
0.10° - 0.20°  [39]  ███████████████████████████▍
0.20° - 0.30°  [30]  █████████████████████
0.30° - 0.40°  [30]  █████████████████████
0.40° - 0.50°  [11]  ███████▊
0.50° - 0.60°  [11]  ███████▊
0.60° - 0.70°  [ 3]  ██▏
0.70° - 0.80°  [ 4]  ██▊
0.80° - 0.90°  [ 0]
0.90° - 1.00°  [ 1]  ▊
1.00° - 1.10°  [ 1]  ▊
1.10° - 1.20°  [ 0]
1.20° - 1.30°  [ 1]  ▊
1.30° - 1.40°  [ 1]  ▊
1.40° - 1.50°  [ 1]  ▊
1.50° - 1.60°  [ 2]  █▍
1.60° - 1.70°  [ 0]
1.70° - 1.80°  [ 1]  ▊
1.80° - 1.90°  [ 2]  █▍
1.90° - 2.00°  [ 0]
Samples: 195
50th percentile: 0.20°
90th percentile: 0.55°
99th percentile: 1.77°
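
A similar summary can be approximated from a list of angles. The sketch below assumes that angles are binned by absolute value in 0.1° steps, which is inferred from the sample above rather than taken from Hough's source. The names `bin_angles` and `render` are hypothetical.

```python
import math
from collections import Counter


def bin_angles(angles, width=0.1):
    """Count absolute skew angles (degrees) per bin of `width` degrees."""
    return Counter(math.floor(abs(a) / width) for a in angles)


def render(counts, width=0.1, bar_max=40):
    """Return one text line per bin, scaled so the fullest bin has bar_max blocks."""
    peak = max(counts.values())
    lines = []
    for b in range(max(counts) + 1):
        n = counts.get(b, 0)
        bar = "█" * round(bar_max * n / peak)
        lines.append(f"{b * width:.2f}° - {(b + 1) * width:.2f}°  [{n:2d}]  {bar}".rstrip())
    return lines


for line in render(bin_angles([0.05, 0.07, 0.15, -0.25])):
    print(line)
```

Because of floating-point rounding, an angle that lies exactly on a bin boundary may be counted in either neighbouring bin.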

Command line options

You can list them by running hough --help:

Usage: hough COMMAND

╭─ Commands ─────────────────────────────────────────────────────────────────────────────╮
│ analyse    Analyse one or more files for deskewing.                                    │
│ histogram  Show a histogram of rotation angles from a previous analysis.               │
│ process    Fully analyse and rotate one or more files.                                 │
│ rotate     Rotate one or more files that have previously been analysed.                │
│ --help -h  Display this message and exit.                                              │
│ --version  Display application version.                                                │
╰────────────────────────────────────────────────────────────────────────────────────────╯

Or ask for help for a specific command, e.g. hough analyse --help:

Usage: hough analyse [ARGS] [OPTIONS]

Analyse one or more files for deskewing.

╭─ Parameters ───────────────────────────────────────────────────────────────────────────╮
│ *  FILES --files          One or more files to analyse for deskewing. [required]       │
│    DEBUG --debug          Save intermediate results in debug/ under out folder.        │
│                           [default: False]                                             │
│    VERBOSE --verbose      Print status messages instead of progress bar. [default:     │
│                           False]                                                       │
│    OUT --out              Use the specified path for results and post-rotated files.   │
│                           [default: out/TIMESTAMP]                                     │
│    WORKERS --workers      Number of workers to run simultaneously. [default: 4]        │
│    HISTOGRAM --histogram  Display result summary as histogram after processing.        │
│                           [default: False]                                             │
╰────────────────────────────────────────────────────────────────────────────────────────╯
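
When driving Hough from a script, the options listed above can be assembled into a command line. This is a minimal sketch using only the flags shown in the help output; the helper names `build_analyse_command` and `run_analyse` are hypothetical, and it assumes the `hough` executable is on your PATH.

```python
import subprocess


def build_analyse_command(files, out=None, workers=None, histogram=False):
    """Build the argument list for `hough analyse` from the documented options."""
    cmd = ["hough", "analyse"]
    if out is not None:
        cmd += ["--out", str(out)]
    if workers is not None:
        cmd += ["--workers", str(workers)]
    if histogram:
        cmd.append("--histogram")
    # Files go last, matching the usage examples in this README.
    cmd += [str(f) for f in files]
    return cmd


def run_analyse(files, **options):
    """Run `hough analyse`; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(build_analyse_command(files, **options), check=True)
```

Keeping command construction separate from execution makes it easy to log or inspect the command before running it.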

Examples

Just about all of these files have been deskewed this way.

Getting the best results

NOTE: This is a beta product!

There are a few guidelines you should follow to get the best deskewing results from your document scans:

  1. Bilevel (black-and-white) bitmaps will produce lower quality results. For best results, scan to greyscale or RGB first, deskew with Hough, then reduce the colour depth to bilevel if desired.
  2. Hough deskewing is an inexact process, with many heuristics discovered by trial and error. Hough may not work well on your material without tuning and further modification. (We'd love your pull requests!)

Debugging output

You can spy on Hough's attempts to perform deskewing by passing the --debug flag on the command line. The generated images, and any detected lines in them, are placed in the out/<timestamp>/debug/ directory.

Note that Hough cannot always determine a skew for a page (blank pages in particular), and will very occasionally get the skew wrong, depending on the source material. It's worth reviewing these images if Hough makes a bad decision on your scans. Please submit these files along with the original image when filing an issue!

Recommended scanners

The authors have tested this software with output from the following scanners:

  • Fujitsu fi-4530C, USB
    • Fast
    • Cheap on eBay
    • Requires a Windows XP VirtualBox for drivers
  • Brother ADS-2700W, USB + Ethernet + WiFi
    • Fast
    • Can scan directly to the network or to a memory stick
    • Factory reconditioned models still available (March 2020)
    • Very low skew out of the box
  • Epson WF-7610, USB + Ethernet + WiFi
    • 11"x17" and duplex capable
    • Can scan directly to the network or to a memory stick

Developing

First, clone this repo.

You'll need to install Poetry, then run:

poetry sync --with dev   # or --with dev,cuda if you have CUDA installed
poetry self add 'poethepoet[poetry_plugin]' poetry-plugin-shell

Do some work, then run the pre-commit checks and tests with:

poetry run pre-commit
poetry poe test

License notice

This file is part of "hough", which detects skew angles in scanned images
Copyright (C) 2016-2020 Toby Thain <toby@telegraphics.com.au>
Copyright (C) 2020-2025 Joan Touzet <wohali@apache.org>

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
