Toolkit for document image processing
Project description
Contextualization
A work-in-progress toolkit for document image preprocessing.
Intended for images that will be OCRed.
Main available methods
-
Auto rotate image
Uses the left margin of a document to calculate its rotation angle and corrects it accordingly.
The rotation direction (clockwise or counter_clockwise) can be given explicitly; in auto mode, the method tries to determine the side to which the document is tilted (this can be none, in which case the image is not rotated).
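The left-margin idea can be illustrated with a minimal sketch. It is not the package's implementation: the function names, the grayscale list-of-rows input, the darkness threshold and the 0.5 degree tolerance are all assumptions made for illustration. The sketch takes the first dark pixel of each row, fits a straight line through those points, and reads the tilt from the slope. The mapping of slope sign to direction assumes image coordinates with y growing downwards.

```python
import math


def first_black_per_row(image, threshold=128):
    """Return (x, y) of the first dark pixel in each row that has one.

    `image` is a list of rows of grayscale values (0 = black, 255 = white).
    """
    points = []
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if value < threshold:
                points.append((x, y))
                break
    return points


def margin_angle_degrees(points):
    """Least-squares fit of x = a * y + b; returns the margin's angle from vertical."""
    n = len(points)
    if n < 2:
        return 0.0
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_y = sum((y - mean_y) ** 2 for _, y in points)
    if var_y == 0:
        return 0.0
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return math.degrees(math.atan(cov / var_y))


def tilt_direction(angle, tolerance=0.5):
    """Assumed convention: margin drifting left while going down means a clockwise tilt."""
    if abs(angle) <= tolerance:
        return "none"
    return "clockwise" if angle < 0 else "counter_clockwise"
```

Unlike the method described above, this sketch does no outlier removal, so a single stray dark pixel in the margin would skew the fit.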
-
Calculate rotation direction
Calculates the rotation direction of an image by finding, for each direction (clockwise, counter_clockwise and none), the biggest set of first black pixel appearances in the image, with outliers removed.
For the none direction, the set is built from pixels that share the same 'x' coordinate and whose height difference is less than 5% of the image's height.
-
Binarize document
Normal binarization with Otsu thresholding and fastNlMeansDenoising.
Fax binarization, following the ImageMagick command: convert "image" -colorspace Gray ( +clone -blur 15,15 ) -compose Divide_Src -composite -level 10%,90%,0.2
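For readers unfamiliar with Otsu thresholding, the following is a small pure-Python sketch of the technique itself. It is an illustration only and does not show how the package calls it. The function names and the flat list of 0 to 255 grayscale values are assumptions, and the denoising step is not covered.

```python
def otsu_threshold(pixels):
    """Return the threshold that maximizes between-class variance.

    `pixels` is a flat iterable of integer gray levels in 0..255.
    Values <= threshold form one class, values above it the other.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels at or below the candidate threshold
    sum0 = 0    # sum of their gray levels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(pixels, threshold):
    """Map every pixel to 0 (dark) or 255 (light) around the threshold."""
    return [255 if p > threshold else 0 for p in pixels]
```

With a clearly bimodal input such as fifty pixels at level 10 and fifty at level 200, the returned threshold falls between the two groups, so the dark group maps to 0 and the light group to 255.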
-
Split document into columns
Analyzes the pixel color frequency of the document image and splits the image into columns.
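One common way to split on pixel frequency is a vertical projection profile: count the dark pixels in every image column and cut wherever a sufficiently wide run of empty columns appears. The sketch below shows that general idea under stated assumptions. The function name, the `threshold` and `min_gap` parameters and the list-of-rows input are hypothetical, and the package's actual analysis may differ.

```python
def split_columns(image, threshold=128, min_gap=2):
    """Return (start, end) x ranges, end exclusive, of text columns.

    `image` is a list of rows of grayscale values. Two columns are
    separated when at least `min_gap` consecutive x positions contain
    no dark pixel.
    """
    if not image:
        return []
    width = len(image[0])
    # Vertical projection: number of dark pixels at each x position.
    dark = [sum(1 for row in image if row[x] < threshold) for x in range(width)]

    columns = []
    start = None  # x where the current column began, if any
    gap = 0       # length of the current run of empty x positions
    for x, count in enumerate(dark):
        if count > 0:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                columns.append((start, x - gap + 1))
                start = None
                gap = 0
    if start is not None:
        columns.append((start, width - gap))
    return columns
```

Real scans contain noise, so a practical version would likely compare the counts against a small fraction of the image height rather than against zero.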
-
Auto crop document
Analyzes the pixel color frequency of the document image and cuts the document margins, aiming mostly to remove possible folds in the corners.
-
Identify document images
Identifies document images within an image, using an algorithm available in leptonica's repository that finds potential image masks.
-
Get document delimiters
Gets document delimiters, using image transformations.
-
Segment document
Segments the document image into header, body and footer, using delimiters. Only the body is always guaranteed to have a value.
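To make the "only the body is always guaranteed" behaviour concrete, here is a hedged sketch of delimiter-based segmentation. It is not the package's API: `Segments`, `segment_rows` and the `header_zone` and `footer_zone` fractions are hypothetical, and delimiters are assumed to be given simply as the y coordinates of horizontal separator lines.

```python
from typing import NamedTuple, Optional, Tuple

Span = Tuple[int, int]  # (first row, last row + 1)


class Segments(NamedTuple):
    header: Optional[Span]
    body: Span
    footer: Optional[Span]


def segment_rows(height, delimiters, header_zone=0.3, footer_zone=0.7):
    """Split the rows 0..height into header, body and footer spans.

    A delimiter in the top `header_zone` fraction of the page closes the
    header; one at or below the `footer_zone` fraction opens the footer.
    Delimiters in between are ignored in this sketch.
    """
    ys = sorted(y for y in delimiters if 0 < y < height)
    top = [y for y in ys if y <= height * header_zone]
    bottom = [y for y in ys if y >= height * footer_zone]

    body_start = top[-1] if top else 0
    body_end = bottom[0] if bottom else height
    header = (0, body_start) if top else None
    footer = (body_end, height) if bottom else None
    return Segments(header, (body_start, body_end), footer)
```

For a 1000-row page with delimiters at rows 120 and 900, this yields a header of (0, 120), a body of (120, 900) and a footer of (900, 1000). With no delimiters, the header and footer are None and the body covers the whole page.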
Bash commands:
-
binarize : binarize document image.
-
rotate_document : rotate document image.
-
split_columns : split document into column images.
-
d_auto_crop : auto crop document image.