Toolkit for document image processing
Contextualization
Work-in-progress toolkit for document image preprocessing.
It is aimed at images that are going to be OCRed.
Main available methods
- Auto rotate image
  Uses the document's left margin to calculate the rotation angle and corrects it accordingly.
  It can be given the rotation direction (clockwise or counter_clockwise), or, in auto mode, it tries to determine the direction in which the document is tilted (this can be none, in which case the image is not rotated).
- Calculate rotation direction
  Calculates the rotation direction of an image by finding, for each direction (clockwise, counter_clockwise and none), the biggest set of first black pixel appearances in the image, with outliers removed.
  For the none direction, the set is built from pixels with the same 'x' coordinate that are less than 5% of the image's height apart vertically.
- Binarize document
- Split document into columns
  Analyzes the pixel color frequency of the document image and splits it into columns.
- Auto crop document
  Analyzes the pixel color frequency of the document image and cuts the document margins, mainly to remove possible folds in the corners.
- Identify document images
  Identifies images inside the document image, using an algorithm available in Leptonica's repository that finds potential image masks.
- Get document delimiters
  Gets the document's delimiters using image transformations.
- Segment document
  Segments the document image into header, body and footer using the delimiters. Only the body is guaranteed to always have a value.
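The Auto rotate image method estimates the tilt from the document's left margin. A minimal sketch of that idea, assuming a grayscale image stored as a list of rows where values below 128 count as ink: take the first dark pixel of each row, discard outliers with a 1.5 * IQR rule (an assumed choice, not necessarily the package's), and fit a least-squares line x = a*y + b whose slope gives the angle. The helpers `left_margin_points`, `drop_outliers` and `skew_angle` are hypothetical and not part of document_image_utils.

```python
import math
from statistics import quantiles


def left_margin_points(image, threshold=128):
    """Return (y, x) for the first pixel darker than threshold in every row that has one."""
    points = []
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if value < threshold:
                points.append((y, x))
                break
    return points


def drop_outliers(points):
    """Keep points whose x lies within 1.5 * IQR of the quartiles (assumed rule)."""
    if len(points) < 4:
        return points
    q1, _, q3 = quantiles([x for _, x in points], n=4)
    spread = 1.5 * (q3 - q1)
    return [(y, x) for y, x in points if q1 - spread <= x <= q3 + spread]


def skew_angle(image, threshold=128):
    """Angle in degrees of the least-squares line x = a*y + b through the left margin."""
    points = drop_outliers(left_margin_points(image, threshold))
    if len(points) < 2:
        return 0.0
    n = len(points)
    mean_y = sum(y for y, _ in points) / n
    mean_x = sum(x for _, x in points) / n
    var_y = sum((y - mean_y) ** 2 for y, _ in points)
    if var_y == 0:
        return 0.0
    cov = sum((y - mean_y) * (x - mean_x) for y, x in points)
    # The slope a = cov / var_y is converted to an angle.
    return math.degrees(math.atan(cov / var_y))
```

With y growing downwards, a negative angle means the margin drifts left towards the bottom of the page, which under this convention would indicate a clockwise tilt.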
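The Calculate rotation direction description can be read as a comparison of runs of left-margin points. The sketch below is one possible interpretation, assuming the margin points are (y, x) pairs sorted by y, already stripped of outliers, with y growing downwards, so that a clockwise tilt makes x shrink as y grows. For each direction it measures the longest run of consecutive points consistent with it; the none direction additionally requires an equal x and a vertical gap under 5% of the image height, and ties fall back to none. This reading of "biggest set" and the names `longest_run` and `rotation_direction` are assumptions.

```python
def longest_run(points, step_ok):
    """Length of the longest run of consecutive points whose every step passes step_ok."""
    if not points:
        return 0
    best = current = 1
    for prev, cur in zip(points, points[1:]):
        current = current + 1 if step_ok(prev, cur) else 1
        best = max(best, current)
    return best


def rotation_direction(points, height, tolerance=0.05):
    """Classify a left margin, given (y, x) points sorted by y, as tilted or straight."""
    max_gap = tolerance * height
    # Assumption: y grows downwards, so a clockwise tilt moves the margin left as y grows.
    clockwise = longest_run(points, lambda p, q: q[1] <= p[1])
    counter_clockwise = longest_run(points, lambda p, q: q[1] >= p[1])
    # 'none': same x and the two rows are less than 5% of the image height apart.
    straight = longest_run(points, lambda p, q: q[1] == p[1] and q[0] - p[0] < max_gap)
    if straight >= max(clockwise, counter_clockwise) or clockwise == counter_clockwise:
        return "none"
    return "clockwise" if clockwise > counter_clockwise else "counter_clockwise"
```

Using non-strict comparisons for the tilted directions lets the staircase pattern of a slightly rotated margin (several equal x values, then a one-pixel step) count as a single run.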
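The Binarize document entry does not say which thresholding method is used. As a stand-in, the sketch below applies Otsu's global threshold, a common choice before OCR, to a grayscale image given as rows of 0-255 integers. It is an illustration of the technique, not the package's implementation, and `otsu_threshold` and `binarize` are hypothetical names.

```python
def otsu_threshold(pixels):
    """Return the gray level t that maximises Otsu's between-class variance (classes <= t, > t)."""
    histogram = [0] * 256
    for value in pixels:
        histogram[value] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(histogram))
    weight_bg = sum_bg = 0
    best_level, best_variance = 0, -1.0
    for level in range(256):
        weight_bg += histogram[level]
        sum_bg += level * histogram[level]
        weight_fg = total - weight_bg
        if weight_bg == 0:
            continue
        if weight_fg == 0:
            break
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Proportional to the between-class variance; the 1/total**2 factor does not change the argmax.
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_level, best_variance = level, variance
    return best_level


def binarize(image):
    """Map a grayscale image to 0 for ink and 255 for paper using the Otsu threshold."""
    threshold = otsu_threshold([v for row in image for v in row])
    return [[255 if v > threshold else 0 for v in row] for row in image]
```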
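Splitting into columns from pixel color frequency can be illustrated with a vertical projection profile: count the dark pixels in every column of a binarized image and cut wherever a long enough run of empty columns appears. This is a minimal sketch assuming a clean binary image (0 = ink, 255 = paper); the `min_gap` default of 10 pixels and the names `column_ranges` and `split_columns` are hypothetical.

```python
def column_ranges(binary, min_gap=10, dark=0):
    """Return [start, end) x-ranges of ink separated by at least min_gap empty columns."""
    width = len(binary[0])
    # Vertical projection profile: number of dark pixels in every column.
    ink = [sum(1 for row in binary if row[x] == dark) for x in range(width)]
    ranges, start, blank_run = [], None, 0
    for x, count in enumerate(ink):
        if count > 0:
            if start is None:
                start = x
            blank_run = 0
        elif start is not None:
            blank_run += 1
            if blank_run >= min_gap:
                # The last inked column was x - blank_run, so the range ends right after it.
                ranges.append((start, x - blank_run + 1))
                start, blank_run = None, 0
    if start is not None:
        ranges.append((start, width - blank_run))
    return ranges


def split_columns(binary, min_gap=10, dark=0):
    """Cut the image into one sub-image per detected column."""
    return [[row[s:e] for row in binary] for s, e in column_ranges(binary, min_gap, dark)]
```

Gaps narrower than `min_gap`, such as the space between words, do not split a column.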
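One simple way to cut margins from pixel color frequency is to trim border rows and columns that are mostly dark, since scan folds and shadows at the page edges tend to show up as dark bands. The sketch assumes a binarized image (0 = dark) and a hypothetical `max_dark_fraction` threshold of 0.5; `auto_crop_box` and `auto_crop` are illustrative names, and the package's real cropping rule may differ.

```python
def auto_crop_box(binary, max_dark_fraction=0.5, dark=0):
    """Return (top, bottom, left, right) after trimming mostly-dark border rows and columns."""
    height, width = len(binary), len(binary[0])

    def row_dark(y):
        return sum(1 for v in binary[y] if v == dark) / width

    def col_dark(x):
        return sum(1 for row in binary if row[x] == dark) / height

    top, bottom, left, right = 0, height, 0, width
    # Walk inwards from each edge while the current border line is mostly dark.
    while top < bottom and row_dark(top) > max_dark_fraction:
        top += 1
    while bottom > top and row_dark(bottom - 1) > max_dark_fraction:
        bottom -= 1
    while left < right and col_dark(left) > max_dark_fraction:
        left += 1
    while right > left and col_dark(right - 1) > max_dark_fraction:
        right -= 1
    return top, bottom, left, right


def auto_crop(binary, max_dark_fraction=0.5, dark=0):
    """Crop the image to the box returned by auto_crop_box."""
    top, bottom, left, right = auto_crop_box(binary, max_dark_fraction, dark)
    return [row[left:right] for row in binary[top:bottom]]
```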
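Given the y positions of horizontal delimiters, header/body/footer segmentation can be sketched as follows. It assumes, hypothetically, that a delimiter in the top 30% of the page closes the header and one in the bottom 30% opens the footer; when no such delimiter exists, the header or footer is `None`, so only the body is always set, matching the Segment document description. The function name and the `margin` parameter are illustrative, not the package's API.

```python
def segment_document(height, delimiters, margin=0.3):
    """Split rows [0, height) into header, body and footer bands of [start, end) rows."""
    header = footer = None
    body_start, body_end = 0, height
    # Hypothetical rule: delimiters in the top/bottom `margin` of the page bound header/footer.
    top = [y for y in delimiters if y < margin * height]
    bottom = [y for y in delimiters if y > (1 - margin) * height]
    if top:
        cut = max(top)
        header, body_start = (0, cut), cut + 1
    if bottom:
        cut = min(bottom)
        footer, body_end = (cut + 1, height), cut
    # The delimiter rows themselves belong to no band.
    return header, (body_start, body_end), footer
```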
Bash commands:
- binarize: binarize the document image.
- rotate_document: rotate the document image.
- split_columns: split the document into column images.
- d_auto_crop: auto-crop the document image.
Download files
Source Distribution
Hashes for document_image_utils-0.1.18.6.2.tar.gz
Algorithm | Hash digest
---|---
SHA256 | dd23885510726235f273cedac5be63c2d3c38f51774c74062d6115a7c4bca8d8
MD5 | 3808f405a66ac366ded165f24d064fa1
BLAKE2b-256 | 4b855b4978e70a0a53816d9d4b0818681c741a5d0bbdb488e01c4682f1b17548

Built Distribution
Hashes for document_image_utils-0.1.18.6.2-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 0dc2746bb3a651ae76c46a527971689f2ae20ff7ea660bc21d6f1871a2a99f29
MD5 | 542c97f7fc97b6df8a9882e02b81341f
BLAKE2b-256 | 87059e2817cf27e60fef81b40a16a70c1d36c4a312ecb61b44a9e15ab36f1ec9