Amazon Textract Overlay tools
Textract-Overlayer
amazon-textract-overlayer provides functions to help overlay bounding boxes on documents.
Install
> python -m pip install amazon-textract-overlayer
Make sure your environment is set up with AWS credentials, either through configuration files, environment variables, or an attached role (https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html).
Samples
The primary method provided is get_bounding_boxes, which returns bounding boxes for the Textract_Types passed in.
The sample is mostly taken from the amazon-textract command in the amazon-textract-helper package.
It returns the bounding boxes for the WORD and CELL types:
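from PIL import Image
from textractoverlayer.t_overlay import DocumentDimensions, get_bounding_boxes
from textractcaller.t_call import Textract_Features, Textract_Types, call_textract
# placeholder path (an assumption for this sample); replace with your own document
input_document = "document.png"
# TABLES is requested here because CELL blocks are only returned with table analysis
features = [Textract_Features.TABLES]
doc = call_textract(input_document=input_document, features=features)
# image is a PIL.Image.Image in this case
image = Image.open(input_document)
document_dimension: DocumentDimensions = DocumentDimensions(doc_width=image.size[0], doc_height=image.size[1])
overlay = [Textract_Types.WORD, Textract_Types.CELL]
bounding_box_list = get_bounding_boxes(textract_json=doc, document_dimensions=document_dimension, overlay_features=overlay)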
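For context, Textract reports each block's bounding box as Left, Top, Width, and Height values expressed as ratios of the page size, which is why the document dimensions are needed to arrive at pixel coordinates. The sketch below illustrates that scaling in plain Python. `PixelBox` and `to_pixel_box` are hypothetical names used for illustration only; they are not part of amazon-textract-overlayer, and the library's own implementation may differ.

```python
from typing import NamedTuple


class PixelBox(NamedTuple):
    """Hypothetical stand-in for a pixel-space box (not the library's class)."""
    xmin: int
    ymin: int
    xmax: int
    ymax: int


def to_pixel_box(bounding_box: dict, doc_width: int, doc_height: int) -> PixelBox:
    """Scale a Textract-style normalized BoundingBox dict to pixel coordinates."""
    left = bounding_box["Left"] * doc_width
    top = bounding_box["Top"] * doc_height
    width = bounding_box["Width"] * doc_width
    height = bounding_box["Height"] * doc_height
    return PixelBox(
        xmin=round(left),
        ymin=round(top),
        xmax=round(left + width),
        ymax=round(top + height),
    )


# Example: a block covering part of a 1000 x 800 pixel image.
normalized = {"Left": 0.25, "Top": 0.5, "Width": 0.5, "Height": 0.25}
box = to_pixel_box(normalized, doc_width=1000, doc_height=800)
print(box)
```

The resulting fields use the same xmin, ymin, xmax, ymax naming as the boxes drawn in this README, so the relationship between normalized and pixel coordinates is easy to check by hand.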
The actual drawing of bounding boxes onto images is done in the amazon-textract command from the amazon-textract-helper package and looks like this:
from PIL import Image, ImageDraw
image = Image.open(input_document)
rgb_im = image.convert('RGB')
draw = ImageDraw.Draw(rgb_im)
# check the implementation in amazon-textract-helper for ways to associate different colors with types
for bbox in bounding_box_list:
    draw.rectangle(xy=[bbox.xmin, bbox.ymin, bbox.xmax, bbox.ymax], outline=(128, 128, 0), width=2)
rgb_im.show()
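One way to give each Textract type its own outline color is a lookup table consulted before drawing. This is only a sketch under stated assumptions: the `Box` tuple, its `kind` field, and the color values are hypothetical and do not reflect how amazon-textract-overlayer or amazon-textract-helper represent boxes, so check the library for the attribute that actually carries the type.

```python
from typing import NamedTuple, Tuple

Color = Tuple[int, int, int]


class Box(NamedTuple):
    """Hypothetical box record; `kind` is an assumed field, not the library's API."""
    kind: str
    xmin: int
    ymin: int
    xmax: int
    ymax: int


# Illustrative colors only; pick whatever reads well on your documents.
COLORS = {"WORD": (128, 128, 0), "CELL": (0, 0, 255)}
DEFAULT_COLOR: Color = (255, 0, 0)


def rectangle_args(box: Box) -> dict:
    """Build keyword arguments suitable for PIL's ImageDraw.rectangle."""
    return {
        "xy": [box.xmin, box.ymin, box.xmax, box.ymax],
        "outline": COLORS.get(box.kind, DEFAULT_COLOR),
        "width": 2,
    }


boxes = [Box("WORD", 10, 10, 60, 30), Box("CELL", 5, 5, 200, 40), Box("LINE", 0, 0, 9, 9)]
calls = [rectangle_args(b) for b in boxes]
for call in calls:
    print(call)
```

With a `draw = ImageDraw.Draw(rgb_im)` object like the one created above, each dictionary could be passed on as `draw.rectangle(**call)`. Types without an entry in the table fall back to the default color.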
Download files
Source Distribution
Built Distribution
Hashes for amazon-textract-overlayer-0.0.3.tar.gz
Algorithm | Hash digest
---|---
SHA256 | d88ee40c6b99e7228a44ed9e983a1e393d1a79fad845072deefe53a323eb2d1f
MD5 | 88fe2de17508dd4fa72584f75d18a1b0
BLAKE2b-256 | efc2b88f12b725373652730d6a445cd310dacd329e0c974710b0ebfd7ff4bfa9
Hashes for amazon_textract_overlayer-0.0.3-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 2c16b8aa075b3849253da2099c63296f2f2c31d60434d974e86d43ea0fb263f8
MD5 | 53b13fa0e28d3e004b2fb88d82e91962
BLAKE2b-256 | aad999b6c56d4ff79c991c0914dbd8e77b05c15f15876a82ed216de12eeb6dc8