Amazon Textract Helper tools
Textractor-Textract-Helper
amazon-textract-helper provides a collection of ready-to-use functions and sample implementations to speed up the evaluation and development of any project using Amazon Textract.
Install
> python -m pip install amazon-textract-helper
Make sure your environment is set up with AWS credentials through configuration files, environment variables, or an attached role (https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html).
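Before calling the tool, it can help to check which credential sources are visible from your shell. The following is a minimal sketch, not part of amazon-textract-helper: the function name `find_credential_hints` is hypothetical, and it only looks at the standard environment variables and the `~/.aws` files described in the AWS CLI guide. It cannot detect an attached role, and it does not validate the credentials.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional


def find_credential_hints(
    env: Optional[Mapping[str, str]] = None,
    home: Optional[Path] = None,
) -> List[str]:
    """Return human-readable hints about credential sources that appear to be present.

    Hypothetical helper for illustration only. An empty list does not prove that
    no credentials exist (an attached IAM role is not visible here).
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else home
    hints = []
    # Static keys supplied through environment variables.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        hints.append("environment variables")
    # A named profile selected through the environment.
    if env.get("AWS_PROFILE"):
        hints.append("named profile: " + env["AWS_PROFILE"])
    # Shared configuration files written by `aws configure`.
    for name in ("credentials", "config"):
        if (home / ".aws" / name).is_file():
            hints.append("file: ~/.aws/" + name)
    return hints


if __name__ == "__main__":
    print(find_credential_hints() or "no local credential hints found")
```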
Samples
Easy Start
> amazon-textract --example
This runs the example document through the DetectDocumentText API. The output is printed to stdout and looks similar to this:
{"DocumentMetadata": {"Pages": 1}, "Blocks": [{"BlockType": "PAGE", "Geometry": {"BoundingBox": {"Width": 1.0, "Height": 1.0, "Left": 0.0, "Top": 0.0}, "Polygon": [{"X": 9.33321120033382e-17, "Y": 0.0}, {"X": 1.0, "Y": 1.6069064689339292e-16}, {"X": 1.0, "Y": 1.0}],
"HTTPHeaders": {"x-amzn-requestid": "12345678-1234-1234-1234-123456789012", "content-type": "application/x-amz-json-1.1", "content-length": "48177", "date": "Thu, 01 Apr 2021 21:50:29 GMT"}, "RetryAttempts": 0}}
It is working.
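Because the response is plain JSON, it can be inspected with a few lines of Python. The sketch below counts the block types in a response; it assumes that stdout contains only the JSON document, and the function name `count_block_types` and the tiny sample response are made up for illustration.

```python
import json
from collections import Counter


def count_block_types(response_json: str) -> Counter:
    """Count how often each BlockType occurs in a Textract JSON response."""
    response = json.loads(response_json)
    return Counter(block["BlockType"] for block in response.get("Blocks", []))


# Hand-written stand-in for a real response, reduced to the fields used here.
sample = json.dumps({
    "DocumentMetadata": {"Pages": 1},
    "Blocks": [
        {"BlockType": "PAGE"},
        {"BlockType": "LINE"},
        {"BlockType": "WORD"},
        {"BlockType": "WORD"},
    ],
})

if __name__ == "__main__":
    for block_type, count in sorted(count_block_types(sample).items()):
        print(block_type, count)
```

To use it on real output, the command's stdout could be redirected to a file and the file content passed to `count_block_types`.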
Call with document on S3
> amazon-textract --input-document "s3://somebucket/someprefix/someobjectname.png"
The output is similar to Easy Start.
Call with document on local file system
> amazon-textract --input-document "./somepath/somefilename.png"
The output is similar to Easy Start.
We will continue to use the --example parameter to keep things simple and easy to reproduce. S3 and local files work the same way: use --input-document instead of --example.
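The two kinds of --input-document value differ only in their scheme. As an illustration of how such a value can be told apart, here is a small sketch using the standard library. It is not the tool's actual implementation, and `split_input_document` is a hypothetical name.

```python
from typing import Optional, Tuple
from urllib.parse import urlparse


def split_input_document(location: str) -> Tuple[str, Optional[str], str]:
    """Classify an input location as S3 or local.

    Returns ("s3", bucket, key) for s3:// URIs and ("local", None, path) otherwise.
    """
    parsed = urlparse(location)
    if parsed.scheme == "s3":
        # netloc holds the bucket name, path holds the object key with a leading slash.
        return ("s3", parsed.netloc, parsed.path.lstrip("/"))
    return ("local", None, location)


if __name__ == "__main__":
    print(split_input_document("s3://somebucket/someprefix/someobjectname.png"))
    print(split_input_document("./somepath/somefilename.png"))
```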
Call with FORMS and TABLES
> amazon-textract --example --features FORMS TABLES
This will call the [AnalyzeDocument API](https://docs.aws.amazon.com/textract/latest/dg/API_AnalyzeDocument.html). Output will look similar to "Easy Start" but also include the FORMS and TABLES information.
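The relationship between the --features option and the API that gets called can be summarized in a short sketch. This is an assumption about the behavior described in this README, not the tool's source code: `choose_textract_api` is a hypothetical name, and only the two features shown here are accepted. `FeatureTypes` is the request parameter name used by AnalyzeDocument.

```python
from typing import Dict, List, Sequence, Tuple

# Only the features used in this README; the service may support more.
ALLOWED_FEATURES = ("FORMS", "TABLES")


def choose_textract_api(features: Sequence[str]) -> Tuple[str, Dict[str, List[str]]]:
    """Pick the Textract operation and extra request arguments for a feature list."""
    requested = [feature.upper() for feature in features]
    unknown = [feature for feature in requested if feature not in ALLOWED_FEATURES]
    if unknown:
        raise ValueError("unsupported features: " + ", ".join(unknown))
    if requested:
        # Any requested feature means the document has to be analyzed.
        return "AnalyzeDocument", {"FeatureTypes": requested}
    # No features: plain text detection is enough.
    return "DetectDocumentText", {}


if __name__ == "__main__":
    print(choose_textract_api([]))
    print(choose_textract_api(["FORMS", "TABLES"]))
```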
Pretty print the output
The --pretty-print option outputs nicely formatted information for words, lines, forms or tables.
For example, to print the tables identified by Amazon Textract to stdout, use
> amazon-textract --example --features TABLES --pretty-print TABLES
Output will look like this:
|------------|-----------|---------------------|-----------------|-----------------------|
| | | Previous Employment | History | |
| Start Date | End Date | Employer Name | Position Held | Reason for leaving |
| 1/15/2009 | 6/30/2011 | Any Company | Assistant Baker | Family relocated |
| 7/1/2011 | 8/10/2013 | Best Corp. | Baker | Better opportunity |
| 8/15/2013 | present | Example Corp. | Head Baker | N/A, current employer |
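To give an idea of where such a table comes from, the sketch below rebuilds rows from the raw response: a TABLE block points to CELL blocks through CHILD relationships, each CELL carries a 1-based RowIndex and ColumnIndex, and its text comes from its child WORD blocks. This is a simplified illustration, not the package's pretty printer; the function names and the sample blocks are made up, and merged cells are ignored.

```python
from typing import Dict, List


def child_text(block: dict, by_id: Dict[str, dict]) -> str:
    """Join the Text of all WORD blocks referenced as CHILD of the given block."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == "CHILD":
            for child_id in rel["Ids"]:
                child = by_id[child_id]
                if child["BlockType"] == "WORD":
                    words.append(child["Text"])
    return " ".join(words)


def tables_as_rows(blocks: List[dict]) -> List[List[List[str]]]:
    """Return every TABLE in the block list as a list of rows of cell texts."""
    by_id = {block["Id"]: block for block in blocks}
    tables = []
    for block in blocks:
        if block["BlockType"] != "TABLE":
            continue
        cells = {}
        for rel in block.get("Relationships", []):
            if rel["Type"] != "CHILD":
                continue
            for cell_id in rel["Ids"]:
                cell = by_id[cell_id]
                if cell["BlockType"] == "CELL":
                    cells[(cell["RowIndex"], cell["ColumnIndex"])] = child_text(cell, by_id)
        n_rows = max((row for row, _ in cells), default=0)
        n_cols = max((col for _, col in cells), default=0)
        # Missing cells are rendered as empty strings.
        tables.append([[cells.get((r, c), "") for c in range(1, n_cols + 1)]
                       for r in range(1, n_rows + 1)])
    return tables


# Hand-written sample blocks, reduced to the fields used above.
sample_blocks = [
    {"Id": "t1", "BlockType": "TABLE",
     "Relationships": [{"Type": "CHILD", "Ids": ["c1", "c2", "c3", "c4"]}]},
    {"Id": "c1", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w1", "w2"]}]},
    {"Id": "c2", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 2,
     "Relationships": [{"Type": "CHILD", "Ids": ["w3", "w4"]}]},
    {"Id": "c3", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w5"]}]},
    {"Id": "c4", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 2},
    {"Id": "w1", "BlockType": "WORD", "Text": "Start"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Date"},
    {"Id": "w3", "BlockType": "WORD", "Text": "End"},
    {"Id": "w4", "BlockType": "WORD", "Text": "Date"},
    {"Id": "w5", "BlockType": "WORD", "Text": "1/15/2009"},
]

if __name__ == "__main__":
    for row in tables_as_rows(sample_blocks)[0]:
        print("| " + " | ".join(row) + " |")
```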
To pretty print both FORMS and TABLES:
> amazon-textract --example --features FORMS TABLES --pretty-print FORMS TABLES
This will output:
Phone Number:: 555-0100
Home Address:: 123 Any Street, Any Town, USA
Full Name:: Jane Doe
Mailing Address:: same as home address
|------------|-----------|---------------------|-----------------|-----------------------|
| | | Previous Employment | History | |
| Start Date | End Date | Employer Name | Position Held | Reason for leaving |
| 1/15/2009 | 6/30/2011 | Any Company | Assistant Baker | Family relocated |
| 7/1/2011 | 8/10/2013 | Best Corp. | Baker | Better opportunity |
| 8/15/2013 | present | Example Corp. | Head Baker | N/A, current employer |
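The key/value lines above are derived from KEY_VALUE_SET blocks in the response: a block with entity type KEY links to its VALUE block through a VALUE relationship, and both get their text from child WORD blocks. The following sketch shows that lookup on hand-written sample data. It is an illustration under those assumptions, not the package's own code; `form_fields` is a hypothetical name, and selection elements (checkboxes) are ignored.

```python
from typing import Dict, List


def child_text(block: dict, by_id: Dict[str, dict]) -> str:
    """Join the Text of all WORD blocks referenced as CHILD of the given block."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == "CHILD":
            for child_id in rel["Ids"]:
                child = by_id[child_id]
                if child["BlockType"] == "WORD":
                    words.append(child["Text"])
    return " ".join(words)


def form_fields(blocks: List[dict]) -> Dict[str, str]:
    """Map each form key text to the text of its linked value block(s)."""
    by_id = {block["Id"]: block for block in blocks}
    fields = {}
    for block in blocks:
        is_key = (block["BlockType"] == "KEY_VALUE_SET"
                  and "KEY" in block.get("EntityTypes", []))
        if not is_key:
            continue
        value_texts = [
            child_text(by_id[value_id], by_id)
            for rel in block.get("Relationships", [])
            if rel["Type"] == "VALUE"
            for value_id in rel["Ids"]
        ]
        fields[child_text(block, by_id)] = " ".join(value_texts)
    return fields


# Hand-written sample blocks, reduced to the fields used above.
sample_blocks = [
    {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                       {"Type": "CHILD", "Ids": ["w1", "w2"]}]},
    {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["w3", "w4"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Full"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Name:"},
    {"Id": "w3", "BlockType": "WORD", "Text": "Jane"},
    {"Id": "w4", "BlockType": "WORD", "Text": "Doe"},
]

if __name__ == "__main__":
    for key, value in form_fields(sample_blocks).items():
        print(key, value)
```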
Overlay
At the moment, overlay only works with images. We will add support for PDF soon.
The following command runs DetectDocumentText, pretty prints the WORDS in the document to stdout, draws a bounding box around each WORD, displays the result in a pop-up window, and stores it in a folder called 'overlay-output-folder-name'.
> amazon-textract --example --pretty-print WORDS --overlay WORD --pop-up-overlay-output --overlay-output-folder overlay-output-folder-name
The following command runs AnalyzeDocument for FORMS and TABLES, pretty prints FORMS and TABLES to stdout, draws bounding boxes around each table CELL and each FORM key/value, displays the result in a pop-up window, and stores it in the folder '../docs/'.
> amazon-textract --example --features TABLES FORMS --pretty-print FORMS TABLES --overlay FORM CELL --pop-up-overlay-output --overlay-output-folder ../docs/
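Drawing an overlay requires converting Textract geometry to pixels. The BoundingBox values in the response (Left, Top, Width, Height) are ratios of the page size, so they have to be scaled by the image dimensions. A minimal sketch of that conversion follows; `bounding_box_to_pixels` is a hypothetical helper, not a function of this package, and the rounding choice is an assumption.

```python
from typing import Dict, Tuple


def bounding_box_to_pixels(
    bbox: Dict[str, float], image_width: int, image_height: int
) -> Tuple[int, int, int, int]:
    """Convert a Textract BoundingBox (page ratios) to (left, top, right, bottom) pixels."""
    left = round(bbox["Left"] * image_width)
    top = round(bbox["Top"] * image_height)
    right = round((bbox["Left"] + bbox["Width"]) * image_width)
    bottom = round((bbox["Top"] + bbox["Height"]) * image_height)
    return left, top, right, bottom


if __name__ == "__main__":
    # The PAGE block from the sample output covers the whole image.
    page_box = {"Width": 1.0, "Height": 1.0, "Left": 0.0, "Top": 0.0}
    print(bounding_box_to_pixels(page_box, 1000, 800))
```

The resulting tuple has the corner order that common imaging libraries expect for rectangles, so it could be passed to a drawing routine of your choice.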