Easily parse JSON returned by Amazon Textract.
Textract Response Parser
You can use the Textract response parser library to easily parse the JSON returned by Amazon Textract. The library parses the JSON and provides programming-language-specific constructs for working with different parts of the document. textractor is an example of a proof-of-concept (PoC) batch processing tool that uses the Textract response parser library and generates output in multiple formats.
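The raw Textract response is a flat list of `Blocks`. Each block has an `Id`, a `BlockType` such as `PAGE`, `LINE`, or `WORD`, and `Relationships` whose `CHILD` entries list the `Id`s of the blocks it contains. The sketch below does not use the library. It assumes that response shape and shows the kind of id lookup a parser has to do to turn the flat list into lines and their words. The helpers `build_block_map`, `child_ids`, and `lines_with_words`, and the tiny hand-written response, are hypothetical and for illustration only.

```python
from typing import Any, Dict, List, Tuple

Block = Dict[str, Any]


def build_block_map(response: Dict[str, Any]) -> Dict[str, Block]:
    """Index every block in a Textract-style response by its Id."""
    return {block["Id"]: block for block in response.get("Blocks", [])}


def child_ids(block: Block) -> List[str]:
    """Return the Ids listed in the block's CHILD relationships, in order."""
    ids: List[str] = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == "CHILD":
            ids.extend(rel["Ids"])
    return ids


def lines_with_words(response: Dict[str, Any]) -> List[Tuple[str, List[str]]]:
    """Pair each LINE's text with the text of its WORD children (hypothetical helper)."""
    block_map = build_block_map(response)
    result = []
    for block in response.get("Blocks", []):
        if block["BlockType"] != "LINE":
            continue
        words = [
            block_map[cid]["Text"]
            for cid in child_ids(block)
            if block_map[cid]["BlockType"] == "WORD"
        ]
        result.append((block["Text"], words))
    return result


# A hand-written response fragment, trimmed to the fields used above.
sample = {
    "Blocks": [
        {"Id": "p1", "BlockType": "PAGE",
         "Relationships": [{"Type": "CHILD", "Ids": ["l1"]}]},
        {"Id": "l1", "BlockType": "LINE", "Text": "Hello world", "Confidence": 99.1,
         "Relationships": [{"Type": "CHILD", "Ids": ["w1", "w2"]}]},
        {"Id": "w1", "BlockType": "WORD", "Text": "Hello", "Confidence": 99.3},
        {"Id": "w2", "BlockType": "WORD", "Text": "world", "Confidence": 98.9},
    ]
}

for text, words in lines_with_words(sample):
    print(text, words)  # Hello world ['Hello', 'world']
```

Building the id map once keeps every child lookup constant-time, which is why a parser typically indexes the blocks before walking relationships.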
Installation
python -m pip install amazon-textract-response-parser
Python Usage
# Call Amazon Textract and get JSON response
# client = boto3.client('textract')
# response = client.analyze_document(Document={...}, FeatureTypes=[...])

# Parse JSON response from Textract
from trp import Document
doc = Document(response)

# Iterate over elements in the document
for page in doc.pages:
    # Print lines and words
    for line in page.lines:
        print("Line: {}--{}".format(line.text, line.confidence))
        for word in line.words:
            print("Word: {}--{}".format(word.text, word.confidence))

    # Print tables
    for table in page.tables:
        for r, row in enumerate(table.rows):
            for c, cell in enumerate(row.cells):
                print("Table[{}][{}] = {}-{}".format(r, c, cell.text, cell.confidence))

    # Print fields (a detected key may have no value, so guard against None)
    for field in page.form.fields:
        value = field.value.text if field.value else ""
        print("Field: Key: {}, Value: {}".format(field.key.text, value))

    # Get field by key
    key = "Phone Number:"
    field = page.form.getFieldByKey(key)
    if field:
        print("Field: Key: {}, Value: {}".format(field.key, field.value))

    # Search fields by key
    key = "address"
    fields = page.form.searchFieldsByKey(key)
    for field in fields:
        print("Field: Key: {}, Value: {}".format(field.key, field.value))
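For form data, Textract returns `KEY_VALUE_SET` blocks: a block whose `EntityTypes` contains `KEY` links to its words through `CHILD` relationships and to its value block through a `VALUE` relationship, and the value block links to its own words. The stdlib sketch below assumes that response shape to illustrate roughly what a key lookup and a key search have to do. The helper names are hypothetical and not part of trp; the exact-match lookup and the case-insensitive substring search are assumptions about how `getFieldByKey` and `searchFieldsByKey` behave, not a description of their implementation.

```python
from typing import Any, Dict, List, Optional

Block = Dict[str, Any]


def _related_ids(block: Block, rel_type: str) -> List[str]:
    """Collect Ids from every relationship of the given type."""
    ids: List[str] = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == rel_type:
            ids.extend(rel["Ids"])
    return ids


def _text_of(block: Block, block_map: Dict[str, Block]) -> str:
    """Join the text of a block's WORD children with spaces."""
    return " ".join(
        block_map[cid]["Text"]
        for cid in _related_ids(block, "CHILD")
        if block_map[cid]["BlockType"] == "WORD"
    )


def extract_fields(response: Dict[str, Any]) -> Dict[str, str]:
    """Map each KEY's text to its VALUE's text ('' when the value has no words)."""
    block_map = {b["Id"]: b for b in response.get("Blocks", [])}
    fields: Dict[str, str] = {}
    for block in response.get("Blocks", []):
        if block["BlockType"] != "KEY_VALUE_SET" or "KEY" not in block.get("EntityTypes", []):
            continue
        key_text = _text_of(block, block_map)
        value_text = " ".join(
            _text_of(block_map[vid], block_map) for vid in _related_ids(block, "VALUE")
        )
        fields[key_text] = value_text
    return fields


def get_field_by_key(fields: Dict[str, str], key: str) -> Optional[str]:
    """Exact-match lookup (hypothetical analogue of getFieldByKey)."""
    return fields.get(key)


def search_fields_by_key(fields: Dict[str, str], key: str) -> Dict[str, str]:
    """Case-insensitive substring match (assumed behavior of searchFieldsByKey)."""
    needle = key.lower()
    return {k: v for k, v in fields.items() if needle in k.lower()}


# Hand-written fragment: one key with a value, one key whose value block is empty.
sample = {
    "Blocks": [
        {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
         "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                           {"Type": "CHILD", "Ids": ["w1", "w2"]}]},
        {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
         "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
        {"Id": "w1", "BlockType": "WORD", "Text": "Phone"},
        {"Id": "w2", "BlockType": "WORD", "Text": "Number:"},
        {"Id": "w3", "BlockType": "WORD", "Text": "555-0100"},
        {"Id": "k2", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
         "Relationships": [{"Type": "VALUE", "Ids": ["v2"]},
                           {"Type": "CHILD", "Ids": ["w4", "w5"]}]},
        {"Id": "v2", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"]},
        {"Id": "w4", "BlockType": "WORD", "Text": "Home"},
        {"Id": "w5", "BlockType": "WORD", "Text": "Address:"},
    ]
}

fields = extract_fields(sample)
print(get_field_by_key(fields, "Phone Number:"))  # 555-0100
print(search_fields_by_key(fields, "address"))    # {'Home Address:': ''}
```

The second key shows why the field loop in the usage example needs a guard: a key can be detected even when its value block contains no words. Because this sketch stores fields in a plain dict, two keys with identical text would overwrite each other; a fuller parser would keep a list.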
Test
- Clone the repo and run pytest
python -m pip install pytest
git clone https://github.com/aws-samples/amazon-textract-response-parser.git
cd amazon-textract-response-parser
pytest
Other Resources
- Large scale document processing with Amazon Textract - Reference Architecture
- Batch processing tool
- Code samples
License Summary
This sample code is made available under the Apache License Version 2.0. See the LICENSE file.