Amazon Textract Pipeline Component to add page dimensions to page block types
Project description
Textract-PrettyPrinter
Provides functions to format the output received from Textract in more easily consumable formats, including CSV and Markdown.
Install
> python -m pip install amazon-textract-prettyprinter
Make sure your environment is set up with AWS credentials through configuration files, environment variables, or an attached role (https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html).
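Before calling Textract, it can help to check which local credential sources are visible. The following is a minimal sketch using only the standard library. The helper name `find_credential_sources` is hypothetical and not part of this package. It checks the standard AWS environment variables and the default `~/.aws` files only, so it cannot detect an attached role.

```python
import os
from pathlib import Path


def find_credential_sources(env=None, home=None):
    """Return hints about local AWS credential sources (hypothetical helper).

    Only inspects environment variables and the shared credentials/config
    files; an attached IAM role is NOT detected by this check.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    sources = []

    # Static keys supplied directly through the environment.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        sources.append("environment variables")

    # Shared credentials file; its location can be overridden by an env var.
    credentials_file = Path(
        env.get("AWS_SHARED_CREDENTIALS_FILE", home / ".aws" / "credentials")
    )
    if credentials_file.is_file():
        sources.append(f"credentials file: {credentials_file}")

    # Shared config file; its location can be overridden as well.
    config_file = Path(env.get("AWS_CONFIG_FILE", home / ".aws" / "config"))
    if config_file.is_file():
        sources.append(f"config file: {config_file}")

    return sources


if __name__ == "__main__":
    found = find_credential_sources()
    print(found or "no local sources found (an attached role may still apply)")
```

An empty result does not necessarily mean the Textract call will fail, since credentials may still come from a role attached to the compute environment.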
Samples
Get FORMS and TABLES as CSV
from textractcaller.t_call import call_textract, Textract_Features
from textractprettyprinter.t_pretty_print import Pretty_Print_Table_Format, get_string
textract_json = call_textract(input_document=input_document, features=[Textract_Features.FORMS, Textract_Features.TABLES])
print(get_string(textract_json=textract_json, table_format=Pretty_Print_Table_Format.csv))
Get string for TABLES using the get_string method
from textractcaller.t_call import call_textract, Textract_Features
from textractprettyprinter.t_pretty_print import Textract_Pretty_Print, get_string
textract_json = call_textract(input_document=input_document, features=[Textract_Features.TABLES])
get_string(textract_json=textract_json, output_type=Textract_Pretty_Print.TABLES)
Print out tables in LaTeX format
from textractcaller.t_call import call_textract, Textract_Features
from textractprettyprinter.t_pretty_print import Pretty_Print_Table_Format, get_tables_string
textract_json = call_textract(input_document=input_document, features=[Textract_Features.FORMS, Textract_Features.TABLES])
print(get_tables_string(textract_json=textract_json, table_format=Pretty_Print_Table_Format.latex))
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file amazon-textract-pipeline-pagedimensions-0.0.1.tar.gz.
File metadata
- Download URL: amazon-textract-pipeline-pagedimensions-0.0.1.tar.gz
- Upload date:
- Size: 8.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.8.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | d1b21d253fd8f58914fde84d13d59e5fdc818321caf225f4d8744d96805bfedb
MD5 | 2c950e362b9ef7e4dce5d401153f8921
BLAKE2b-256 | 153bb026264f698e6dcb36f065df1a84b51446da37e952cc56a052d2c119af6a
File details
Details for the file amazon_textract_pipeline_pagedimensions-0.0.1-py2.py3-none-any.whl.
File metadata
- Download URL: amazon_textract_pipeline_pagedimensions-0.0.1-py2.py3-none-any.whl
- Upload date:
- Size: 9.0 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.8.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | beb029cf7f5f1f644be1d661dccba5351fd772dbf1464710d77e120a55180fd7
MD5 | b70308f86bb5f110a37d7417147c1148
BLAKE2b-256 | 645823f98c2ff0db63c3d37da30155f05214501a0e40c07f7cac77a547a270a7
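A downloaded file can be checked against the digests listed above. The following is a minimal sketch using Python's `hashlib`. The helper names `file_digest` and `matches` are hypothetical, and the label `"blake2b-256"` is an assumption of this sketch, mapped to `hashlib.blake2b` with a 32-byte digest.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, read in chunks to limit memory use."""
    if algorithm == "blake2b-256":
        # BLAKE2b truncated to 256 bits, i.e. a 32-byte digest.
        hasher = hashlib.blake2b(digest_size=32)
    else:
        # Covers "sha256" and "md5", among others.
        hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches(path, expected_hex, algorithm="sha256"):
    """Compare a file's digest with an expected hex string, ignoring case."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

For example, `matches("amazon-textract-pipeline-pagedimensions-0.0.1.tar.gz", "<SHA256 from the table>")` should return `True` for an unmodified download, assuming the file sits in the current directory.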