Amazon Textract Pipeline Component to add page dimensions to page block types

Project description

Textract-PrettyPrinter

Provides functions to format the output received from Textract in more easily consumable formats, including CSV and Markdown.

Install

> python -m pip install amazon-textract-prettyprinter

Make sure your environment is set up with AWS credentials through configuration files, environment variables, or an attached role (https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html).
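
Before calling Textract, it can help to check which credential sources are visible locally. The following is a minimal sketch using only the standard library; the helper name find_credential_sources is hypothetical and not part of this package. It looks only at the standard AWS environment variables and the default shared files under ~/.aws, and it cannot detect an attached role.

```python
import os
from pathlib import Path


def find_credential_sources(env=None, home=None):
    """Return the local places where AWS credentials appear to be configured."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    sources = []
    # Environment variables: both the key ID and the secret must be present.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        sources.append("environment variables")
    # Shared configuration files in their default location.
    for name in ("credentials", "config"):
        path = home / ".aws" / name
        if path.is_file():
            sources.append(str(path))
    return sources


if __name__ == "__main__":
    found = find_credential_sources()
    print(found or "No local credentials found; an attached role may still apply.")
```

An empty result does not prove that calls will fail, since a role attached to the instance or container is resolved at request time.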

Samples

Get FORMS and TABLES as CSV

from textractcaller.t_call import call_textract, Textract_Features
from textractprettyprinter.t_pretty_print import Pretty_Print_Table_Format, get_string

# input_document is the document to analyze, e.g. a local file path or an s3:// URI
textract_json = call_textract(input_document=input_document, features=[Textract_Features.FORMS, Textract_Features.TABLES])
print(get_string(textract_json=textract_json, table_format=Pretty_Print_Table_Format.csv))

Get string for TABLES using the get_string method

from textractcaller.t_call import call_textract, Textract_Features
from textractprettyprinter.t_pretty_print import Textract_Pretty_Print, get_string

textract_json = call_textract(input_document=input_document, features=[Textract_Features.TABLES])
get_string(textract_json=textract_json, output_type=Textract_Pretty_Print.TABLES)

Print out tables in LaTeX format

from textractcaller.t_call import call_textract, Textract_Features
# get_tables_string and Pretty_Print_Table_Format must be imported for the call below
from textractprettyprinter.t_pretty_print import Pretty_Print_Table_Format, get_tables_string

textract_json = call_textract(input_document=input_document, features=[Textract_Features.FORMS, Textract_Features.TABLES])
print(get_tables_string(textract_json=textract_json, table_format=Pretty_Print_Table_Format.latex))
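
The formatting functions in the samples take the Textract response as a dictionary, so a response saved earlier can be formatted again without another call to the service. The following is a sketch under that assumption; the helper names load_textract_json and tables_from_file are hypothetical, and the saved file is assumed to hold the response with its top-level Blocks list.

```python
import json
from pathlib import Path


def load_textract_json(path):
    """Load a Textract response that was saved to disk as JSON."""
    with Path(path).open(encoding="utf-8") as handle:
        response = json.load(handle)
    # A Textract response carries its content in a top-level "Blocks" list.
    if "Blocks" not in response:
        raise ValueError(f"{path} does not look like a Textract response")
    return response


def tables_from_file(path):
    """Format the tables of a saved response as CSV.

    Requires amazon-textract-prettyprinter to be installed.
    """
    # Imported here so that loading and validating works without the package.
    from textractprettyprinter.t_pretty_print import (
        Pretty_Print_Table_Format,
        get_tables_string,
    )

    return get_tables_string(
        textract_json=load_textract_json(path),
        table_format=Pretty_Print_Table_Format.csv,
    )
```

Keeping the load step separate from the formatting step makes it possible to try several table formats on the same saved response.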

File details

Details for the file amazon-textract-pipeline-pagedimensions-0.0.1.tar.gz.

File metadata

  • Download URL: amazon-textract-pipeline-pagedimensions-0.0.1.tar.gz
  • Upload date:
  • Size: 8.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.8.5

File hashes

Hashes for amazon-textract-pipeline-pagedimensions-0.0.1.tar.gz
Algorithm Hash digest
SHA256 d1b21d253fd8f58914fde84d13d59e5fdc818321caf225f4d8744d96805bfedb
MD5 2c950e362b9ef7e4dce5d401153f8921
BLAKE2b-256 153bb026264f698e6dcb36f065df1a84b51446da37e952cc56a052d2c119af6a

See more details on using hashes here.

File details

Details for the file amazon_textract_pipeline_pagedimensions-0.0.1-py2.py3-none-any.whl.

File hashes

Hashes for amazon_textract_pipeline_pagedimensions-0.0.1-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 beb029cf7f5f1f644be1d661dccba5351fd772dbf1464710d77e120a55180fd7
MD5 b70308f86bb5f110a37d7417147c1148
BLAKE2b-256 645823f98c2ff0db63c3d37da30155f05214501a0e40c07f7cac77a547a270a7
