
Amazon Textract Caller tools


Textract-Caller

amazon-textract-caller provides a collection of ready-to-use functions and sample implementations that speed up the evaluation and development of any project using Amazon Textract.

It makes it easy to call Amazon Textract regardless of the file type and location.

Install

> python -m pip install amazon-textract-caller

Functions

from textractcaller.t_call import call_textract
def call_textract(input_document: Union[str, bytearray],
                  features: List[Textract_Features] = None,
                  output_config: OutputConfig = None,
                  kms_key_id: str = None,
                  job_tag: str = None,
                  notification_channel: NotificationChannel = None,
                  client_request_token: str = None,
                  return_job_id: bool = False,
                  force_async_api: bool = False) -> dict:
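input_document can be a local path or an s3:// URI. The sketch below shows one way such a URI could be split into the bucket and object key that the boto3 Textract client expects. The helper name split_s3_uri is hypothetical and is not part of amazon-textract-caller; plain string splitting is used instead of urlparse so that keys containing "#" or "?" are not cut off.

```python
from typing import Tuple


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/path/to/key' into ('bucket', 'path/to/key').

    Hypothetical helper for illustration; not part of amazon-textract-caller.
    """
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an S3 URI: {uri!r}")
    # everything before the first '/' is the bucket, the rest is the object key
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"S3 URI needs both a bucket and a key: {uri!r}")
    return bucket, key


bucket, key = split_s3_uri("s3://some-bucket/w2-example.png")
print(bucket, key)  # some-bucket w2-example.png
```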

get_full_json is useful for retrieving the complete JSON response of an asynchronous job (started with start_document_text_detection or start_document_analysis):

from textractcaller.t_call import get_full_json
def get_full_json(job_id: str = None,
                  textract_api: Textract_API = Textract_API.DETECT,
                  boto3_textract_client=None)->dict:
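The asynchronous Get* operations can return their results in several responses linked by NextToken. The following is a minimal sketch of that pattern, not the library's implementation: the fetch parameter is a hypothetical stand-in for a call such as get_document_text_detection, and the first response's metadata is kept while all Blocks are concatenated.

```python
from typing import Callable, Optional


def collect_blocks(fetch: Callable[[Optional[str]], dict]) -> dict:
    """Follow NextToken until it is exhausted and merge all Blocks.

    `fetch(next_token)` is a hypothetical callable, e.g. with boto3:
    lambda token: client.get_document_text_detection(
        JobId=job_id, **({"NextToken": token} if token else {}))
    """
    merged: Optional[dict] = None
    next_token: Optional[str] = None
    while True:
        page = fetch(next_token)
        if merged is None:
            # keep metadata (JobStatus, DocumentMetadata, ...) from the first response
            merged = {k: v for k, v in page.items() if k != "NextToken"}
            merged["Blocks"] = list(page.get("Blocks", []))
        else:
            merged["Blocks"].extend(page.get("Blocks", []))
        next_token = page.get("NextToken")
        if not next_token:
            return merged
```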

When the asynchronous job writes its results to an OutputConfig location in S3, get_full_json_from_output_config reads the JSON from that location:

from textractcaller.t_call import get_full_json_from_output_config
def get_full_json_from_output_config(output_config: OutputConfig = None,
                                     job_id: str = None,
                                     s3_client = None)->dict:
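As an illustration of what reading from an OutputConfig location involves, the sketch below assumes Textract stores the result as numbered objects under <s3_prefix>/<JobId>/ (1, 2, ..., alongside a .s3_access_check marker). It works on an in-memory mapping of key to object body, which in practice could be filled with list_objects_v2 and get_object; the function name merge_output_objects is hypothetical. The numbered parts are sorted numerically so that 10 follows 9 rather than 1.

```python
import json
from typing import Dict, List, Tuple


def merge_output_objects(objects: Dict[str, bytes], s3_prefix: str, job_id: str) -> dict:
    """Merge numbered Textract result objects found under <s3_prefix>/<job_id>/.

    Hypothetical helper; the <prefix>/<job_id>/<n> layout is an assumption.
    """
    folder = f"{s3_prefix.rstrip('/')}/{job_id}/"
    numbered: List[Tuple[int, bytes]] = []
    for key, body in objects.items():
        name = key[len(folder):] if key.startswith(folder) else ""
        if name.isdigit():  # skips ".s3_access_check" and keys of other jobs
            numbered.append((int(name), body))
    if not numbered:
        raise ValueError(f"no result objects under {folder!r}")
    merged: dict = {}
    for i, (_, body) in enumerate(sorted(numbered, key=lambda t: t[0])):
        part = json.loads(body)
        if i == 0:
            # keep the metadata of the first part, drop pagination tokens
            merged = {k: v for k, v in part.items() if k != "NextToken"}
            merged["Blocks"] = list(part.get("Blocks", []))
        else:
            merged["Blocks"].extend(part.get("Blocks", []))
    return merged
```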

Samples

Call with a file from the local filesystem, text detection only (DetectDocumentText)

textract_json = call_textract(input_document="/folder/local-filesystem-file.png")
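The returned value is the Textract response as a dict, whose "Blocks" list holds PAGE, LINE and WORD blocks. Without any extra parser, the detected lines can be read directly, as in this minimal sketch; line_texts is a hypothetical helper and sample is a hand-written, trimmed-down response used only to show the data shape.

```python
from typing import List


def line_texts(textract_json: dict) -> List[str]:
    """Return the text of every LINE block, in response order."""
    return [
        block["Text"]
        for block in textract_json.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


# hand-written stand-in for the dict returned by call_textract
sample = {
    "Blocks": [
        {"BlockType": "PAGE", "Id": "p1"},
        {"BlockType": "LINE", "Id": "l1", "Text": "Employee's social security number"},
        {"BlockType": "WORD", "Id": "w1", "Text": "Employee's"},
        {"BlockType": "LINE", "Id": "l2", "Text": "Wages, tips, other compensation"},
    ]
}
print(line_texts(sample))
```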

Call with a file from the local filesystem, text detection only, and parse the result with the Textract Response Parser

(requires the trp dependency: python -m pip install amazon-textract-response-parser)

from trp import Document
from textractcaller.t_call import call_textract

textract_json = call_textract(input_document="/folder/local-filesystem-file.png")
d = Document(textract_json)
# print the text of every line, page by page
for page in d.pages:
    for line in page.lines:
        print(line.text)

Call with a file from the local filesystem and the TABLES feature

from textractcaller.t_call import call_textract, Textract_Features
features = [Textract_Features.TABLES]
response = call_textract(
    input_document="/folder/local-filesystem-file.png", features=features)
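With the TABLES feature, the response contains TABLE blocks whose CHILD relationships point to CELL blocks (with 1-based RowIndex and ColumnIndex), which in turn point to WORD blocks. The sketch below rebuilds each table as a grid of strings by following those relationships. tables_as_rows is a hypothetical helper; it ignores merged cells and non-WORD children such as selection elements, which the trp parser handles more completely.

```python
from typing import Dict, List


def tables_as_rows(textract_json: dict) -> List[List[List[str]]]:
    """Rebuild every TABLE block as rows of cell strings (TABLE -> CELL -> WORD)."""
    blocks: Dict[str, dict] = {b["Id"]: b for b in textract_json.get("Blocks", [])}

    def child_ids(block: dict) -> List[str]:
        ids: List[str] = []
        for rel in block.get("Relationships", []):
            if rel.get("Type") == "CHILD":
                ids.extend(rel.get("Ids", []))
        return ids

    def children_of_type(block: dict, block_type: str) -> List[dict]:
        return [blocks[i] for i in child_ids(block)
                if blocks.get(i, {}).get("BlockType") == block_type]

    tables: List[List[List[str]]] = []
    for block in blocks.values():
        if block.get("BlockType") != "TABLE":
            continue
        cells = children_of_type(block, "CELL")
        if not cells:
            tables.append([])
            continue
        n_rows = max(c["RowIndex"] for c in cells)
        n_cols = max(c["ColumnIndex"] for c in cells)
        grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
        for cell in cells:
            words = [w["Text"] for w in children_of_type(cell, "WORD")]
            # RowIndex and ColumnIndex are 1-based
            grid[cell["RowIndex"] - 1][cell["ColumnIndex"] - 1] = " ".join(words)
        tables.append(grid)
    return tables
```

Applied to the response of the call above, tables_as_rows(response) would return one grid per detected table.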

Call with an image located on S3 and force the asynchronous API

from textractcaller.t_call import call_textract
response = call_textract(input_document="s3://some-bucket/w2-example.png", force_async_api=True)
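An asynchronous job has to finish before its results can be read. The sketch below shows a generic polling loop over the JobStatus values Textract reports (IN_PROGRESS, SUCCEEDED, FAILED, PARTIAL_SUCCESS); it is not how amazon-textract-caller necessarily waits internally, and get_status is a hypothetical callable. The timeout keeps a stuck job from blocking forever.

```python
import time
from typing import Callable


def wait_for_job(get_status: Callable[[], str], poll_seconds: float = 5.0,
                 timeout_seconds: float = 600.0) -> str:
    """Poll until an asynchronous Textract job leaves IN_PROGRESS.

    `get_status` is hypothetical, e.g. with boto3:
    lambda: client.get_document_text_detection(JobId=job_id)["JobStatus"]
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status != "IN_PROGRESS":
            # SUCCEEDED, FAILED or PARTIAL_SUCCESS
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("Textract job still IN_PROGRESS after timeout")
        time.sleep(poll_seconds)
```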

Call with OutputConfig and a customer-managed KMS key

from textractcaller.t_call import call_textract, OutputConfig
# results are written under s3://somebucket-encrypted/output/ and encrypted with the given KMS key
output_config = OutputConfig(s3_bucket="somebucket-encrypted", s3_prefix="output/")
response = call_textract(input_document="s3://someprefix/somefile.png",
                          force_async_api=True,
                          output_config=output_config,
                          kms_key_id="arn:aws:kms:us-east-1:123456789012:key/some-key-id-ref-erence",
                          return_job_id=False,
                          job_tag="sometag",
                          client_request_token="sometoken")

Call with a PDF located on S3 and return the JobId instead of the JSON response

from textractcaller.t_call import call_textract
response = call_textract(input_document="s3://some-bucket/some-document.pdf", return_job_id=True)
job_id = response['JobId']



Download files

Source distribution: amazon-textract-caller-0.0.13.tar.gz (10.4 kB)

  • SHA256: 88f9b5d751dba3578f968d81bd7fb4ec9e4babf175967fdc4278495c99ab5ced
  • MD5: bd388bb665705cb7b91ff156d4ab8bd1
  • BLAKE2b-256: 3bd68581c1cd6a5fff7a51a87b4daeb826324bb5647840f06f5501e2fffe7826

Built distribution (Python 2, Python 3): amazon_textract_caller-0.0.13-py2.py3-none-any.whl (10.5 kB)

  • SHA256: 19f1e1b954115d38a7819a86a1aafbc5287e2a6b89be6a06a3dcba18e388b951
  • MD5: 83220c648b176dd4e393afdbe96dccf4
  • BLAKE2b-256: f1b5ed8e6fde74c75f891cfea8ed6461ec69a5ce58fa261e15b6a32427906810

Both files were uploaded via twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.8.5, without Trusted Publishing.
