Skip to main content

amazon-textract-idp-cdk-constructs

Project description

Context

This CDK Construct can be used as Step Function task and call Textract in Asynchonous mode for DetectText and AnalyzeDocument APIs.

Input

Expects a Manifest JSON at 'Payload'. Manifest description: https://pypi.org/project/schadem-tidp-manifest/

Example call in Python

        textract_async_task = t_async.TextractGenericAsyncSfnTask(
            self,
            "textract-async-task",
            s3_output_bucket=s3_output_bucket,
            s3_temp_output_prefix=s3_temp_output_prefix,
            integration_pattern=sfn.IntegrationPattern.WAIT_FOR_TASK_TOKEN,
            lambda_log_level="DEBUG",
            timeout=Duration.hours(24),
            input=sfn.TaskInput.from_object({
                "Token":
                sfn.JsonPath.task_token,
                "ExecutionId":
                sfn.JsonPath.string_at('$$.Execution.Id'),
                "Payload":
                sfn.JsonPath.entire_payload,
            }),
            result_path="$.textract_result")

Output

Adds the "TextractTempOutputJsonPath" to the Step Function ResultPath. At this location the Textract output is stored as individual JSON files. Use the CDK Construct schadem-cdk-construct-sfn-textract-output-config-to-json to combine them to one single JSON file.

example with ResultPath = textract_result (like configured above):

"textract_result": {
    "TextractTempOutputJsonPath": "s3://schademcdkstackpaystuban-schademcdkidpstackpaystu-bt0j5wq0zftu/textract-temp-output/c6e141e8f4e93f68321c17dcbc6bf7291d0c8cdaeb4869758604c387ce91a480"
  }

Spacy Classification

Expect a Spacy textcat model at the root of the directory. Call the script <TO_INSERT) to copy a public one which classifies Paystub and W2.

aws s3 cp s3://amazon-textract-public-content/constructs/en_textcat_demo-0.0.0.tar.gz .

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

Built Distribution

File details

Details for the file amazon-textract-idp-cdk-constructs-0.0.3.tar.gz.

File metadata

File hashes

Hashes for amazon-textract-idp-cdk-constructs-0.0.3.tar.gz
Algorithm Hash digest
SHA256 1723b65af4467c8f9d0d7703db2397165b414b23e1150f137db6a34b07ed5048
MD5 1b056d2147ee1dc4219bc5fdb5f7b714
BLAKE2b-256 e3a27b8367e4034bcf60357bf057f1a8d5a27d3d8d550a9d48256eff7c7728b4

See more details on using hashes here.

File details

Details for the file amazon_textract_idp_cdk_constructs-0.0.3-py3-none-any.whl.

File metadata

File hashes

Hashes for amazon_textract_idp_cdk_constructs-0.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 a9b8c5ffa2caae6f1bbb41711b8b12330d9ca98fc943608ad8489a2999ca68eb
MD5 695c822d7c55f3815aadbb1d9e97ccc1
BLAKE2b-256 4f1b970d1b29370eee3925d4a47fa69d4ff880542a6c5f481a475d22f635037d

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page