Skip to main content

amazon-textract-idp-cdk-constructs

Project description

Amazon Textract IDP CDK Constructs

---

Stability: Experimental

All classes are under active development and subject to non-backward compatible changes or removal in any future version. These are not subject to the Semantic Versioning model. This means that while you may use them, you may need to update your source code when upgrading to a newer version of this package.


Context

This CDK Construct can be used as Step Function task and call Textract in Asynchonous mode for DetectText and AnalyzeDocument APIs.

For samples on usage, look at Amazon Textact IDP CDK Stack Samples

Input

Expects a Manifest JSON at 'Payload'. Manifest description: https://pypi.org/project/schadem-tidp-manifest/

Example call in Python

        textract_async_task = t_async.TextractGenericAsyncSfnTask(
            self,
            "textract-async-task",
            s3_output_bucket=s3_output_bucket,
            s3_temp_output_prefix=s3_temp_output_prefix,
            integration_pattern=sfn.IntegrationPattern.WAIT_FOR_TASK_TOKEN,
            lambda_log_level="DEBUG",
            timeout=Duration.hours(24),
            input=sfn.TaskInput.from_object({
                "Token":
                sfn.JsonPath.task_token,
                "ExecutionId":
                sfn.JsonPath.string_at('$$.Execution.Id'),
                "Payload":
                sfn.JsonPath.entire_payload,
            }),
            result_path="$.textract_result")

Output

Adds the "TextractTempOutputJsonPath" to the Step Function ResultPath. At this location the Textract output is stored as individual JSON files. Use the CDK Construct schadem-cdk-construct-sfn-textract-output-config-to-json to combine them to one single JSON file.

example with ResultPath = textract_result (like configured above):

"textract_result": {
    "TextractTempOutputJsonPath": "s3://schademcdkstackpaystuban-schademcdkidpstackpaystu-bt0j5wq0zftu/textract-temp-output/c6e141e8f4e93f68321c17dcbc6bf7291d0c8cdaeb4869758604c387ce91a480"
  }

Spacy Classification

Expect a Spacy textcat model at the root of the directory. Call the script <TO_INSERT) to copy a public one which classifies Paystub and W2.

aws s3 cp s3://amazon-textract-public-content/constructs/en_textcat_demo-0.0.0.tar.gz .

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

Built Distribution

File details

Details for the file amazon-textract-idp-cdk-constructs-0.0.8.tar.gz.

File metadata

File hashes

Hashes for amazon-textract-idp-cdk-constructs-0.0.8.tar.gz
Algorithm Hash digest
SHA256 c14121552d5c91c7ab65a00550dc9ff8f57d4e6e98ce1836dd1122e30fdb98c0
MD5 50947a8b9b70a53a567b2ac89da84d30
BLAKE2b-256 39f3d4f702ac711deb309800f2f040e57ade042b374750e8d8439c4bcf8324f0

See more details on using hashes here.

File details

Details for the file amazon_textract_idp_cdk_constructs-0.0.8-py3-none-any.whl.

File metadata

File hashes

Hashes for amazon_textract_idp_cdk_constructs-0.0.8-py3-none-any.whl
Algorithm Hash digest
SHA256 72e94122cd8629242e9c260a61416a60a1e5dd7aedb135d1514ace4e0f21f503
MD5 1865c160d8d2e1893763e4370f4188f9
BLAKE2b-256 745bcd4ec078aa527b820d4e7038048584ca34819c47ab0276e37e7e91394fed

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page