Skip to main content

amazon-textract-idp-cdk-constructs

Project description

Amazon Textract IDP CDK Constructs

---

Stability: Experimental

All classes are under active development and subject to non-backward compatible changes or removal in any future version. These are not subject to the Semantic Versioning model. This means that while you may use them, you may need to update your source code when upgrading to a newer version of this package.


Context

This CDK Construct can be used as Step Function task and call Textract in Asynchonous mode for DetectText and AnalyzeDocument APIs.

For samples on usage, look at Amazon Textact IDP CDK Stack Samples

Input

Expects a Manifest JSON at 'Payload'. Manifest description: https://pypi.org/project/schadem-tidp-manifest/

Example call in Python

        textract_async_task = t_async.TextractGenericAsyncSfnTask(
            self,
            "textract-async-task",
            s3_output_bucket=s3_output_bucket,
            s3_temp_output_prefix=s3_temp_output_prefix,
            integration_pattern=sfn.IntegrationPattern.WAIT_FOR_TASK_TOKEN,
            lambda_log_level="DEBUG",
            timeout=Duration.hours(24),
            input=sfn.TaskInput.from_object({
                "Token":
                sfn.JsonPath.task_token,
                "ExecutionId":
                sfn.JsonPath.string_at('$$.Execution.Id'),
                "Payload":
                sfn.JsonPath.entire_payload,
            }),
            result_path="$.textract_result")

Output

Adds the "TextractTempOutputJsonPath" to the Step Function ResultPath. At this location the Textract output is stored as individual JSON files. Use the CDK Construct schadem-cdk-construct-sfn-textract-output-config-to-json to combine them to one single JSON file.

example with ResultPath = textract_result (like configured above):

"textract_result": {
    "TextractTempOutputJsonPath": "s3://schademcdkstackpaystuban-schademcdkidpstackpaystu-bt0j5wq0zftu/textract-temp-output/c6e141e8f4e93f68321c17dcbc6bf7291d0c8cdaeb4869758604c387ce91a480"
  }

Spacy Classification

Expect a Spacy textcat model at the root of the directory. Call the script <TO_INSERT) to copy a public one which classifies Paystub and W2.

aws s3 cp s3://amazon-textract-public-content/constructs/en_textcat_demo-0.0.0.tar.gz .

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

Built Distribution

File details

Details for the file amazon-textract-idp-cdk-constructs-0.0.11.tar.gz.

File metadata

File hashes

Hashes for amazon-textract-idp-cdk-constructs-0.0.11.tar.gz
Algorithm Hash digest
SHA256 daf9a1a716476fdb96ff3d6b02b4b18a564abe078bb239abb90f9ac88a8270bc
MD5 e4d021d1b9f4dd514ee237e2193d76c8
BLAKE2b-256 924bf0bc2ba2d339df6aac527c1f1b6fbc3b26607a3cd6bfccab5eaefa09d5f7

See more details on using hashes here.

File details

Details for the file amazon_textract_idp_cdk_constructs-0.0.11-py3-none-any.whl.

File metadata

File hashes

Hashes for amazon_textract_idp_cdk_constructs-0.0.11-py3-none-any.whl
Algorithm Hash digest
SHA256 b10f943d6c4600743cd0e933f688da9653dd15c233f31aca9e0410809f6e6be6
MD5 b78b2c890fab12c79bc087a1a2ce07e1
BLAKE2b-256 6c3c82c363f34926f56f3a4c49c6b05505e7d93abfa0adda548e243cd1c75488

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page