amazon-textract-idp-cdk-constructs
Project description
Context
This CDK construct can be used as a Step Functions task to call Amazon Textract in asynchronous mode for the DetectText and AnalyzeDocument APIs.
Input
The task expects a manifest JSON at 'Payload'. Manifest description: https://pypi.org/project/schadem-tidp-manifest/
Example call in Python
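# Imports are assumptions (CDK v2 layout, t_async alias for this package); adjust to your project.
import aws_cdk.aws_stepfunctions as sfn
from aws_cdk import Duration
import amazon_textract_idp_cdk_constructs as t_async

# Inside a Stack's __init__; s3_output_bucket and s3_temp_output_prefix are defined elsewhere.
textract_async_task = t_async.TextractGenericAsyncSfnTask(
    self,
    "textract-async-task",
    s3_output_bucket=s3_output_bucket,
    s3_temp_output_prefix=s3_temp_output_prefix,
    # The task pauses until the Textract job reports back with the task token.
    integration_pattern=sfn.IntegrationPattern.WAIT_FOR_TASK_TOKEN,
    lambda_log_level="DEBUG",
    timeout=Duration.hours(24),
    input=sfn.TaskInput.from_object({
        "Token": sfn.JsonPath.task_token,
        "ExecutionId": sfn.JsonPath.string_at('$$.Execution.Id'),
        "Payload": sfn.JsonPath.entire_payload,
    }),
    result_path="$.textract_result")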
Output
Adds "TextractTempOutputJsonPath" to the Step Functions ResultPath. The Textract output is stored at this location as individual JSON files. Use the CDK construct schadem-cdk-construct-sfn-textract-output-config-to-json to combine them into a single JSON file.
Example with ResultPath set to $.textract_result (as configured above):
"textract_result": {
    "TextractTempOutputJsonPath": "s3://schademcdkstackpaystuban-schademcdkidpstackpaystu-bt0j5wq0zftu/textract-temp-output/c6e141e8f4e93f68321c17dcbc6bf7291d0c8cdaeb4869758604c387ce91a480"
}
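A downstream step usually needs the bucket and prefix behind "TextractTempOutputJsonPath" and then has to stitch the individual JSON files together. The following is a minimal sketch of both steps, not the implementation used by this construct or by schadem-cdk-construct-sfn-textract-output-config-to-json. The helper names split_s3_uri and merge_textract_pages and the example bucket are hypothetical, and the merge assumes each file is a Textract response with a "Blocks" list.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlparse


def split_s3_uri(s3_uri: str) -> Tuple[str, str]:
    """Split an s3://bucket/prefix URI into (bucket, prefix). Hypothetical helper."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {s3_uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def merge_textract_pages(pages: List[Dict]) -> Dict:
    """Concatenate the "Blocks" lists of several already-loaded responses.

    Assumption: each element is one of the individual JSON files, parsed
    into a dict. Files without a "Blocks" key contribute nothing.
    """
    merged: Dict = {"Blocks": []}
    for page in pages:
        merged["Blocks"].extend(page.get("Blocks", []))
    return merged


# Hypothetical state output, shaped like the example above.
state = {
    "textract_result": {
        "TextractTempOutputJsonPath":
            "s3://example-bucket/textract-temp-output/abc123"
    }
}
bucket, prefix = split_s3_uri(
    state["textract_result"]["TextractTempOutputJsonPath"])
```

Listing and downloading the objects under the prefix (for example with boto3) is left out so the sketch stays free of AWS calls.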
Spacy Classification
Expects a spaCy textcat model at the root of the directory. Call the script <TO_INSERT> to copy a public one that classifies Paystub and W2 documents.
aws s3 cp s3://amazon-textract-public-content/constructs/en_textcat_demo-0.0.0.tar.gz .
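Once the model is available, classification comes down to reading the category scores spaCy attaches to a document. The sketch below assumes the archive is a pip-installable spaCy package that loads under the name en_textcat_demo (inferred from the file name, not confirmed by this project); top_category and classify are hypothetical helpers, and the label strings in the usage note are placeholders.

```python
from typing import Dict


def top_category(cats: Dict[str, float]) -> str:
    """Return the label with the highest score from a textcat score dict."""
    if not cats:
        raise ValueError("no category scores")
    return max(cats, key=cats.get)


def classify(text: str, model_name: str = "en_textcat_demo") -> str:
    """Classify text with a spaCy textcat model.

    Assumption: model_name is an installed spaCy package; the default is
    guessed from the archive name and may differ in practice.
    """
    import spacy  # imported lazily so top_category works without spaCy

    nlp = spacy.load(model_name)
    # Doc.cats maps each label to a score.
    return top_category(nlp(text).cats)
```

For example, top_category({"PAYSTUB": 0.9, "W2": 0.1}) returns "PAYSTUB"; the actual label names depend on how the model was trained.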
Hashes for amazon-textract-idp-cdk-constructs-0.0.3.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 1723b65af4467c8f9d0d7703db2397165b414b23e1150f137db6a34b07ed5048
MD5 | 1b056d2147ee1dc4219bc5fdb5f7b714
BLAKE2b-256 | e3a27b8367e4034bcf60357bf057f1a8d5a27d3d8d550a9d48256eff7c7728b4
Hashes for amazon_textract_idp_cdk_constructs-0.0.3-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | a9b8c5ffa2caae6f1bbb41711b8b12330d9ca98fc943608ad8489a2999ca68eb
MD5 | 695c822d7c55f3815aadbb1d9e97ccc1
BLAKE2b-256 | 4f1b970d1b29370eee3925d4a47fa69d4ff880542a6c5f481a475d22f635037d