amazon-textract-idp-cdk-constructs
Amazon Textract IDP CDK Constructs
All classes are under active development and subject to backward-incompatible changes or removal in any future version. They are not subject to the Semantic Versioning model. This means that while you may use them, you may need to update your source code when upgrading to a newer version of this package.
Context
This CDK construct can be used as a Step Functions task that calls Textract in asynchronous mode for the DetectText and AnalyzeDocument APIs.
For usage samples, see Amazon Textract IDP CDK Stack Samples.
Input
Expects a Manifest JSON at 'Payload'. Manifest description: https://pypi.org/project/schadem-tidp-manifest/
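As a rough illustration, a manifest could be assembled and serialized as in the sketch below. The field names `s3Path` and `textractFeatures`, the helper `build_manifest`, and the bucket and key are assumptions made for this example and are not confirmed by this page; check the manifest description linked above for the actual schema.

```python
import json


def build_manifest(s3_path, features=None):
    """Return a manifest dict (field names are assumed, not confirmed)."""
    if not s3_path.startswith("s3://"):
        raise ValueError(f"expected an s3:// URI, got {s3_path!r}")
    manifest = {"s3Path": s3_path}
    if features:
        # Assumed key for the Textract feature list (e.g. FORMS, TABLES).
        manifest["textractFeatures"] = list(features)
    return manifest


# Hypothetical bucket and key, for illustration only.
manifest = build_manifest(
    "s3://my-input-bucket/uploads/paystub.pdf", ["FORMS", "TABLES"]
)
manifest_json = json.dumps(manifest)
print(manifest_json)
```

The resulting JSON would be what the task receives under 'Payload'.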
Example call in Python
textract_async_task = t_async.TextractGenericAsyncSfnTask(
    self,
    "textract-async-task",
    s3_output_bucket=s3_output_bucket,
    s3_temp_output_prefix=s3_temp_output_prefix,
    integration_pattern=sfn.IntegrationPattern.WAIT_FOR_TASK_TOKEN,
    lambda_log_level="DEBUG",
    timeout=Duration.hours(24),
    input=sfn.TaskInput.from_object({
        "Token": sfn.JsonPath.task_token,
        "ExecutionId": sfn.JsonPath.string_at('$$.Execution.Id'),
        "Payload": sfn.JsonPath.entire_payload,
    }),
    result_path="$.textract_result")
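With `WAIT_FOR_TASK_TOKEN`, the state machine pauses at this task until something reports back using the token passed in `Token`. The sketch below shows that general callback pattern, not the construct's internal code: the helper name `report_textract_done`, the stand-in client, and the sample token and path are hypothetical, while `send_task_success` is the boto3 Step Functions client call that resumes a waiting task.

```python
import json


def report_textract_done(sfn_client, task_token, temp_output_path):
    """Resume the waiting Step Functions task with the output location."""
    output = {"TextractTempOutputJsonPath": temp_output_path}
    # send_task_success expects the task output as a JSON string.
    sfn_client.send_task_success(taskToken=task_token, output=json.dumps(output))
    return output


class _RecordingClient:
    """Stand-in for boto3.client("stepfunctions") so the sketch runs without AWS."""

    def __init__(self):
        self.calls = []

    def send_task_success(self, taskToken, output):
        self.calls.append({"taskToken": taskToken, "output": output})


client = _RecordingClient()
result = report_textract_done(
    client, "example-token", "s3://my-output-bucket/textract-temp-output/abc"
)
```

In a real deployment the client would be `boto3.client("stepfunctions")`, and the token would be the value the task received in its input.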
Output
Adds "TextractTempOutputJsonPath" to the Step Functions ResultPath. The Textract output is stored at this location as individual JSON files. Use the CDK construct schadem-cdk-construct-sfn-textract-output-config-to-json to combine them into a single JSON file.
Example with ResultPath = textract_result (as configured above):
"textract_result": {
    "TextractTempOutputJsonPath": "s3://schademcdkstackpaystuban-schademcdkidpstackpaystu-bt0j5wq0zftu/textract-temp-output/c6e141e8f4e93f68321c17dcbc6bf7291d0c8cdaeb4869758604c387ce91a480"
}
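A downstream step usually needs the bucket and key prefix separately. The following is a minimal sketch of splitting the path with the standard library; the helper name `split_s3_uri` and the shortened sample path are hypothetical.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3://bucket/prefix URI into (bucket, prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


# Hypothetical state, shaped like the example output above.
state = {
    "textract_result": {
        "TextractTempOutputJsonPath": "s3://my-output-bucket/textract-temp-output/c6e141e8"
    }
}
bucket, prefix = split_s3_uri(state["textract_result"]["TextractTempOutputJsonPath"])
print(bucket, prefix)
```

The bucket and prefix could then be passed to a listing call such as boto3's `list_objects_v2(Bucket=bucket, Prefix=prefix)` to enumerate the individual JSON files.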
Spacy Classification
Expects a spaCy textcat model at the root of the directory. Call the script <TO_INSERT> to copy a public model that classifies Paystub and W2 documents.
aws s3 cp s3://amazon-textract-public-content/constructs/en_textcat_demo-0.0.0.tar.gz .
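A spaCy text categorizer stores its scores on `doc.cats`, a dict mapping each label to a score. The sketch below picks the best label from such a dict; the helper name `top_category`, the threshold, the sample scores, and the label spellings are all hypothetical and may not match the demo model.

```python
def top_category(cats, min_score=0.5):
    """Return (label, score) for the highest score, or None if below min_score."""
    if not cats:
        return None
    label, score = max(cats.items(), key=lambda item: item[1])
    return (label, score) if score >= min_score else None


# Sample scores shaped like spaCy's doc.cats; values are made up.
sample_cats = {"PAYSTUB": 0.93, "W2": 0.07}
best = top_category(sample_cats)
print(best)
```

With a loaded model, the dict would come from `nlp(text).cats` instead of the hard-coded sample.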