A JSII construct library to build AWS serverless infrastructure that orchestrates Amazon Redshift using AWS Step Functions.
Welcome to the cdk-stepfunctions-redshift project!
cdk-stepfunctions-redshift provides SfnRedshiftTasker, a JSII construct that builds AWS serverless infrastructure implementing a callback pattern for Amazon Redshift statements.
The SfnRedshiftTasker construct takes the details of a Redshift target (cluster name, database name and username). The resulting object has a lambdaFunction property, which provides the interface to run statements against that Redshift target with callback functionality.
The current solution supports two callback patterns:
- Step Functions: have a state that issues a SQL statement and only transitions once the statement succeeds or fails (see stepfunction_redshift_integration.md for how to use this)
- CloudFormation: after running the SQL statement, do a callback to signal success or failure to CloudFormation (see cloudformation_redhsift_integration.md for how to use this)
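For the CloudFormation pattern, the following is a minimal sketch of what such a callback body could look like. It assumes the integration is driven by a CloudFormation custom resource, which is an assumption here: the text above only says that success or failure is signalled back. Custom resources expect a JSON document sent with an HTTP PUT to the pre-signed ResponseURL found in the request. The helper name build_cfn_response is hypothetical, and the sketch makes no network call.

```python
import json


def build_cfn_response(event: dict, succeeded: bool, reason: str = "") -> str:
    """Build the JSON body a custom resource sends back to CloudFormation.

    `event` is the custom resource request; only fields defined by the
    custom resource protocol are read from it.
    """
    body = {
        "Status": "SUCCESS" if succeeded else "FAILED",
        "Reason": reason or "See the SQL statement result for details",
        # Fall back to the logical ID when no physical ID exists yet (Create).
        "PhysicalResourceId": event.get("PhysicalResourceId", event["LogicalResourceId"]),
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
    }
    return json.dumps(body)
```

In a real handler this body would be sent to event["ResponseURL"] once the SQL statement has finished.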
Behind the scenes
When you use a SfnRedshiftTasker in your stack you will get:
- A Lambda function for invoking tasks on the Amazon Redshift cluster
- A DynamoDB table to track ongoing executions
- An event rule to monitor Amazon Redshift Data API completion events and route them to SQS
- An SQS queue to receive the Amazon Redshift Data API completion events mentioned above
- A Lambda function to process the API completion events (by default the same function as the one above)
- A KMS key which encrypts data at rest.
This makes it easy to create Step Functions tasks that execute a SQL command and only complete once Amazon Redshift finishes executing the corresponding statement.
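The event rule has to select Amazon Redshift Data API completion events before they reach the queue. A minimal sketch of that filtering step in plain Python follows. It assumes the event shape AWS documents for the Data API (source aws.redshift-data, detail-type Redshift Data Statement Status Change). The function name and the sample values are hypothetical, and the rule the construct actually creates may differ.

```python
# Assumed pattern, modelled on an EventBridge event pattern:
# each key maps to the list of values that are allowed for it.
COMPLETION_PATTERN = {
    "source": ["aws.redshift-data"],
    "detail-type": ["Redshift Data Statement Status Change"],
}


def matches(pattern: dict, event: dict) -> bool:
    """Return True when every pattern key is present in the event with an allowed value."""
    return all(event.get(key) in allowed for key, allowed in pattern.items())


# Hypothetical completion event, trimmed to the fields used in this sketch.
sample_event = {
    "source": "aws.redshift-data",
    "detail-type": "Redshift Data Statement Status Change",
    "detail": {"statementName": "example-statement", "state": "FINISHED"},
}
```

Events from other sources fail the match and would never be routed to the SQS queue.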
How it works
Serverless infrastructure is spun up for a specific (cluster, user, database) combination. A Lambda function is provided that allows invoking statements as this user. States can then use it to do a seemingly synchronous invocation of an Amazon Redshift statement, which gives your state machines a simpler definition (see Example definition for an example state).
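A minimal sketch of what such a state might look like in Amazon States Language, built here as a Python dictionary. The .waitForTaskToken resource suffix and the $$.Task.Token and $$.Execution.Id context paths are standard Step Functions features. The payload field names (sqlStatement, taskToken, executionArn), the function name and the SQL text are assumptions for illustration and should be checked against the API of the invoker function.

```python
import json


def redshift_task_state(function_name: str, sql: str, next_state: str) -> dict:
    """Build a Task state that pauses until a callback with the task token arrives."""
    return {
        "Type": "Task",
        # Lambda integration using the wait-for-callback pattern.
        "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
        "Parameters": {
            "FunctionName": function_name,
            "Payload": {
                # Field names below are hypothetical.
                "sqlStatement": sql,
                "taskToken.$": "$$.Task.Token",
                "executionArn.$": "$$.Execution.Id",
            },
        },
        "Next": next_state,
    }


state = redshift_task_state("my-redshift-invoker", "CALL refresh_sales()", "Done")
print(json.dumps(state, indent=2))
```

Because the state waits for the task token, the state machine only moves to the next state after the callback reports success or failure.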
Example flow (for step function)
1. A Step Functions state triggers the Lambda function provided by the construct. The invocation payload of the state follows a structure with designated fields (the API of the invoker function).
2. The Lambda function generates a unique ID based on the execution ARN and registers the SQL invocation in a DynamoDB state table.
3. The Lambda function then starts the statement through the Amazon Redshift Data API, using the unique ID as the statement name and requesting events for state changes.
4. As a result of step 3, Amazon Redshift executes the statement. Once the statement completes, it emits an event. The construct has put a CloudWatch rule in place to monitor these events.
5. The event is placed into an SQS queue.
6. This SQS queue is monitored by a Lambda function (which can be the same as the previous one).
7. The Lambda function checks whether the finished query is related to a Step Functions invocation, in order to retrieve the task token of the step.
8. If it is, it does a succeed/fail callback to the Step Functions step (using the task token), depending on the success or failure of the SQL statement.
9. It marks the invocation as processed in the state table.
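The bookkeeping in the flow above can be sketched with in-memory stand-ins. This is not the construct's implementation: the dictionary replaces the DynamoDB state table, the list records callbacks instead of calling Step Functions, the Data API call is only a comment, and the hashing scheme used for the unique ID is an assumption. All function names are hypothetical.

```python
import hashlib

state_table = {}  # stands in for the DynamoDB state table
callbacks = []    # records the succeed/fail callbacks that would go to Step Functions


def unique_id(execution_arn: str, task_token: str) -> str:
    """Derive a deterministic statement name (the hashing scheme is an assumption)."""
    return hashlib.sha256(f"{execution_arn}:{task_token}".encode()).hexdigest()[:32]


def invoke(execution_arn: str, task_token: str, sql: str) -> str:
    """Register the invocation, then start the statement under the generated name."""
    name = unique_id(execution_arn, task_token)
    state_table[name] = {"taskToken": task_token, "sql": sql, "processed": False}
    # A real implementation would call the Redshift Data API here, passing
    # `name` as the statement name and requesting completion events.
    return name


def on_completion(event: dict) -> None:
    """Look up the task token for a finished statement, call back, mark as processed."""
    detail = event["detail"]
    item = state_table.get(detail["statementName"])
    if item is None or item["processed"]:
        return  # not started by this flow, or already handled
    outcome = "success" if detail["state"] == "FINISHED" else "failure"
    callbacks.append((outcome, item["taskToken"]))
    item["processed"] = True


name = invoke("arn:aws:states:example:execution", "token-123", "SELECT 1")
on_completion({"detail": {"statementName": name, "state": "FINISHED"}})
on_completion({"detail": {"statementName": name, "state": "FINISHED"}})  # duplicate is ignored
```

Marking the invocation as processed is what makes a duplicate delivery from the queue harmless in this sketch.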
How to use
This is a construct, so you can use it from a CDK stack. An example stack can be found at integ.default.ts. That stack sets up an Amazon Redshift cluster, the SfnRedshiftTasker infrastructure and some state machines that use the functionality. It can be launched by compiling the code (which creates a lib directory) and deploying the CDK app:
yarn compile && npx cdk --app ./lib/integ.default.js deploy
Considerations
When using this approach do keep in mind the considerations of the Amazon Redshift Data API.
These shouldn't be blockers:
- If the query result is too big, consider using UNLOAD rather than SELECT.
- If the statement size is too big, consider splitting the statement into multiple statements, for example by defining and using views or by encapsulating the logic in a stored procedure.
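As an illustration of the first consideration, here is a minimal sketch that wraps a SELECT in an UNLOAD statement so the result is written to Amazon S3 instead of being returned through the Data API. The helper name, bucket prefix and role ARN are hypothetical. The sketch only handles the quoting rule that single quotes inside the query are doubled, and it leaves out the optional UNLOAD parameters.

```python
def to_unload(select_sql: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Wrap a SELECT in UNLOAD so the result goes to S3 rather than the Data API."""
    # Drop a trailing semicolon and double single quotes, as UNLOAD expects
    # the query as a single-quoted string.
    escaped = select_sql.strip().rstrip(";").replace("'", "''")
    return f"UNLOAD ('{escaped}') TO '{s3_prefix}' IAM_ROLE '{iam_role_arn}'"


statement = to_unload(
    "SELECT * FROM venue WHERE venuestate='NV';",
    "s3://example-bucket/venue_",                    # hypothetical bucket prefix
    "arn:aws:iam::123456789012:role/ExampleRole",    # hypothetical role
)
```

The resulting statement can be submitted like any other SQL statement, and only the completion status flows back through the callback.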
Security
See CONTRIBUTING for more information.
License
This library is licensed under the MIT-0 License. See the LICENSE file.