Construct library for Amazon Connect Data Lake

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

cdklabs-automation

These details have not been verified by PyPI

Development Status
- 4 - Beta
Intended Audience
- Developers
License
- OSI Approved
Operating System
- OS Independent
Programming Language
Typing
- Typed

Project description

Amazon Connect Data Lake CDK Construct

An AWS Cloud Development Kit (CDK) construct that enables access to the Amazon Connect analytics data lake. It automates the complete Connect Data Lake setup process, eliminating the need for manual configuration or custom CloudFormation templates.

The construct uses a Lambda-backed custom resource to manage the deployment process. It handles associating Connect datasets, accepting RAM resource shares, granting Lake Formation permissions, and creating resource link tables in a centralized Glue database—with support for same-account and cross-account configurations.

Usage

Prerequisites

- Amazon Connect instance
- AWS CDK v2
- For cross-account setups: an IAM role in the target account. See the Cross Account Setup documentation.

Installation

Install the construct library in your CDK project directory:

TypeScript/JavaScript

npm install @cdklabs/cdk-construct-connect-datalake

Python

pip install cdklabs.cdk-construct-connect-datalake

Java

Add the following dependency to your pom.xml:

<dependency>
  <groupId>io.github.cdklabs</groupId>
  <artifactId>cdk-construct-connect-datalake</artifactId>
  <version>VERSION</version>
</dependency>

.NET

dotnet add package Cdklabs.CdkConstructConnectDatalake

Go

go get github.com/cdklabs/cdk-construct-connect-datalake-go/cdkconstructconnectdatalake

Basic Usage

Add the DataLakeAccess construct to a CDK stack deployed in the same AWS account and region as your Amazon Connect instance.

from cdklabs.cdk_construct_connect_datalake import DataLakeAccess, DataType


DataLakeAccess(self, "DataLakeAccess",
    instance_id="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",  # Your Connect instance ID
    dataset_ids=[DataType.CONTACT_RECORD, "contact_statistic_record"]
)

Important: When deploying alongside a Connect instance in the same stack, add a dependency to the construct:

Example

from cdklabs.cdk_construct_connect_datalake import DataLakeAccess, DataType
from aws_cdk.aws_connect import CfnInstance


connect_instance = CfnInstance(self, "ConnectInstance",
    identity_management_type="CONNECT_MANAGED",
    instance_alias="my-instance",
    attributes=CfnInstance.AttributesProperty(
        inbound_calls=True,
        outbound_calls=True
    )
)

data_lake = DataLakeAccess(self, "DataLakeAccess",
    instance_id=connect_instance.attr_id,
    dataset_ids=[DataType.CONTACT_RECORD]
)

# Ensure data lake resources are deleted before the Connect instance
data_lake.node.add_dependency(connect_instance)

Cross-Account Configuration

Configure the construct to create data lake resources in a different AWS account by specifying targetAccountId and targetAccountRoleArn (target_account_id and target_account_role_arn in Python). The construct assumes the target role to accept the RAM resource share(s) and create Glue resources in that account.

from cdklabs.cdk_construct_connect_datalake import DataLakeAccess, DataType


DataLakeAccess(self, "DataLakeAccess",
    instance_id="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    dataset_ids=[DataType.CONTACT_RECORD, "contact_statistic_record"],

    # Target account where the resources are created
    target_account_id="123456789012",

    # IAM role in the target account for cross-account role assumption
    target_account_role_arn="arn:aws:iam::123456789012:role/RoleName"
)
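
The role passed as target_account_role_arn has to be assumable from the account that hosts the Connect instance. As a rough sketch only, a trust policy for such a role could be generated as shown below. The helper name build_trust_policy and the account ID 111122223333 are hypothetical, and trusting the whole source account root is an assumption made for brevity. The permissions the role itself needs are not shown; the Cross Account Setup documentation remains the authority on both.

```python
import json
import re


def build_trust_policy(source_account_id: str) -> dict:
    """Return an IAM trust policy that lets the source account assume the role.

    Assumption: the account root is used as the principal for illustration
    only; a narrower principal may be required or preferable.
    """
    if not re.fullmatch(r"[0-9]{12}", source_account_id):
        raise ValueError("AWS account IDs are 12-digit numbers")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{source_account_id}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    # 111122223333 stands in for the account that owns the Connect instance.
    print(json.dumps(build_trust_policy("111122223333"), indent=2))
```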

Multiple Instances

Enable data lake access for multiple Connect instances by creating a separate construct for each. Add a dependency between the constructs so that they deploy sequentially, which prevents conflicts from concurrent operations.

from cdklabs.cdk_construct_connect_datalake import DataLakeAccess, DataType


# First Connect instance data lake setup
data_lake1 = DataLakeAccess(self, "DataLakeAccess1",
    instance_id="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    dataset_ids=[DataType.CONTACT_RECORD, DataType.AGENT_STATISTIC_RECORD]
)

# Second Connect instance data lake setup
data_lake2 = DataLakeAccess(self, "DataLakeAccess2",
    instance_id="yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy",
    dataset_ids=[DataType.CONTACT_RECORD, DataType.CONTACT_FLOW_EVENTS]
)

# Create dependency to ensure sequential deployment
data_lake2.node.add_dependency(data_lake1)
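
With more than two instances, wiring each dependency by hand gets repetitive. The helper below is one possible way to chain any number of constructs so that each waits for the previous one. The name chain_sequentially is hypothetical and not part of this library; it relies only on the node.add_dependency method that CDK constructs expose.

```python
from typing import Iterable, List, Tuple


def chain_sequentially(constructs: Iterable) -> List[Tuple[object, object]]:
    """Make each construct depend on the one listed before it.

    Works with any object exposing `node.add_dependency(...)`.
    Returns the (dependent, dependency) pairs that were wired up.
    """
    ordered = list(constructs)
    pairs = []
    # Pair every construct with its predecessor: (1st, 2nd), (2nd, 3rd), ...
    for previous, current in zip(ordered, ordered[1:]):
        current.node.add_dependency(previous)
        pairs.append((current, previous))
    return pairs
```

Inside a stack this would be called as chain_sequentially([data_lake1, data_lake2, data_lake3]), which has the same effect as calling add_dependency on each consecutive pair.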

API Reference

DataLakeAccess

The main construct class for setting up Amazon Connect Data Lake integration.

Properties:

- instanceId (string): Amazon Connect instance ID
- datasetIds (Array<string | DataType>): Array of dataset IDs to associate. Use DataType enum values or string literals for datasets not yet in the enum.
- targetAccountId? (string): Target AWS account ID receiving resources (optional)
- targetAccountRoleArn? (string): IAM role ARN in the target account for cross-account role assumption (optional)
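
Literal property values can be sanity-checked before synthesis. The following is a minimal sketch of such a pre-flight check, not the construct's own validation. The function name check_props and the rules it applies are assumptions: that instance IDs look like UUIDs (as the placeholders in this README suggest), that both cross-account properties are supplied together, and that the role lives in the target account. Values that are CDK tokens, such as connect_instance.attr_id, are only resolved at deploy time and would not pass this check.

```python
import re
from typing import List, Optional, Sequence

# Hypothetical pre-flight patterns; the construct's own rules may differ.
_INSTANCE_ID = re.compile(r"[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")
_ACCOUNT_ID = re.compile(r"[0-9]{12}")
_ROLE_ARN = re.compile(r"arn:aws[a-z-]*:iam::([0-9]{12}):role/.+")


def check_props(
    instance_id: str,
    dataset_ids: Sequence,
    target_account_id: Optional[str] = None,
    target_account_role_arn: Optional[str] = None,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    if not _INSTANCE_ID.fullmatch(instance_id):
        problems.append("instance_id does not look like a Connect instance ID")
    if not dataset_ids:
        problems.append("dataset_ids is empty")
    # Assumption: the two cross-account properties only make sense as a pair.
    if (target_account_id is None) != (target_account_role_arn is None):
        problems.append("target_account_id and target_account_role_arn should be set together")
    if target_account_id is not None and not _ACCOUNT_ID.fullmatch(target_account_id):
        problems.append("target_account_id is not a 12-digit account ID")
    if target_account_role_arn is not None:
        match = _ROLE_ARN.fullmatch(target_account_role_arn)
        if match is None:
            problems.append("target_account_role_arn is not an IAM role ARN")
        elif target_account_id is not None and match.group(1) != target_account_id:
            problems.append("target_account_role_arn is not in the target account")
    return problems


if __name__ == "__main__":
    print(check_props(
        "12345678-1234-1234-1234-123456789abc",
        ["contact_statistic_record"],
        "123456789012",
        "arn:aws:iam::123456789012:role/RoleName",
    ))
```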

DataType Enum

For a list of supported dataset types, see the API Documentation.

Resources Created

This construct creates the following AWS resources:

Infrastructure Components

- CloudFormation Custom Resource Provider: Framework for managing custom resource lifecycle
- Lambda Function: Custom resource handler that orchestrates the data lake setup
- IAM Role: Execution role with permissions for Connect, RAM, Glue, and Lake Formation operations

IAM permissions:
- connect:BatchAssociateAnalyticsDataSet
- connect:AssociateAnalyticsDataSet
- connect:BatchDisassociateAnalyticsDataSet
- connect:DisassociateAnalyticsDataSet
- connect:ListAnalyticsDataAssociations
- connect:ListAnalyticsDataLakeDataSets
- connect:ListInstances
- ds:DescribeDirectories
- ram:AcceptResourceShareInvitation
- ram:GetResourceShareInvitations
- ram:GetResourceShares
- glue:CreateDatabase
- glue:CreateTable
- glue:DeleteDatabase
- glue:DeleteTable
- glue:GetDatabase
- glue:GetTables
- lakeformation:GetDataLakeSettings
- lakeformation:PutDataLakeSettings
- cloudformation:DescribeStacks
- sts:AssumeRole (for cross-account setups only)

Deployment Workflow

The construct performs the following steps during deployment:

1. Dataset Association: Associates the specified datasets for an Amazon Connect instance with the target account
2. Database Creation: Creates the connect_datalake_database Glue database
3. Lake Formation Setup: Configures the Lambda execution role (or assumed role for cross-account) as a data lake administrator
4. Resource Share Acceptance: Accepts the RAM resource share invitation(s). Multiple dataset associations often consolidate into a single RAM resource share
5. Table Creation: Creates resource link tables for each dataset, enabling queries via Amazon Athena

When deploying to the same account as the Connect instance, all steps execute within that account. For cross-account configurations, steps 2-5 execute in the target account.
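
The account split described above can be modelled in a few lines. This is an illustrative model of the documented behaviour, not the Lambda handler's code. The names WORKFLOW_STEPS and plan_workflow are hypothetical, and the account IDs are placeholders.

```python
from typing import List, Optional, Tuple

# Step names in the order listed under Deployment Workflow.
WORKFLOW_STEPS = (
    "Dataset Association",
    "Database Creation",
    "Lake Formation Setup",
    "Resource Share Acceptance",
    "Table Creation",
)


def plan_workflow(
    connect_account_id: str, target_account_id: Optional[str] = None
) -> List[Tuple[int, str, str]]:
    """Return (step number, step name, account ID) for every workflow step."""
    # Without a target account, everything runs in the Connect instance's account.
    target = target_account_id or connect_account_id
    plan = []
    for number, name in enumerate(WORKFLOW_STEPS, start=1):
        # Step 1 runs in the Connect account; steps 2-5 run in the target account.
        account = connect_account_id if number == 1 else target
        plan.append((number, name, account))
    return plan


if __name__ == "__main__":
    for number, name, account in plan_workflow("111122223333", "123456789012"):
        print(f"{number}. {name}: account {account}")
```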

Limitations

- Table Naming: Resource link tables created by this construct are named using the format {datasetId}_{dataCatalogId}
- Region Support: The construct must be deployed in the same AWS region and account as the Amazon Connect instance. For cross-account configurations, resources are created in the target account within the same region
- Shared Database: The connect_datalake_database Glue database is shared across all deployments of this construct in an account
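
Because the table name is derived from two values, it can be useful to compute it when writing Athena queries. The sketch below assumes the naming format and database name stated above. The function names are hypothetical, the catalog ID 123456789012 is a placeholder for whatever data catalog ID applies to your deployment, and the query is only built as a string, not executed.

```python
DATABASE_NAME = "connect_datalake_database"  # Glue database named in this README


def resource_link_table_name(dataset_id: str, data_catalog_id: str) -> str:
    """Apply the documented {datasetId}_{dataCatalogId} naming format."""
    return f"{dataset_id}_{data_catalog_id}"


def preview_query(dataset_id: str, data_catalog_id: str, limit: int = 10) -> str:
    """Build a small SELECT statement against the resource link table."""
    table = resource_link_table_name(dataset_id, data_catalog_id)
    # Double quotes delimit identifiers in Athena SELECT statements.
    return f'SELECT * FROM "{DATABASE_NAME}"."{table}" LIMIT {int(limit)}'


if __name__ == "__main__":
    print(preview_query("contact_statistic_record", "123456789012"))
```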

Troubleshooting

Partial failures during deployment

If some workflow steps fail during create or update operations, the stack deployment will still show as successful. Error details for these partial failures are available in the CloudFormation stack outputs.
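
Since a successful deployment can still hide partial failures, it may help to inspect the stack outputs after each deploy. The helper below is a sketch that flattens the JSON returned by the CloudFormation DescribeStacks API (for example from `aws cloudformation describe-stacks --output json`) into a dictionary. The function name, the stack name MyStack, and the output key in the sample are hypothetical; the actual keys this construct writes are not listed here.

```python
from typing import Dict


def outputs_by_key(describe_stacks_response: dict, stack_name: str) -> Dict[str, str]:
    """Return {OutputKey: OutputValue} for one stack in a DescribeStacks response."""
    for stack in describe_stacks_response.get("Stacks", []):
        if stack.get("StackName") == stack_name:
            # A stack without outputs has no "Outputs" key at all.
            return {o["OutputKey"]: o["OutputValue"] for o in stack.get("Outputs", [])}
    raise KeyError(f"stack {stack_name!r} not found in response")


if __name__ == "__main__":
    # Hand-written sample with the DescribeStacks response shape.
    sample = {
        "Stacks": [
            {
                "StackName": "MyStack",
                "Outputs": [
                    {"OutputKey": "ExampleOutputKey", "OutputValue": "example value"}
                ],
            }
        ]
    }
    for key, value in outputs_by_key(sample, "MyStack").items():
        print(f"{key}: {value}")
```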

RAM resource share has expired

Resource shares for new dataset associations can consolidate into existing AWS RAM shares, even if expired. Delete each construct that references the target account, confirm the associated resources are removed, then redeploy using the original construct definitions.

Failure to update Lake Formation permissions due to invalid principal

IAM roles that have been deleted but not removed from Lake Formation principals will be considered invalid. Remove the principal causing this error from Lake Formation and redeploy the construct.

Resources are unable to be removed after a Connect instance has been deleted

Delete these constructs before deleting the Connect instance, as cleanup after instance deletion is currently not supported. Raise a GitHub issue if you need help removing the leftover resources.

Support

For issues and questions:

- Reference the documentation for the analytics data lake in the Amazon Connect Administrator Guide
- Check the API Documentation
- Report bugs via GitHub Issues

Contributing

We welcome contributions! Please see our Contributing Guide for details.

License

This project is licensed under the Apache-2.0 License.

Release history

0.0.10

May 18, 2026

This version

0.0.9

May 11, 2026

0.0.8

May 4, 2026

0.0.7

Apr 27, 2026

0.0.6

Apr 20, 2026

0.0.5

Apr 13, 2026

0.0.4

Apr 6, 2026

0.0.3

Mar 30, 2026

0.0.2

Mar 23, 2026

0.0.1

Mar 11, 2026

0.0.0

Mar 2, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cdklabs_cdk_construct_connect_datalake-0.0.9.tar.gz (16.5 MB view details)

Uploaded May 11, 2026 Source

Built Distribution

cdklabs_cdk_construct_connect_datalake-0.0.9-py3-none-any.whl (16.5 MB view details)

Uploaded May 11, 2026 Python 3

File details

Details for the file cdklabs_cdk_construct_connect_datalake-0.0.9.tar.gz.

File metadata

Download URL: cdklabs_cdk_construct_connect_datalake-0.0.9.tar.gz
Upload date: May 11, 2026
Size: 16.5 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.14.4

File hashes

Hashes for cdklabs_cdk_construct_connect_datalake-0.0.9.tar.gz
Algorithm	Hash digest
SHA256	`9845accfb8ddcb569295daa9e51b0c34d59a13e76a26596e2fd39a95f23bf987`
MD5	`9ae03ba600669d1eac5345a4d216f2ff`
BLAKE2b-256	`dce151fb4d56bd61fc31be15e10ba71c05498b027489ae9753ec673b2b435aae`

Provenance

The following attestation bundles were made for cdklabs_cdk_construct_connect_datalake-0.0.9.tar.gz:

Publisher: release.yml on cdklabs/cdk-construct-connect-datalake

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: cdklabs_cdk_construct_connect_datalake-0.0.9.tar.gz
- Subject digest: 9845accfb8ddcb569295daa9e51b0c34d59a13e76a26596e2fd39a95f23bf987
- Sigstore transparency entry: 1510614338
- Sigstore integration time: May 11, 2026
Source repository:
- Permalink: cdklabs/cdk-construct-connect-datalake@67ea82d5c63bfb9150be668b2f7f922aa2ce14f1
- Branch / Tag: refs/heads/main
- Owner: https://github.com/cdklabs
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@67ea82d5c63bfb9150be668b2f7f922aa2ce14f1
- Trigger Event: push

File details

Details for the file cdklabs_cdk_construct_connect_datalake-0.0.9-py3-none-any.whl.

File metadata

Download URL: cdklabs_cdk_construct_connect_datalake-0.0.9-py3-none-any.whl
Upload date: May 11, 2026
Size: 16.5 MB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.14.4

File hashes

Hashes for cdklabs_cdk_construct_connect_datalake-0.0.9-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8e23d4c96f5959697f05e95b9d2b7668a45d05cf6f056c06dd9bc24b29f3bf71`
MD5	`cbe5df0102828bedd3a66e6ca46f7ef2`
BLAKE2b-256	`2c702befa122f1cee979201250d2346b2edafc5de603cca6c915c2746c1f8025`

Provenance

The following attestation bundles were made for cdklabs_cdk_construct_connect_datalake-0.0.9-py3-none-any.whl:

Publisher: release.yml on cdklabs/cdk-construct-connect-datalake

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: cdklabs_cdk_construct_connect_datalake-0.0.9-py3-none-any.whl
- Subject digest: 8e23d4c96f5959697f05e95b9d2b7668a45d05cf6f056c06dd9bc24b29f3bf71
- Sigstore transparency entry: 1510614200
- Sigstore integration time: May 11, 2026
Source repository:
- Permalink: cdklabs/cdk-construct-connect-datalake@67ea82d5c63bfb9150be668b2f7f922aa2ce14f1
- Branch / Tag: refs/heads/main
- Owner: https://github.com/cdklabs
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@67ea82d5c63bfb9150be668b2f7f922aa2ce14f1
- Trigger Event: push
