AWS CDK L3 construct that creates a complete zero-ETL integration from Amazon DynamoDB to Amazon S3 Tables (Apache Iceberg)

dynamodb-zero-etl-s3tables

License: MIT | jsii stability: experimental

An AWS CDK L3 construct that wires up a complete zero-ETL integration from Amazon DynamoDB to Amazon S3 Tables (Apache Iceberg) with a single construct instantiation.

Zero-ETL eliminates the need to build and maintain ETL pipelines. Data flows automatically from your DynamoDB table into Iceberg tables on S3, ready for analytics with Athena, Redshift, EMR, and more.

Why this construct?

Setting up DynamoDB zero-ETL to S3 Tables manually requires 7+ resources across DynamoDB, S3 Tables, IAM, Glue, and custom resources — each with specific permissions, dependencies, and ordering constraints. One misconfigured policy and the integration silently fails.

This construct handles all of that for you:

┌──────────────┐         ┌──────────────────┐         ┌─────────────────┐
│              │         │                  │         │                 │
│   DynamoDB   │────────▶│  AWS Glue        │────────▶│  S3 Tables      │
│   Table      │  zero   │  Integration     │  write  │  (Iceberg)      │
│              │  ETL    │                  │         │                 │
└──────────────┘         └──────────────────┘         └─────────────────┘
       │                        │                            │
       ▼                        ▼                            ▼
  Resource Policy          Catalog Policy              Table Bucket
  (Glue export)            (Custom Resource)           IAM Target Role

What gets created:

| Resource | Purpose |
| --- | --- |
| AWS::S3Tables::TableBucket | Iceberg-native storage for your analytics data |
| AWS::IAM::Role | Least-privilege role for Glue to write to S3 Tables and catalog |
| AWS::Glue::Integration | The zero-ETL integration connecting source to target |
| AWS::Glue::IntegrationResourceProperty | Wires the target IAM role to the integration |
| Custom::AWS (AwsCustomResource) | Sets the Glue Data Catalog resource policy (no CloudFormation support) |
| DynamoDB Resource Policy | Allows Glue to export and describe the source table |

Installation

TypeScript/JavaScript:

npm install dynamodb-zero-etl-s3tables

Python:

pip install dynamodb-zero-etl-s3tables

Java (Maven):

<dependency>
    <groupId>io.github.leeroyhannigan</groupId>
    <artifactId>dynamodb-zero-etl-s3tables</artifactId>
</dependency>

.NET:

dotnet add package LeeroyHannigan.CDK.DynamoDbZeroEtlS3Tables

Quick Start

import { DynamoDbZeroEtlToS3Tables } from 'dynamodb-zero-etl-s3tables';
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';

const table = new dynamodb.Table(this, 'Table', {
  tableName: 'Orders',
  partitionKey: { name: 'PK', type: dynamodb.AttributeType.STRING },
  sortKey: { name: 'SK', type: dynamodb.AttributeType.STRING },
  billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
  pointInTimeRecovery: true,
});

new DynamoDbZeroEtlToS3Tables(this, 'ZeroEtl', {
  table,
  tableBucketName: 'orders-iceberg-bucket',
});

That's it. Your DynamoDB data will automatically replicate to Iceberg tables on S3.

Props

| Property | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| table | dynamodb.Table | Yes | | DynamoDB table with an explicit tableName and PITR enabled |
| tableBucketName | string | Yes | | Name for the S3 Table Bucket |
| integrationName | string | No | 'ddb-to-s3tables' | Name for the Glue zero-ETL integration |

Exposed Properties

All key resources are exposed as public properties for extension:

| Property | Type | Description |
| --- | --- | --- |
| tableBucket | s3tables.CfnTableBucket | The S3 Table Bucket for Iceberg storage |
| targetRole | iam.Role | The IAM role Glue uses to write to the target |
| integration | glue.CfnIntegration | The Glue zero-ETL integration |

Customization Examples

Add custom permissions to the target role

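import * as iam from 'aws-cdk-lib/aws-iam';

// `table` is the DynamoDB table defined in the Quick Start.
const zeroEtl = new DynamoDbZeroEtlToS3Tables(this, 'ZeroEtl', {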
  table,
  tableBucketName: 'my-bucket',
});

zeroEtl.targetRole.addToPolicy(new iam.PolicyStatement({
  actions: ['s3:GetObject'],
  resources: ['arn:aws:s3:::my-other-bucket/*'],
}));

Configure Iceberg file maintenance

zeroEtl.tableBucket.unreferencedFileRemoval = {
  status: 'Enabled',
  unreferencedDays: 10,
  noncurrentDays: 30,
};

Tag the integration

zeroEtl.integration.tags = [
  { key: 'Environment', value: 'production' },
  { key: 'Team', value: 'analytics' },
];

Prerequisites

Your DynamoDB table must have:

  1. An explicit tableName. Auto-generated names (CloudFormation tokens) are not supported.
  2. Point-in-time recovery (PITR) enabled. The zero-ETL integration requires it for data export.

The construct validates both requirements at synth time and throws a descriptive error during synthesis if either is not met.
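To make the two checks concrete, here is a minimal, dependency-free sketch of what such a synth-time validation could look like. It is not the construct's actual implementation: `SourceTableInfo`, `looksLikeToken`, and `validateSourceTable` are hypothetical names, and real CDK code would more likely call `Token.isUnresolved()` from `aws-cdk-lib` than inspect the string directly.

```typescript
// Hypothetical sketch of the two prerequisite checks (not the construct's source).
interface SourceTableInfo {
  tableName?: string;            // undefined when the name is auto-generated
  pointInTimeRecovery?: boolean; // true when PITR is enabled on the table
}

// CDK encodes unresolved string tokens with a "${Token[" marker.
// Assumption: a plain substring check stands in for Token.isUnresolved().
function looksLikeToken(value: string): boolean {
  return value.includes('${Token[');
}

// Throws a descriptive error when a prerequisite is not met.
function validateSourceTable(info: SourceTableInfo): void {
  if (!info.tableName || looksLikeToken(info.tableName)) {
    throw new Error('The source table needs an explicit tableName.');
  }
  if (info.pointInTimeRecovery !== true) {
    throw new Error('The source table needs point-in-time recovery (PITR) enabled.');
  }
}
```

Failing early like this surfaces the problem at `cdk synth` rather than partway through a deployment.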

How It Works

  1. S3 Table Bucket is created as the Iceberg-native target for your data
  2. IAM Role is created with least-privilege permissions for S3 Tables, the Glue Data Catalog, CloudWatch metrics, and CloudWatch Logs
  3. DynamoDB Resource Policy is set on your table, allowing the Glue service to export data
  4. Glue Catalog Resource Policy is applied via a custom resource (CloudFormation doesn't support this natively)
  5. Integration Resource Property wires the IAM role to the target catalog
  6. Glue Integration is created, connecting your DynamoDB table to the S3 Tables catalog

All resources are created with correct dependency ordering to ensure a successful single-deploy experience.
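The ordering idea can be illustrated with a small topological sort. This is only a sketch: the resource names and the dependency edges in `dependsOn` are assumptions chosen to mirror the steps above, not edges read from the construct's source, and in a real stack CloudFormation resolves the order from references and explicit dependencies.

```typescript
// Assumed edges: each resource maps to the resources it must wait for.
const dependsOn: Record<string, string[]> = {
  TableBucket: [],
  TargetRole: ['TableBucket'],
  DynamoDbResourcePolicy: [],
  CatalogResourcePolicy: [],
  IntegrationResourceProperty: ['TargetRole'],
  Integration: [
    'TableBucket',
    'DynamoDbResourcePolicy',
    'CatalogResourcePolicy',
    'IntegrationResourceProperty',
  ],
};

// Depth-first topological sort: every resource appears after its dependencies.
function creationOrder(graph: Record<string, string[]>): string[] {
  const order: string[] = [];
  const state = new Map<string, 'visiting' | 'done'>();

  const visit = (name: string): void => {
    if (state.get(name) === 'done') return;
    if (state.get(name) === 'visiting') {
      throw new Error(`Dependency cycle detected at ${name}`);
    }
    state.set(name, 'visiting');
    for (const dep of graph[name] ?? []) visit(dep);
    state.set(name, 'done');
    order.push(name);
  };

  for (const name of Object.keys(graph)) visit(name);
  return order;
}

const order = creationOrder(dependsOn);
```

Under these assumed edges the integration is created last, once the bucket, role, and both resource policies exist.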

Querying Your Data

Once the integration is active, your DynamoDB data is available as Iceberg tables. Query with Amazon Athena:

SELECT * FROM "s3tablescatalog/my-bucket"."namespace"."table_name" LIMIT 10;
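If you build such queries in code, a small helper keeps the three-part name consistent. The sketch below is illustrative only: `buildAthenaQuery` is a hypothetical function, and the `s3tablescatalog/<bucket>` catalog naming is taken from the example query above.

```typescript
// Hypothetical helper: builds a preview query for a table in an S3 Tables catalog.
function buildAthenaQuery(
  bucket: string,
  namespace: string,
  table: string,
  limit = 10,
): string {
  if (!Number.isInteger(limit) || limit <= 0) {
    throw new Error('limit must be a positive integer');
  }
  // Double any embedded double quotes so each identifier stays a single quoted token.
  const quote = (id: string): string => `"${id.replace(/"/g, '""')}"`;
  const catalog = quote(`s3tablescatalog/${bucket}`);
  return `SELECT * FROM ${catalog}.${quote(namespace)}.${quote(table)} LIMIT ${limit};`;
}

const previewQuery = buildAthenaQuery('my-bucket', 'namespace', 'table_name');
```

Here `previewQuery` reproduces the statement shown above for the placeholder bucket, namespace, and table names.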

Security

  • All IAM permissions follow least-privilege principles
  • S3 Tables permissions are scoped to the specific bucket and sub-resources
  • Glue catalog permissions are scoped to the account's catalog and databases
  • DynamoDB resource policy uses aws:SourceAccount and aws:SourceArn conditions
  • CloudWatch metrics are conditioned on the AWS/Glue/ZeroETL namespace
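As an illustration of the DynamoDB resource policy conditions, the sketch below builds a policy statement as plain JSON. It is an approximation under stated assumptions, not the statement the construct emits: `glueExportStatement` is a hypothetical helper, and the action list, the `StringLike` operator, and the integration ARN pattern are assumptions you should check against the AWS Glue zero-ETL documentation.

```typescript
// Shape of the statement this sketch produces.
interface GlueExportStatement {
  Effect: 'Allow';
  Principal: { Service: string };
  Action: string[];
  Resource: string[];
  Condition: {
    StringEquals: Record<string, string>;
    StringLike: Record<string, string>;
  };
}

// Hypothetical helper; the caller supplies the resource ARNs to scope.
function glueExportStatement(
  account: string,
  region: string,
  resources: string[],
): GlueExportStatement {
  return {
    Effect: 'Allow',
    Principal: { Service: 'glue.amazonaws.com' },
    // Assumed actions for "export and describe the source table".
    Action: [
      'dynamodb:ExportTableToPointInTime',
      'dynamodb:DescribeTable',
      'dynamodb:DescribeExport',
    ],
    Resource: resources,
    Condition: {
      // Only requests made on behalf of this account are allowed.
      StringEquals: { 'aws:SourceAccount': account },
      // Assumed ARN pattern: any Glue integration in this account and region.
      StringLike: {
        'aws:SourceArn': `arn:aws:glue:${region}:${account}:integration:*`,
      },
    },
  };
}

const statement = glueExportStatement('111122223333', 'us-east-1', [
  'arn:aws:dynamodb:us-east-1:111122223333:table/Orders',
]);
```

The two condition keys together limit the Glue service principal to acting for integrations owned by your own account, which guards against the confused-deputy problem.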

Contributing

Contributions, issues, and feature requests are welcome!

License

This project is licensed under the MIT License.

Author

Lee Hannigan (GitHub: LeeroyHannigan)
