AWS CDK L2 construct to create and evolve Apache Iceberg tables in the AWS Glue Data Catalog — schema and partition changes that survive CloudFormation updates, via cdk deploy.
cdk-glue-iceberg-table
cdk-glue-iceberg-table is the AWS CDK L2 construct for creating
and evolving Apache Iceberg tables in the AWS Glue Data Catalog. It
emits the exact AWS::Glue::Table / OpenTableFormatInput shape that
survives a CloudFormation Update, so cdk deploy alone creates a
table, evolves its schema and partitions, and destroys it: the same
lifecycle CDK gives any other resource. No custom resource, no Lambda,
no "cdk deploy and then run this SQL by hand" two-step. Table
changes (new columns, renames, drops, new partition fields) land as
a reviewed diff in a pull request and apply through cdk deploy.
Status (June 2026): published on npm (TypeScript/JavaScript) and PyPI
(Python), pre-1.0, with a public surface guarded by an end-to-end
consumer test on every PR plus three real-AWS integration suites. The
multi-language packages are generated from one TypeScript source with
jsii and indexed on
Construct Hub.
Developed in the
ksco92/arceus monorepo, which
also holds a CDK demo app that dogfoods the construct against a real
AWS account.
Why this exists
AWS documents the CloudFormation shape for Iceberg tables (AWS Big
Data blog, December 2025),
but raw AWS::Glue::Table is a minefield: the CREATE succeeds and the
first Update silently strips table_type=ICEBERG, after which
every Athena query fails with HIVE_UNSUPPORTED_FORMAT. The
motivating CDK issue is aws/aws-cdk#29660;
manmartgarc's comment
documents the only working shape and the silent-corruption traps you
hit by getting it slightly wrong. This construct implements that
shape, refuses to emit the unsafe alternatives, and pins Iceberg field
IDs so schema evolution never orphans existing data. It is the basis
of the upstream proposal to land an IcebergTable in
@aws-cdk/aws-glue-alpha (aws/aws-cdk#37988);
this package tracks that proposal and stays current with it.
How it compares
| Approach | Declarative IaC (in the PR) | In-place schema + partition evolution | Prevents the silent-corruption footguns | Typed API + synth-time validation |
|---|---|---|---|---|
| Raw CfnTable (L1) | ✅ | ⚠️ only if you hand-write the exact OpenTableFormatInput shape | ❌ you own both footguns | ❌ |
| CDK custom resource (Lambda + Glue SDK) | ⚠️ via a custom resource you maintain | ⚠️ you write the diff logic | ⚠️ your code's responsibility | ❌ |
| Spark / Athena SQL DDL at job runtime | ❌ imperative, outside the PR | ✅ but outside CloudFormation | n/a | ❌ |
| cdk-glue-iceberg-table | ✅ | ✅ via cdk deploy | ✅ both prevented by construction | ✅ |
Install
TypeScript / JavaScript (npm):
npm install cdk-glue-iceberg-table
Python (PyPI):
pip install cdk-glue-iceberg-table
Peer dependencies (your CDK app must already have these). For TypeScript / JavaScript:
npm install aws-cdk-lib constructs @aws-cdk/aws-glue-alpha
For Python:
pip install aws-cdk-lib constructs aws-cdk.aws-glue-alpha
Use
TypeScript
import { Bucket } from 'aws-cdk-lib/aws-s3';
import { Database } from '@aws-cdk/aws-glue-alpha';
import {
IcebergTable,
IcebergType,
IcebergPartitionTransform,
} from 'cdk-glue-iceberg-table';
const bucket = new Bucket(this, 'Warehouse');
const db = new Database(this, 'Db', { databaseName: 'analytics' });
new IcebergTable(this, 'OrdersTable', {
database: db,
tableName: 'orders',
location: `s3://${bucket.bucketName}/analytics/orders/`,
columns: [
{ name: 'order_id', type: IcebergType.LONG, required: true, id: 1 },
{ name: 'customer_id', type: IcebergType.LONG, required: true, id: 2 },
{ name: 'placed_at', type: IcebergType.TIMESTAMPTZ, required: true, id: 3 },
],
partitionSpec: [
{ sourceColumn: 'placed_at', transform: IcebergPartitionTransform.DAY },
{ sourceColumn: 'customer_id', transform: IcebergPartitionTransform.bucket(16) },
],
identifierFieldNames: ['order_id'],
});
Python
The same table in Python — the API mirrors TypeScript with
snake_case props and PascalCase types:
from aws_cdk.aws_s3 import Bucket
from aws_cdk.aws_glue_alpha import Database
from cdk_glue_iceberg_table import (
IcebergTable,
IcebergType,
IcebergPartitionTransform,
)
bucket = Bucket(self, "Warehouse")
db = Database(self, "Db", database_name="analytics")
IcebergTable(self, "OrdersTable",
database=db,
table_name="orders",
location=f"s3://{bucket.bucket_name}/analytics/orders/",
columns=[
{"name": "order_id", "type": IcebergType.LONG, "required": True, "id": 1},
{"name": "customer_id", "type": IcebergType.LONG, "required": True, "id": 2},
{"name": "placed_at", "type": IcebergType.TIMESTAMPTZ, "required": True, "id": 3},
],
partition_spec=[
{"source_column": "placed_at", "transform": IcebergPartitionTransform.DAY},
{"source_column": "customer_id", "transform": IcebergPartitionTransform.bucket(16)},
],
identifier_field_names=["order_id"],
)
Consumer-facing reference sections:
- How it compares: cdk-glue-iceberg-table vs raw CfnTable, custom resources, and runtime SQL DDL.
- Using IcebergTable: full API reference with examples.
- Two footguns the construct prevents: the silent-corruption traps that motivated this construct.
- Known limitations: what the construct does and doesn't enforce.
- FAQ: common "how do I … in CDK / CloudFormation" questions.
Exported surface
The package entry point re-exports everything you import from
cdk-glue-iceberg-table. In TypeScript / JavaScript the compiled
surface lives under dist/lib/iceberg/ after npm install; in Python
the same surface is imported from the cdk_glue_iceberg_table module
after pip install:
- IcebergTable: the L2 construct itself, with grantRead / grantWrite / grantReadWrite and the fromIcebergTableAttributes(...) import factory.
- IcebergType: primitive types plus list / map / struct / decimal / fixed factories. Renders to the JSON shape Glue's IcebergStructField.type expects.
- IcebergPartitionTransform: identity / bucket(N) / truncate(W) / year / month / day / hour / void. Each transform validates against the source column type at synth time.
- IcebergDataFormat (parquet / orc / avro, default parquet), IcebergFormatVersion (v1 / v2, required: set explicitly per table), IcebergSortDirection, IcebergNullOrder, plus a tableProperties validator that catches misconfigured properties before they leave your machine (wrong codec for the chosen format, merge-on-read on a v1 table, non-positive numeric values, …).
Using IcebergTable
A minimal table:
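import { Bucket } from 'aws-cdk-lib/aws-s3';
import {
Database,
} from '@aws-cdk/aws-glue-alpha';
import {
IcebergTable,
IcebergType,
} from 'cdk-glue-iceberg-table';
// Bucket that backs the table location used below.
const bucket = new Bucket(this, 'Warehouse');
const db = new Database(this, 'Db', {
databaseName: 'analytics',
});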
new IcebergTable(this, 'Users', {
database: db,
tableName: 'users',
columns: [
{
name: 'user_id',
type: IcebergType.LONG,
required: true,
id: 1,
},
{
name: 'email',
type: IcebergType.STRING,
required: true,
id: 2,
},
{
name: 'signed_up_at',
type: IcebergType.TIMESTAMPTZ,
required: true,
id: 3,
},
],
location: `s3://${bucket.bucketName}/analytics/users/`,
});
A table that exercises most of the surface (partitions, sort order, nested types, identifier fields, table properties, removal policy):
import {
RemovalPolicy,
} from 'aws-cdk-lib';
import {
Database,
} from '@aws-cdk/aws-glue-alpha';
import {
IcebergDataFormat,
IcebergFormatVersion,
IcebergNullOrder,
IcebergPartitionTransform,
IcebergSortDirection,
IcebergTable,
IcebergType,
} from 'cdk-glue-iceberg-table';
new IcebergTable(this, 'OrdersTable', {
database: db,
tableName: 'orders',
comment: 'Demo Iceberg orders table — exercises partitions, sort order, and merge-on-read.',
columns: [
{
name: 'order_id',
type: IcebergType.LONG,
required: true,
id: 1,
},
{
name: 'customer_id',
type: IcebergType.LONG,
required: true,
id: 2,
},
{
name: 'order_amount',
type: IcebergType.decimal(12, 2),
required: true,
id: 3,
},
{
name: 'currency',
type: IcebergType.STRING,
required: true,
id: 4,
},
{
name: 'placed_at',
type: IcebergType.TIMESTAMPTZ,
required: true,
id: 5,
},
{
name: 'tags',
type: IcebergType.list(IcebergType.STRING),
id: 6,
},
{
name: 'shipping_address',
type: IcebergType.struct([
{
name: 'line1',
type: IcebergType.STRING,
required: true,
},
{
name: 'city',
type: IcebergType.STRING,
required: true,
},
{
name: 'country',
type: IcebergType.STRING,
required: true,
},
{
name: 'postal_code',
type: IcebergType.STRING,
},
]),
id: 7,
},
{
name: 'metadata',
type: IcebergType.map(IcebergType.STRING, IcebergType.STRING, false),
id: 8,
},
],
location: `s3://${bucket.bucketName}/analytics/orders/`,
partitionSpec: [
{
sourceColumn: 'placed_at',
transform: IcebergPartitionTransform.DAY,
},
{
sourceColumn: 'customer_id',
transform: IcebergPartitionTransform.bucket(16),
},
],
sortOrder: [
{
sourceColumn: 'placed_at',
direction: IcebergSortDirection.ASC,
nullOrder: IcebergNullOrder.NULLS_LAST,
},
{
sourceColumn: 'order_id',
direction: IcebergSortDirection.ASC,
},
],
identifierFieldNames: [
'order_id',
],
dataFormat: IcebergDataFormat.PARQUET,
formatVersion: IcebergFormatVersion.V2,
tableProperties: {
'write.parquet.compression-codec': 'zstd',
'write.delete.mode': 'merge-on-read',
'write.update.mode': 'merge-on-read',
'write.merge.mode': 'merge-on-read',
'write.target-file-size-bytes': '134217728',
'history.expire.min-snapshots-to-keep': '5',
'gc.enabled': 'true',
},
removalPolicy: RemovalPolicy.DESTROY,
});
The resulting Iceberg metadata.json for this table contains every
feature you set:
{
"format-version": 2,
"table-uuid": "39a948f9-...",
"current-schema-id": 0,
"schemas": [
{
"schema-id": 0,
"identifier-field-ids": [1],
"fields": [
{ "id": 1, "name": "order_id", "required": true, "type": "long" },
{ "id": 2, "name": "customer_id", "required": true, "type": "long" },
{ "id": 3, "name": "order_amount", "required": true, "type": "decimal(12, 2)" },
{ "id": 4, "name": "currency", "required": true, "type": "string" },
{ "id": 5, "name": "placed_at", "required": true, "type": "timestamptz" },
{ "id": 6, "name": "tags", "required": false,
"type": { "type": "list", "element-id": 9, "element": "string", "element-required": true } },
{ "id": 7, "name": "shipping_address", "required": false,
"type": { "type": "struct", "fields": [
{ "id": 10, "name": "line1", "required": true, "type": "string" },
{ "id": 11, "name": "city", "required": true, "type": "string" },
{ "id": 12, "name": "country", "required": true, "type": "string" },
{ "id": 13, "name": "postal_code", "required": false, "type": "string" }
] } },
{ "id": 8, "name": "metadata", "required": false,
"type": { "type": "map", "key-id": 14, "key": "string", "value-id": 15,
"value-required": false, "value": "string" } }
]
}
],
"partition-specs": [
{ "spec-id": 0, "fields": [
{ "name": "placed_at_day", "transform": "day", "source-id": 5, "field-id": 1000 },
{ "name": "customer_id_bucket", "transform": "bucket[16]", "source-id": 2, "field-id": 1001 }
]}
],
"sort-orders": [
{ "order-id": 1, "fields": [
{ "transform": "identity", "source-id": 5, "direction": "asc", "null-order": "nulls-last" },
{ "transform": "identity", "source-id": 1, "direction": "asc", "null-order": "nulls-last" }
]}
],
"properties": {
"format-version": "2",
"write.format.default": "parquet",
"write.parquet.compression-codec": "zstd",
"write.merge.mode": "merge-on-read",
"write.update.mode": "merge-on-read",
"write.delete.mode": "merge-on-read",
"write.target-file-size-bytes": "134217728",
"history.expire.min-snapshots-to-keep": "5",
"gc.enabled": "true",
"comment": "Demo Iceberg orders table — exercises partitions, sort order, and merge-on-read."
}
}
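The partition-specs entries in that metadata show how partition fields are named and numbered. As a minimal sketch, assuming names are built as `<sourceColumn>_<transform>` (as in the output above) and ids are handed out densely from 1000 in array order (as described under Known limitations), the helper below reproduces those two entries and shows what happens when the array is reordered. `allocatePartitionFields` and its types are hypothetical illustrations, not part of the package.

```typescript
// Hypothetical model of positional partition field-id allocation.
// Not the construct's code: it only mirrors the behaviour the README describes.
interface PartitionEntry {
  sourceColumn: string;
  transform: string; // Iceberg transform string, e.g. 'day' or 'bucket[16]'
}

interface AllocatedField {
  name: string;
  transform: string;
  fieldId: number;
}

function allocatePartitionFields(spec: PartitionEntry[]): AllocatedField[] {
  return spec.map((entry, index) => ({
    // Strip a trailing '[N]' so 'bucket[16]' contributes just 'bucket' to the name.
    name: `${entry.sourceColumn}_${entry.transform.replace(/\[.*\]$/, '')}`,
    transform: entry.transform,
    // Ids depend only on array position, starting at 1000.
    fieldId: 1000 + index,
  }));
}

const byDay: PartitionEntry = { sourceColumn: 'placed_at', transform: 'day' };
const byCustomer: PartitionEntry = { sourceColumn: 'customer_id', transform: 'bucket[16]' };

const original = allocatePartitionFields([byDay, byCustomer]);
const reordered = allocatePartitionFields([byCustomer, byDay]);

console.log(original);  // placed_at_day gets 1000, customer_id_bucket gets 1001
console.log(reordered); // same logical partitions, but the ids are swapped
```

Under these assumptions a pure reorder changes the id attached to each unchanged partition, which is why appending new entries at the end is the safer habit.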
Granting access
table.grantRead(role); // Glue read + S3 read on the table's prefix
table.grantWrite(role); // Glue write + S3 write
table.grantReadWrite(role);
The grant* helpers issue IAM grants only. Under Lake Formation
you still add the matching SELECT / INSERT / DELETE LF grants on
top of the construct's IAM grants for Athena queries to succeed.
Importing an existing table
const existing = IcebergTable.fromIcebergTableAttributes(this, 'Orders', {
database: db,
tableName: 'orders',
location: 's3://my-bucket/analytics/orders/',
});
existing.grantRead(role);
Evolving schema and partitions
Change the columns array (or partitionSpec) and run cdk deploy
again. The construct passes the new schema to Glue's UpdateTable,
which writes a new metadata.json with a new schema-id; existing
data files stay readable because each column's id is pinned and
never reused. Adds, renames (same id, new name), and drops all
flow through cdk deploy alone — no out-of-band SQL DDL. The same
applies to partitionSpec.
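To make the id-based rules concrete, here is a minimal sketch that classifies the difference between two versions of a columns array as adds, renames, and drops. It assumes only what the paragraph above states (a rename keeps the id, an add uses a fresh id). `diffColumns`, `ColumnDef`, and `ColumnChange` are hypothetical names for illustration; the construct does not export them.

```typescript
// Hypothetical helper: classify column changes between two deploys by pinned id.
interface ColumnDef {
  readonly id: number;
  readonly name: string;
}

type ColumnChange =
  | { kind: 'added'; id: number; name: string }
  | { kind: 'dropped'; id: number; name: string }
  | { kind: 'renamed'; id: number; from: string; to: string };

function diffColumns(before: ColumnDef[], after: ColumnDef[]): ColumnChange[] {
  const beforeById = new Map<number, ColumnDef>();
  for (const col of before) beforeById.set(col.id, col);
  const afterIds = new Set<number>();
  for (const col of after) afterIds.add(col.id);

  const changes: ColumnChange[] = [];
  for (const col of after) {
    const prev = beforeById.get(col.id);
    if (prev === undefined) {
      changes.push({ kind: 'added', id: col.id, name: col.name });
    } else if (prev.name !== col.name) {
      // Same id, different name: a rename, so existing data files stay readable.
      changes.push({ kind: 'renamed', id: col.id, from: prev.name, to: col.name });
    }
  }
  for (const col of before) {
    if (!afterIds.has(col.id)) {
      changes.push({ kind: 'dropped', id: col.id, name: col.name });
    }
  }
  return changes;
}

const deployedColumns: ColumnDef[] = [
  { id: 1, name: 'order_id' },
  { id: 2, name: 'customer_id' },
  { id: 3, name: 'placed_at' },
];
const nextColumns: ColumnDef[] = [
  { id: 1, name: 'order_id' },
  { id: 2, name: 'buyer_id' }, // rename: id 2 kept, name changed
  { id: 4, name: 'currency' }, // add: fresh id; id 3 is dropped and stays retired
];

const columnChanges = diffColumns(deployedColumns, nextColumns);
console.log(columnChanges);
```

A check like this could run in a unit test next to the stack so that a reviewer sees the intended evolution spelled out alongside the diff.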
Inserting and querying
-- INSERT into the orders table (replace sample_database with your Glue database name)
INSERT INTO sample_database.orders VALUES
(1001, 5001, DECIMAL '149.99', 'USD',
TIMESTAMP '2026-05-20 09:15:00 UTC',
ARRAY['holiday-promo','first-order'],
CAST(ROW('1 Infinite Loop','Cupertino','US','95014')
AS ROW(line1 VARCHAR,city VARCHAR,country VARCHAR,postal_code VARCHAR)),
MAP(ARRAY['channel','utm'], ARRAY['web','google'])),
-- ... more rows
;
-- merge-on-read DELETE (only legal because we chose v2 + merge-on-read mode)
DELETE FROM sample_database.orders WHERE order_id = 1003;
-- merge-on-read UPDATE
UPDATE sample_database.orders SET currency = 'GBP' WHERE customer_id = 5002;
-- SELECT
SELECT customer_id, SUM(order_amount) AS total
FROM sample_database.orders
GROUP BY 1
ORDER BY 2 DESC;
Two footguns the construct prevents
Footgun #1 — schema under storageDescriptor.columns
The CREATE succeeds but the first UPDATE silently strips
table_type=ICEBERG from the table's Glue parameters, and Athena
queries after that fail with HIVE_UNSUPPORTED_FORMAT.
// DON'T DO THIS — what most StackOverflow / re:Post examples show
new CfnTable(this, 'OrdersBad', {
catalogId: this.account,
databaseName: 'analytics',
tableInput: {
name: 'orders',
tableType: 'EXTERNAL_TABLE',
parameters: {
table_type: 'ICEBERG',
},
storageDescriptor: {
location: 's3://.../orders/',
columns: [
/* ... */
],
},
},
openTableFormatInput: {
icebergInput: {
metadataOperation: 'CREATE',
version: '2',
},
},
});
IcebergTable instead always emits schema/partitions/sort/properties
under openTableFormatInput.icebergInput.icebergTableInput, never
under storageDescriptor.
Footgun #2 — tableInput co-present with openTableFormatInput
Even setting just tableInput: { name: 'foo' } next to
openTableFormatInput returns
"Table metadata is expected only via TableInput or via IcebergTableInputProperties inside OpenTableFormatInput".
The construct never emits tableInput; the table-level comment goes
into tableProperties['comment'], which lives inside
icebergTableInput.properties.
(There is a third footgun, field-id reuse after a column drop, that the construct does not prevent. See the next section.)
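If you still maintain hand-written CfnTable props elsewhere, the two traps above are easy to check for mechanically. The following is a rough sketch only: `RawTableProps` models just the three property paths named in this section, and `findFootguns` is a hypothetical lint, not something the package ships.

```typescript
// Hypothetical lint for hand-written table props. Only the property paths
// discussed above are modelled; real CfnTable props have many more fields.
interface RawTableProps {
  tableInput?: {
    name?: string;
    storageDescriptor?: { columns?: unknown[] };
  };
  openTableFormatInput?: { icebergInput?: unknown };
}

function findFootguns(props: RawTableProps): string[] {
  const problems: string[] = [];
  const columns = props.tableInput?.storageDescriptor?.columns;
  if (columns !== undefined && columns.length > 0) {
    problems.push('footgun #1: schema declared under storageDescriptor.columns');
  }
  if (props.tableInput !== undefined && props.openTableFormatInput !== undefined) {
    problems.push('footgun #2: tableInput set alongside openTableFormatInput');
  }
  return problems;
}

// Shape similar to the "DON'T DO THIS" example: both problems are reported.
const unsafeProps: RawTableProps = {
  tableInput: {
    name: 'orders',
    storageDescriptor: { columns: [{ name: 'order_id', type: 'bigint' }] },
  },
  openTableFormatInput: { icebergInput: {} },
};

// Only openTableFormatInput is present: nothing is reported.
const safeProps: RawTableProps = {
  openTableFormatInput: { icebergInput: {} },
};

console.log(findFootguns(unsafeProps));
console.log(findFootguns(safeProps));
```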
Known limitations
- Field-id reuse is not detected across deploys. If you drop a column with id = 5 and then add a different column with id = 5 in a later deploy, Glue accepts the UPDATE and Iceberg's metadata silently violates the "never reuse a retired id" invariant. Readers projecting old snapshots will surface deleted data under the new field's name. The construct enforces uniqueness within one deploy (duplicate column id N validator), but it doesn't compare against the live table state. The safe workflow is to always pin id explicitly and treat dropped ids as retired forever; never let CDK reassign an id that has ever been used.
- Partition field ids are positional and not pinnable. The construct allocates partition fieldId densely from 1000 in the order partitions appear in partitionSpec. Reordering the array across deploys reassigns those ids for unchanged logical partitions, which is the partition-spec analog of the column-id-reuse footgun above. There is no IcebergPartitionField.fieldId pinning prop today. The safe workflow is append-only: add new partition fields at the end of partitionSpec, and only drop the trailing ones.
- CREATE-only metadata operation. The CFN IcebergInput.metadataOperation only accepts CREATE; the construct always emits that. Subsequent deploys use Glue's normal UpdateTable path, which writes new Iceberg metadata in-place.
- Format version is immutable after CREATE. The formatVersion prop is read once at table creation; changing it later requires a destroy + recreate.
- merge-on-read requires v2. The construct rejects write.{delete,update,merge}.mode = merge-on-read on a v1 table at synth time.
- Athena DDL features that don't surface through CFN (e.g. ALTER TABLE WRITE ORDERED BY, ALTER TABLE … SET LOCATION, bucketed_by / bucket_count Hive clauses) are not exposed. Use IcebergPartitionTransform.bucket(N) instead of Hive bucketing.
- Dropping a partition column requires a void intermediate per the Iceberg spec, and the CFN OpenTableFormatInput cannot express that. The construct accepts the change, but Athena queries against the result will fail with Type cannot be null. The safe pattern is to drop partitions that source from a column while keeping that column in the schema, and only drop a column once it is no longer partitioning anything.
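Because field-id reuse is not detected across deploys, one way to enforce the "retired forever" rule yourself is to keep a ledger of dropped ids in the repository and check the columns array against it before the table is constructed. This is a sketch under that assumption; `RETIRED_ORDER_IDS` and `assertNoRetiredIds` are hypothetical names, and the ledger has to be maintained by hand.

```typescript
// Hypothetical guard: fail fast when a column reuses an id that was dropped earlier.
interface PinnedColumn {
  readonly id: number;
  readonly name: string;
}

// Hand-maintained ledger of every id ever dropped from the orders table.
const RETIRED_ORDER_IDS: ReadonlySet<number> = new Set<number>([5]);

function assertNoRetiredIds(columns: PinnedColumn[], retired: ReadonlySet<number>): void {
  const reused = columns.filter((col) => retired.has(col.id));
  if (reused.length > 0) {
    const detail = reused.map((col) => `${col.name} (id ${col.id})`).join(', ');
    throw new Error(`retired field ids reused: ${detail}`);
  }
}

const safeOrderColumns: PinnedColumn[] = [
  { id: 1, name: 'order_id' },
  { id: 9, name: 'discount_code' }, // fresh id, never used before
];
const unsafeOrderColumns: PinnedColumn[] = [
  { id: 1, name: 'order_id' },
  { id: 5, name: 'discount_code' }, // id 5 belonged to a dropped column
];

assertNoRetiredIds(safeOrderColumns, RETIRED_ORDER_IDS); // passes silently

try {
  assertNoRetiredIds(unsafeOrderColumns, RETIRED_ORDER_IDS);
} catch (err) {
  console.log((err as Error).message);
}
```

Calling the guard in the stack constructor, before the columns are handed to the table, would turn the mistake into a synth-time failure instead of silent metadata corruption.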
FAQ
How do I create an Apache Iceberg table in AWS CDK?
Install cdk-glue-iceberg-table, then declare an IcebergTable with a
database, a column list, and an S3 location — see Use.
cdk deploy creates the Glue Data Catalog table and writes the
Iceberg metadata.json under your S3 prefix. No custom resource or
Lambda is involved; it is a plain AWS::Glue::Table in your
CloudFormation template.
How do I evolve an Iceberg table schema (add, rename, or drop a column) in CloudFormation?
Change the columns array and run cdk deploy again. The construct
passes the new schema to Glue's UpdateTable, which writes a new
metadata.json with a new schema-id; existing data files stay
readable because each column's id is pinned and never reused. The
same applies to partitionSpec. See Evolving schema and
partitions.
Does CloudFormation support Iceberg tables natively?
Yes, through AWS::Glue::Table with OpenTableFormatInput.IcebergInput,
documented by AWS in December 2025.
The catch is that the raw shape corrupts the table on the first
Update if you place schema under storageDescriptor.columns or set
tableInput alongside openTableFormatInput. This construct only ever
emits the safe shape and gives you no way to express the unsafe ones.
See Two footguns the construct prevents.
What is the difference between cdk-glue-iceberg-table and a raw CfnTable?
A raw CfnTable makes you hand-write the OpenTableFormatInput JSON
and own both silent-corruption footguns. cdk-glue-iceberg-table gives
you a typed IcebergType / IcebergPartitionTransform API,
synth-time validation (partition transforms checked against column
types, format/version mismatches caught before deploy), pinned field
IDs for safe evolution, and grantRead / grantWrite helpers. See
How it compares.
How do I create a partitioned Iceberg table in CDK?
Pass a partitionSpec of IcebergPartitionTransform entries —
identity, bucket(N), truncate(W), year, month, day, hour,
or void. Each transform is validated against its source column's type
at synth time. See Using IcebergTable.
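For orientation, the sketch below encodes which primitive source types each transform accepts according to the Apache Iceberg table spec. It is not the construct's validator and may differ from it in detail; `isTransformAllowed` and the type names are hypothetical, and nested types are left out.

```typescript
// Hypothetical compatibility table based on the Iceberg spec's partition transforms.
type PrimitiveType =
  | 'boolean' | 'int' | 'long' | 'float' | 'double' | 'decimal'
  | 'date' | 'time' | 'timestamp' | 'timestamptz'
  | 'string' | 'uuid' | 'fixed' | 'binary';

type Transform =
  | 'identity' | 'bucket' | 'truncate'
  | 'year' | 'month' | 'day' | 'hour' | 'void';

const ALL_TYPES: PrimitiveType[] = [
  'boolean', 'int', 'long', 'float', 'double', 'decimal',
  'date', 'time', 'timestamp', 'timestamptz',
  'string', 'uuid', 'fixed', 'binary',
];
const DATE_LIKE: PrimitiveType[] = ['date', 'timestamp', 'timestamptz'];

const ALLOWED_SOURCES: Record<Transform, ReadonlySet<PrimitiveType>> = {
  identity: new Set<PrimitiveType>(ALL_TYPES),
  void: new Set<PrimitiveType>(ALL_TYPES),
  // Floating-point and boolean columns cannot be bucketed.
  bucket: new Set<PrimitiveType>([
    'int', 'long', 'decimal', 'date', 'time', 'timestamp', 'timestamptz',
    'string', 'uuid', 'fixed', 'binary',
  ]),
  truncate: new Set<PrimitiveType>(['int', 'long', 'decimal', 'string', 'binary']),
  year: new Set<PrimitiveType>(DATE_LIKE),
  month: new Set<PrimitiveType>(DATE_LIKE),
  day: new Set<PrimitiveType>(DATE_LIKE),
  // A plain date has no hour component.
  hour: new Set<PrimitiveType>(['timestamp', 'timestamptz']),
};

function isTransformAllowed(transform: Transform, source: PrimitiveType): boolean {
  return ALLOWED_SOURCES[transform].has(source);
}

console.log(isTransformAllowed('day', 'timestamptz')); // placed_at in the examples
console.log(isTransformAllowed('bucket', 'long'));     // customer_id in the examples
console.log(isTransformAllowed('hour', 'date'));
console.log(isTransformAllowed('bucket', 'double'));
```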
Does it work with Athena and Lake Formation?
Yes. The demo app in the arceus
repo registers the tables with Lake Formation and queries them from
Athena, including v2 merge-on-read INSERT / UPDATE / DELETE /
MERGE, time travel, OPTIMIZE, and VACUUM. The construct's
grant* helpers issue IAM grants; under Lake Formation you still add
the matching SELECT / INSERT / DELETE LF grants. See Granting
access.
Can I use Iceberg v2 merge-on-read (row-level UPDATE and DELETE)?
Yes — set formatVersion: IcebergFormatVersion.V2 and the
write.{delete,update,merge}.mode = merge-on-read table properties.
The construct rejects merge-on-read on a v1 table at synth time.
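The rule itself is small enough to sketch. The function below is a hypothetical stand-in for that one check, written only from the description above; it is not the package's validator and covers none of its other property checks.

```typescript
// Hypothetical re-statement of the "merge-on-read requires v2" rule.
const ROW_LEVEL_MODE_KEYS = [
  'write.delete.mode',
  'write.update.mode',
  'write.merge.mode',
];

function checkMergeOnRead(
  formatVersion: 1 | 2,
  tableProperties: Record<string, string>,
): string[] {
  if (formatVersion === 2) {
    return []; // v2 supports merge-on-read for all three operations
  }
  return ROW_LEVEL_MODE_KEYS
    .filter((key) => tableProperties[key] === 'merge-on-read')
    .map((key) => `${key} = merge-on-read requires format version 2`);
}

const morProperties: Record<string, string> = {
  'write.delete.mode': 'merge-on-read',
  'write.update.mode': 'merge-on-read',
  'write.merge.mode': 'copy-on-write',
};

console.log(checkMergeOnRead(2, morProperties)); // no errors on a v2 table
console.log(checkMergeOnRead(1, morProperties)); // two errors on a v1 table
```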
Contributing
Development, the monorepo layout, and the demo app live in CONTRIBUTING.md.
License
MIT.