dbt adapter for AWS Glue

Maintainers

mehdimld menuetb moomindani sugichy yotahk

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications. dbt is the T in ELT. Organize, cleanse, denormalize, filter, rename, and pre-aggregate the raw data in your warehouse so that it's ready for analysis.

dbt-glue

The dbt-glue package implements the dbt adapter protocol for AWS Glue's Spark engine. It supports running dbt against Spark, through the new Glue Interactive Sessions API.

To learn how to deploy a data pipeline in your modern data platform using the dbt-glue adapter, please read the following blog post: Build your data pipeline in your AWS modern data platform using AWS Lake Formation, AWS Glue, and dbt Core

Installation

The package can be installed from PyPI with:

$ pip3 install dbt-glue

For further (and more likely up-to-date) info, see the README

Connection Methods

Configuring your AWS profile for Glue Interactive Session

There are two IAM principals used with interactive sessions.

Client principal: The principal (either user or role) that calls the AWS APIs (Glue, Lake Formation, Interactive Sessions) from the local client. This is the principal configured in the AWS CLI and is likely the same one.
Service role: The IAM role that AWS Glue uses to execute your session. This is the same as for AWS Glue ETL.

Read this documentation to configure these principals.

Below is a least-privilege policy that covers all features of the dbt-glue adapter.

Update the variables between <>. These arguments are explained here:

Args	Description
region	The region where your Glue database is stored
AWS Account	The AWS account where you run your pipeline
dbt output database	The database updated by dbt (this is the schema configured in the profiles.yml of your dbt environment)
dbt source database	All databases used as source
dbt output bucket	The name of the bucket where dbt generates the data (the location configured in the profiles.yml of your dbt environment)
dbt source bucket	The bucket name of source databases (if they are not managed by Lake Formation)

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Read_and_write_databases",
            "Action": [
                "glue:SearchTables",
                "glue:BatchCreatePartition",
                "glue:CreatePartitionIndex",
                "glue:DeleteDatabase",
                "glue:GetTableVersions",
                "glue:GetPartitions",
                "glue:DeleteTableVersion",
                "glue:UpdateTable",
                "glue:DeleteTable",
                "glue:DeletePartitionIndex",
                "glue:GetTableVersion",
                "glue:UpdateColumnStatisticsForTable",
                "glue:CreatePartition",
                "glue:UpdateDatabase",
                "glue:CreateTable",
                "glue:GetTables",
                "glue:GetDatabases",
                "glue:GetTable",
                "glue:GetDatabase",
                "glue:GetPartition",
                "glue:UpdateColumnStatisticsForPartition",
                "glue:CreateDatabase",
                "glue:BatchDeleteTableVersion",
                "glue:BatchDeleteTable",
                "glue:DeletePartition",
                "glue:GetUserDefinedFunctions",
                "lakeformation:ListResources",
                "lakeformation:BatchGrantPermissions",
                "lakeformation:ListPermissions",
                "lakeformation:GetDataAccess",
                "lakeformation:GrantPermissions",
                "lakeformation:RevokePermissions",
                "lakeformation:BatchRevokePermissions",
                "lakeformation:AddLFTagsToResource",
                "lakeformation:RemoveLFTagsFromResource",
                "lakeformation:GetResourceLFTags",
                "lakeformation:ListLFTags",
                "lakeformation:GetLFTag"
            ],
            "Resource": [
                "arn:aws:glue:<region>:<AWS Account>:catalog",
                "arn:aws:glue:<region>:<AWS Account>:table/<dbt output database>/*",
                "arn:aws:glue:<region>:<AWS Account>:database/<dbt output database>"
            ],
            "Effect": "Allow"
        },
        {
            "Sid": "Read_only_databases",
            "Action": [
                "glue:SearchTables",
                "glue:GetTableVersions",
                "glue:GetPartitions",
                "glue:GetTableVersion",
                "glue:GetTables",
                "glue:GetDatabases",
                "glue:GetTable",
                "glue:GetDatabase",
                "glue:GetPartition",
                "lakeformation:ListResources",
                "lakeformation:ListPermissions"
            ],
            "Resource": [
                "arn:aws:glue:<region>:<AWS Account>:table/<dbt source database>/*",
                "arn:aws:glue:<region>:<AWS Account>:database/<dbt source database>",
                "arn:aws:glue:<region>:<AWS Account>:database/default",
                "arn:aws:glue:<region>:<AWS Account>:database/global_temp"
            ],
            "Effect": "Allow"
        },
        {
            "Sid": "Storage_all_buckets",
            "Action": [
                "s3:GetBucketLocation",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::<dbt output bucket>",
                "arn:aws:s3:::<dbt source bucket>"
            ],
            "Effect": "Allow"
        },
        {
            "Sid": "Read_and_write_buckets",
            "Action": [
                "s3:PutObject",
                "s3:PutObjectAcl",
                "s3:GetObject",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::<dbt output bucket>/*"
            ],
            "Effect": "Allow"
        },
        {
            "Sid": "Read_only_buckets",
            "Action": [
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::<dbt source bucket>/*"
            ],
            "Effect": "Allow"
        }
    ]
}
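
The placeholder substitution described above can be scripted. The following is a minimal sketch, not part of dbt-glue: the function name render_policy, the sample values, and the shortened one-statement template are all made up for illustration. It fills the <...> variables, refuses to continue if one is left over, and parses the result so that invalid JSON is caught before the policy is attached.

```python
import json
import re

# Placeholder names come from the Args table above; the values are invented.
PLACEHOLDERS = {
    "<region>": "eu-west-1",
    "<AWS Account>": "123456789012",
    "<dbt output database>": "dbt_demo",
    "<dbt source database>": "raw_data",
    "<dbt output bucket>": "my-dbt-output-bucket",
    "<dbt source bucket>": "my-dbt-source-bucket",
}

# Shortened template: one statement of the policy above, to keep the sketch small.
TEMPLATE = """
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Storage_all_buckets",
            "Action": ["s3:GetBucketLocation", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::<dbt output bucket>",
                "arn:aws:s3:::<dbt source bucket>"
            ],
            "Effect": "Allow"
        }
    ]
}
"""


def render_policy(template, values):
    """Replace every <placeholder> and return the policy as a parsed dict."""
    rendered = template
    for placeholder, value in values.items():
        rendered = rendered.replace(placeholder, value)
    # Anything still wrapped in <> was not covered by `values`.
    leftover = re.findall(r"<[^<>]+>", rendered)
    if leftover:
        raise ValueError(f"unfilled placeholders: {sorted(set(leftover))}")
    # json.loads raises ValueError on malformed JSON (e.g. a trailing comma).
    return json.loads(rendered)


policy = render_policy(TEMPLATE, PLACEHOLDERS)
print(json.dumps(policy, indent=4))
```

Parsing the rendered text before using it is a deliberate choice here: a malformed document fails locally instead of at the IAM API.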

Configuration of the local environment

dbt and the dbt-glue adapter are compatible with Python versions 3.7, 3.8, and 3.9, so check your Python version first:

$ python3 --version

Configure a Python virtual environment to isolate package version and code dependencies:

$ python3 -m venv dbt_venv
$ source dbt_venv/bin/activate
$ python3 -m pip install --upgrade pip

Install the latest version of the AWS CLI:

$ curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
$ unzip awscliv2.zip
$ sudo ./aws/install

Install the boto3 package:

$ sudo yum install gcc krb5-devel.x86_64 python3-devel.x86_64 -y
$ pip3 install --upgrade boto3

Install the package:

$ pip3 install dbt-glue

Example config

type: glue
query-comment: This is a glue dbt example
role_arn: arn:aws:iam::1234567890:role/GlueInteractiveSessionRole
region: us-east-1
workers: 2
worker_type: G.1X
idle_timeout: 10
schema: "dbt_demo"
session_provisioning_timeout_in_seconds: 120
location: "s3://dbt_demo_bucket/dbt_demo_data"

The table below describes all the options.

Option	Description	Mandatory
project_name	The dbt project name. This must be the same as the one configured in the dbt project.	yes
type	The driver to use.	yes
query-comment	A string to inject as a comment in each query that dbt runs.	no
role_arn	The ARN of the glue interactive session IAM role.	yes
region	The AWS Region where you run the data pipeline.	yes
workers	The number of workers of a defined workerType that are allocated when a job runs.	yes
worker_type	The type of predefined worker that is allocated when a job runs. Accepts a value of Standard, G.1X, or G.2X.	yes
schema	The schema used to organize data stored in Amazon S3. It is also the database in AWS Lake Formation that stores metadata tables in the Data Catalog.	yes
session_provisioning_timeout_in_seconds	The timeout in seconds for AWS Glue interactive session provisioning.	yes
location	The Amazon S3 location of your target data.	yes
query_timeout_in_minutes	The timeout in minutes for a single query. Default is 300.	no
idle_timeout	The AWS Glue session idle timeout in minutes. (The session stops after being idle for the specified amount of time)	no
glue_version	The version of AWS Glue for this session to use. Currently, the only valid options are 2.0, 3.0 and 4.0. The default value is 4.0.	no
security_configuration	The security configuration to use with this session.	no
connections	A comma-separated list of connections to use in the session.	no
conf	Specific configuration used at the startup of the Glue Interactive Session (arg --conf)	no
extra_py_files	Extra python Libs that can be used by the interactive session.	no
delta_athena_prefix	A prefix used to create Athena compatible tables for Delta tables (if not specified, then no Athena compatible table will be created)	no
tags	The map of key value pairs (tags) belonging to the session. Ex: `KeyName1=Value1,KeyName2=Value2`	no
seed_format	By default `parquet`, can be Spark format compatible like `csv` or `json`	no
seed_mode	By default `overwrite`, the seed data will be overwritten, you can set it to `append` if you just want to add new data in your dataset	no
default_arguments	The map of key value pairs parameters belonging to the session. More information on Job parameters used by AWS Glue. Ex: `--enable-continuous-cloudwatch-log=true,--enable-continuous-log-filter=true`	no
glue_session_id	Re-use a Glue session to run multiple dbt run commands. A new Glue session is created with this glue_session_id if it does not exist yet.	no
glue_session_reuse	Re-use the Glue session to run multiple dbt run commands. If set to true, the Glue session is not closed, so it can be re-used. If set to false, the session is closed. A session that is kept open still closes once idle_timeout has expired.	no
group_session_id	[Model Level Meta Setting] Set a specific glue session suffix id to group sets of models together to a specific session id. Good for models that have chained dependencies in a larger dag and you want to save on session startup times.	no
datalake_formats	The ACID datalake format that you want to use if you are doing merge, can be `hudi`, `iceberg` or `delta`	no
use_arrow	(experimental) use an arrow file instead of stdout to have better scalability.	no
enable_spark_seed_casting	Allows spark to cast the columns depending on the specified model column types. Default `False`.	no
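
As a rough illustration of the Mandatory column, the sketch below checks a target configuration before it is handed to dbt. It is not how the adapter validates its profile: check_target and the constants are hypothetical names, and project_name is left out on the assumption that it is not set as a key inside the target (the example config above does not set it). The accepted worker types and Glue versions are copied from the table.

```python
# Options marked "yes" in the table above, minus project_name (assumption:
# it is not a key of the target itself, as in the example config).
MANDATORY_OPTIONS = (
    "type",
    "role_arn",
    "region",
    "workers",
    "worker_type",
    "schema",
    "session_provisioning_timeout_in_seconds",
    "location",
)
WORKER_TYPES = {"Standard", "G.1X", "G.2X"}
GLUE_VERSIONS = {"2.0", "3.0", "4.0"}


def check_target(target):
    """Return a list of human-readable problems; an empty list means no finding."""
    problems = [
        f"missing mandatory option: {name}"
        for name in MANDATORY_OPTIONS
        if name not in target
    ]
    if "worker_type" in target and target["worker_type"] not in WORKER_TYPES:
        problems.append(f"unknown worker_type: {target['worker_type']}")
    if "glue_version" in target and str(target["glue_version"]) not in GLUE_VERSIONS:
        problems.append(f"unknown glue_version: {target['glue_version']}")
    return problems


# Values taken from the example config above.
example_target = {
    "type": "glue",
    "query-comment": "This is a glue dbt example",
    "role_arn": "arn:aws:iam::1234567890:role/GlueInteractiveSessionRole",
    "region": "us-east-1",
    "workers": 2,
    "worker_type": "G.1X",
    "idle_timeout": 10,
    "schema": "dbt_demo",
    "session_provisioning_timeout_in_seconds": 120,
    "location": "s3://dbt_demo_bucket/dbt_demo_data",
}

print(check_target(example_target))
```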

Configs

Configuring tables

When materializing a model as a table, you may include several optional configs that are specific to the dbt-spark plugin, in addition to the standard model configs.

Option	Description	Required?	Example
file_format	The file format to use when creating tables (`parquet`, `csv`, `json`, `text`, `jdbc`, `orc`, `delta`, `iceberg`, `hudi`, or `s3tables`).	Optional	`parquet`
partition_by	Partition the created table by the specified columns. A directory is created for each partition.	Optional	`date_day`
clustered_by	Each partition in the created table will be split into a fixed number of buckets by the specified columns.	Optional	`country_code`
buckets	The number of buckets to create while clustering	Required if `clustered_by` is specified	`8`
custom_location	By default, the adapter will store your data in the following path: `location path`/`schema`/`table`. If you don't want to follow that default behaviour, you can use this parameter to set your own custom location on S3	No	`s3://mycustombucket/mycustompath`
hudi_options	When using file_format `hudi`, gives the ability to overwrite any of the default configuration options.	Optional	`{'hoodie.schema.on.read.enable': 'true'}`
meta	Spawns isolated Glue session with different session configuration. Use Case: When specific models require configurations different from the default session settings. For example, a particular model might require more Glue workers or larger worker type.	Optional	`meta = { "workers": 50, "worker_type": "G.1X" }`
add_iceberg_timestamp	Add `update_iceberg_ts` column on Iceberg tables. (default: false)	Optional	`true`
use_iceberg_temp_views	Use Spark temporary views when using Iceberg targets instead of physical tables to store intermediate results. (default: false)	Optional	`true`
purge_dropped_iceberg_data	Purge Iceberg and S3 Tables underlying data S3 object during drop table. (default: false)	Optional	`true`
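
The custom_location row describes a default path of `location path`/`schema`/`table`. A small sketch of that rule follows. The function table_location is a hypothetical helper, not the adapter's code, and stripping a trailing slash from the profile location is a choice made here to avoid a double slash.

```python
def table_location(location, schema, table, custom_location=None):
    """Return where a table's data would be stored under the rule in the table above."""
    if custom_location:
        # A custom_location replaces the default path entirely.
        return custom_location
    # Default: <location path>/<schema>/<table>
    return "/".join([location.rstrip("/"), schema, table])


# Default path, using the location and schema from the example config.
print(table_location("s3://dbt_demo_bucket/dbt_demo_data", "dbt_demo", "my_model"))

# Overridden path, using the example value from the table.
print(
    table_location(
        "s3://dbt_demo_bucket/dbt_demo_data",
        "dbt_demo",
        "my_model",
        custom_location="s3://mycustombucket/mycustompath",
    )
)
```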

Amazon S3 Tables Support (Experimental)

WARNING: Experimental Feature: Amazon S3 Tables support is currently experimental and may have limitations or breaking changes in future versions.

dbt-glue supports Amazon S3 Tables, a new table type for analytics workloads that provides Apache Iceberg compatibility with automatic optimization and management.

Configuration

To use S3 Tables, set file_format='s3tables' in your model configuration:

{{ config(
    materialized='table',
    file_format='s3tables'
) }}

select 
    id,
    name,
    created_at
from {{ ref('source_table') }}

Profile Configuration

S3 Tables require specific Spark configurations in your profiles.yml:

your_profile:
  target: dev
  outputs:
    dev:
      type: glue
      # ... other configurations ...
      conf: >-
        spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
        --conf spark.sql.catalog.glue_catalog=org.apache.iceberg.spark.SparkCatalog
        --conf spark.sql.defaultCatalog=glue_catalog
        --conf spark.sql.catalog.glue_catalog.warehouse=s3://your-warehouse-path
        --conf spark.sql.catalog.glue_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog
        --conf spark.sql.catalog.glue_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO
        --conf spark.sql.catalog.glue_catalog.glue.id=YOUR_ACCOUNT_ID:s3tablescatalog/your-s3-tables-bucket
      datalake_formats: iceberg
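
In the profile above, the folded YAML scalar (>-) joins its lines with spaces, so conf ends up as one string in which the first setting has no prefix and every later one is introduced by --conf. The sketch below builds such a string from a dictionary. build_conf is a hypothetical helper written for this illustration, and the warehouse path and catalog id are the same placeholders as in the profile.

```python
def build_conf(settings):
    """Join Spark settings into the single-string form used by the conf option."""
    pairs = [f"{key}={value}" for key, value in settings.items()]
    # The first pair carries no prefix; each following pair is preceded by --conf.
    return " --conf ".join(pairs)


# Same settings as the profile above; placeholder values are kept as-is.
s3_tables_settings = {
    "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    "spark.sql.catalog.glue_catalog": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.defaultCatalog": "glue_catalog",
    "spark.sql.catalog.glue_catalog.warehouse": "s3://your-warehouse-path",
    "spark.sql.catalog.glue_catalog.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
    "spark.sql.catalog.glue_catalog.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    "spark.sql.catalog.glue_catalog.glue.id": "YOUR_ACCOUNT_ID:s3tablescatalog/your-s3-tables-bucket",
}

conf = build_conf(s3_tables_settings)
print(conf)
```

Keeping the settings in a dictionary makes it harder to drop a --conf separator by accident when entries are added or removed.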

Example Model

{{ config(
    materialized='table',
    file_format='s3tables',
    partition_by=['year', 'month']
) }}

select 
    customer_id,
    order_date,
    extract(year from order_date) as year,
    extract(month from order_date) as month,
    total_amount
from {{ ref('raw_orders') }}
where order_date >= '2024-01-01'

Requirements

AWS Glue 4.0 or later (recommended)
S3 Tables bucket configured in your AWS account
Proper IAM permissions for S3 Tables operations
Iceberg Spark extensions configured in your profile

Python models (Experimental)

WARNING: Experimental Feature: Python model support is currently experimental and may have limitations or breaking changes in future versions.

dbt-glue supports Python models that allow you to apply transformations to your data using Python code and libraries, rather than SQL. This enables more complex data transformations, statistical analysis, and machine learning workflows within your dbt project.

Requirements

AWS Glue version 4.0 or later (recommended for best Python support)
Iceberg file format (required for Python models)
Proper Iceberg configuration in your profile (see configuration example below)

Configuration

Python models require Iceberg file format and specific Spark configurations. Add the following to your profiles.yml:

your_profile:
  target: dev
  outputs:
    dev:
      type: glue
      # ... other configurations ...
      datalake_formats: iceberg
      conf: >-
        spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
        --conf spark.sql.catalog.glue_catalog=org.apache.iceberg.spark.SparkCatalog
        --conf spark.sql.catalog.glue_catalog.warehouse=s3://your-warehouse-path
        --conf spark.sql.catalog.glue_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog
        --conf spark.sql.catalog.glue_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO

Basic Usage

Create a Python model by adding a .py file to your models/ directory:

def model(dbt, spark):
    # Configure the model
    dbt.config(materialized='python_model', file_format='iceberg')
    
    # Create your DataFrame using Spark
    data = [
        (1, 'Alice', 100),
        (2, 'Bob', 200),
        (3, 'Charlie', 300)
    ]
    columns = ['id', 'name', 'value']
    
    # Return a Spark DataFrame
    return spark.createDataFrame(data, columns)

Referencing Other Models

You can reference other dbt models and sources in your Python models:

def model(dbt, spark):
    dbt.config(materialized='python_model', file_format='iceberg')
    
    # Reference another dbt model
    customers_df = dbt.ref('customers')
    
    # Reference a source
    orders_df = dbt.source('raw_data', 'orders')
    
    # Perform transformations
    result_df = customers_df.join(orders_df, 'customer_id')
    
    return result_df

Incremental Python Models

Python models support incremental materialization with merge strategy:

from pyspark.sql.functions import count, max as spark_max


def model(dbt, spark):
    dbt.config(
        materialized='incremental',
        file_format='iceberg',
        incremental_strategy='merge',
        unique_key='id'
    )
    
    # Get source data
    source_df = dbt.ref('raw_events')
    
    if dbt.is_incremental():
        # Only process new records for incremental runs
        max_date = spark.sql(f"SELECT MAX(event_date) FROM {dbt.this}").collect()[0][0]
        source_df = source_df.filter(source_df.event_date > max_date)
    
    # Apply transformations (spark_max is the Spark aggregate, not Python's built-in max)
    transformed_df = source_df.groupBy('user_id').agg(
        count('*').alias('event_count'),
        spark_max('event_date').alias('last_event_date')
    )
    
    return transformed_df

Configuration Options

Python models support the same configuration options as regular models, plus some Python-specific ones:

Option	Description	Example
`materialized`	Materialization type (`python_model` or `incremental`)	`python_model`
`file_format`	File format (must be `iceberg` for Python models)	`iceberg`
`incremental_strategy`	Strategy for incremental models	`merge`
`unique_key`	Unique key for merge operations	`['id']`
`partition_by`	Partition columns	`['date_column']`

Limitations

File format: Only Iceberg file format is supported for Python models
Glue version: Requires AWS Glue 4.0 or later for optimal support
Session configuration: Requires proper Iceberg Spark extensions configuration
Return type: Model function must return exactly one Spark DataFrame
Performance: Python models may have longer execution times compared to SQL models

Best Practices

Use appropriate file formats: Always use file_format='iceberg' for Python models
Optimize DataFrame operations: Use Spark DataFrame operations efficiently
Handle incremental logic: Use dbt.is_incremental() for conditional processing
Test thoroughly: Python models are experimental, so test extensively
Monitor performance: Python models may require more resources than SQL models

Example: Data Science Workflow

def model(dbt, spark):
    dbt.config(
        materialized='python_model',
        file_format='iceberg',
        partition_by=['analysis_date']
    )
    
    # Import required libraries (available in Glue environment)
    from pyspark.sql.functions import avg, count, lit, stddev
    from datetime import datetime
    
    # Get source data
    sales_df = dbt.ref('sales_data')
    
    # Perform statistical analysis
    stats_df = sales_df.groupBy('product_category').agg(
        avg('sales_amount').alias('avg_sales'),
        stddev('sales_amount').alias('stddev_sales'),
        count('*').alias('transaction_count')
    )
    
    # Add analysis metadata
    result_df = stats_df.withColumn('analysis_date', lit(datetime.now().date()))
    
    return result_df

Incremental models

dbt seeks to offer useful and intuitive modeling abstractions by means of its built-in configurations and materializations.

For that reason, the dbt-glue plugin leans heavily on the incremental_strategy config. This config tells the incremental materialization how to build models in runs beyond their first. It can be set to one of three values:

append: Insert new records without updating or overwriting any existing data.
insert_overwrite (default): If partition_by is specified, overwrite partitions in the table with new data. If no partition_by is specified, overwrite the entire table with new data.
merge (Apache Hudi, Delta Lake, and Apache Iceberg only): Match records based on a unique_key; update old records, insert new ones. (If no unique_key is specified, all new data is inserted, similar to append.)

Each of these strategies has its pros and cons, which we'll discuss below. As with any model config, incremental_strategy may be specified in dbt_project.yml or within a model file's config() block.

Notes: The default strategy is insert_overwrite
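
To make the difference between the three strategies concrete, here is a toy model of them on in-memory rows. It only mimics the outcomes described in the list above and has nothing to do with how the adapter or Spark executes them. The function names, column names, and sample rows are invented for the illustration.

```python
def append(existing, new):
    """append: keep everything and add the new rows, duplicates included."""
    return existing + new


def insert_overwrite(existing, new, partition_by=None):
    """insert_overwrite: replace the partitions present in `new`, or the whole table."""
    if partition_by is None:
        return list(new)
    touched = {row[partition_by] for row in new}
    kept = [row for row in existing if row[partition_by] not in touched]
    return kept + new


def merge(existing, new, unique_key=None):
    """merge: update rows that match on unique_key and insert the others."""
    if unique_key is None:
        # Without a key there is nothing to match on, so this behaves like append.
        return existing + new
    merged = {row[unique_key]: row for row in existing}
    for row in new:
        merged[row[unique_key]] = row
    return list(merged.values())


existing = [
    {"id": 1, "date_day": "2024-01-01", "users": 10},
    {"id": 2, "date_day": "2024-01-02", "users": 20},
]
new = [
    {"id": 2, "date_day": "2024-01-02", "users": 25},
    {"id": 3, "date_day": "2024-01-03", "users": 30},
]

for name, result in [
    ("append", append(existing, new)),
    ("insert_overwrite by date_day", insert_overwrite(existing, new, "date_day")),
    ("insert_overwrite, no partition", insert_overwrite(existing, new)),
    ("merge on id", merge(existing, new, "id")),
]:
    print(name, [(row["id"], row["users"]) for row in result])
```

With these rows, append leaves two versions of id 2, while insert_overwrite by date_day and merge on id both end with a single, updated row for it. They differ in what they match on: a whole partition versus a key.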

The `append` strategy

Following the append strategy, dbt will perform an insert into statement with all new data. The appeal of this strategy is that it is straightforward and functional across all platforms, file types, connection methods, and Apache Spark versions. However, this strategy cannot update, overwrite, or delete existing data, so it is likely to insert duplicate records for many data sources.

Source code

{{ config(
    materialized='incremental',
    incremental_strategy='append',
) }}

--  All rows returned by this query will be appended to the existing table

select * from {{ ref('events') }}
{% if is_incremental() %}
  where event_ts > (select max(event_ts) from {{ this }})
{% endif %}

Run Code

create temporary view spark_incremental__dbt_tmp as

    select * from analytics.events

    where event_ts > (select max(event_ts) from analytics.spark_incremental)

;

insert into table analytics.spark_incremental
    select `date_day`, `users` from spark_incremental__dbt_tmp

The `insert_overwrite` strategy

This strategy is most effective when specified alongside a partition_by clause in your model config. dbt will run an atomic insert overwrite statement that dynamically replaces all partitions included in your query. Be sure to re-select all of the relevant data for a partition when using this incremental strategy.

If no partition_by is specified, then the insert_overwrite strategy will atomically replace all contents of the table, overriding all existing data with only the new records. The column schema of the table remains the same, however. This can be desirable in some limited circumstances, since it minimizes downtime while the table contents are overwritten. The operation is comparable to running truncate + insert on other databases. For atomic replacement of Delta-formatted tables, use the table materialization (which runs create or replace) instead.

Source Code

{{ config(
    materialized='incremental',
    partition_by=['date_day'],
    file_format='parquet'
) }}

/*
  Every partition returned by this query will be overwritten
  when this model runs
*/

with new_events as (

    select * from {{ ref('events') }}

    {% if is_incremental() %}
    where date_day >= date_add(current_date, -1)
    {% endif %}

)

select
    date_day,
    count(*) as users

from new_events
group by 1

Run Code

create temporary view spark_incremental__dbt_tmp as

    with new_events as (

        select * from analytics.events


        where date_day >= date_add(current_date, -1)


    )

    select
        date_day,
        count(*) as users

    from new_events
    group by 1

;

insert overwrite table analytics.spark_incremental
    partition (date_day)
    select `date_day`, `users` from spark_incremental__dbt_tmp

Specifying insert_overwrite as the incremental strategy is optional, since it's the default strategy used when none is specified.

The `merge` strategy

Compatibility:

Hudi : OK
Delta Lake : OK
Iceberg : OK
Lake Formation Governed Tables : In progress

NB:

For Glue 3: you have to set up a Glue connector.
For Glue 4: use the datalake_formats option in your profiles.yml

When using a connector, make sure that your IAM role has these policies:

{
    "Sid": "access_to_connections",
    "Action": [
        "glue:GetConnection",
        "glue:GetConnections"
    ],
    "Resource": [
        "arn:aws:glue:<region>:<AWS Account>:catalog",
        "arn:aws:glue:<region>:<AWS Account>:connection/*"
    ],
    "Effect": "Allow"
}

and that the managed policy AmazonEC2ContainerRegistryReadOnly is attached. Be sure to follow the getting started instructions here.

This blog post also explains how to set up and work with Glue Connectors.

Hudi

Usage notes: The merge strategy with Hudi requires you to:

Add file_format: hudi in your table configuration
Add datalake_formats in your profile: datalake_formats: hudi
- Alternatively, add connections in your profile: connections: name_of_your_hudi_connector
Add the Kryo serializer in your Interactive Session Config (in your profile): conf: spark.serializer=org.apache.spark.serializer.KryoSerializer --conf spark.sql.hive.convertMetastoreParquet=false

dbt will run an atomic merge statement which looks nearly identical to the default merge behavior on Snowflake and BigQuery. If a unique_key is specified (recommended), dbt will update old records with values from new records that match on the key column. If a unique_key is not specified, dbt will forgo match criteria and simply insert all new records (similar to append strategy).

Profile config example

test_project:
  target: dev
  outputs:
    dev:
      type: glue
      query-comment: my comment
      role_arn: arn:aws:iam::1234567890:role/GlueInteractiveSessionRole
      region: eu-west-1
      glue_version: "4.0"
      workers: 2
      worker_type: G.1X
      schema: "dbt_test_project"
      session_provisioning_timeout_in_seconds: 120
      location: "s3://aws-dbt-glue-datalake-1234567890-eu-west-1/"
      conf: spark.serializer=org.apache.spark.serializer.KryoSerializer --conf spark.sql.hive.convertMetastoreParquet=false
      datalake_formats: hudi

Source Code example

{{ config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='user_id',
    file_format='hudi',
    hudi_options={
        'hoodie.datasource.write.precombine.field': 'eventtime',
    }
) }}

with new_events as (

    select * from {{ ref('events') }}

    {% if is_incremental() %}
    where date_day >= date_add(current_date, -1)
    {% endif %}

)

select
    user_id,
    max(date_day) as last_seen

from new_events
group by 1

Delta

You can also use Delta Lake to enable the merge feature on tables.

Usage notes: The merge strategy with Delta requires you to:

Add file_format: delta in your table configuration
Add datalake_formats in your profile: datalake_formats: delta
- Alternatively, add connections in your profile: connections: name_of_your_delta_connector
Add the following config in your Interactive Session Config (in your profile): conf: "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"

Athena: Athena is not compatible with Delta tables by default, but you can configure the adapter to create Athena tables on top of your Delta tables. To do so, configure the following option in your profile:

delta_athena_prefix: "the_prefix_of_your_choice"
If your table is partitioned, new partitions are not added automatically: run MSCK REPAIR TABLE your_delta_table after each new partition is added.
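
One way to run the repair statement after a dbt run is through the Athena API. The sketch below is an assumption about how that could look, not something the adapter does for you. repair_partitions is a hypothetical helper, the database, table name, and output location are invented, and a stand-in client is used so the example runs without AWS credentials. With boto3 installed, the real client would be boto3.client("athena"), whose start_query_execution call takes the same keyword arguments.

```python
def repair_partitions(athena_client, database, table, output_location):
    """Submit MSCK REPAIR TABLE for `table` and return the Athena query execution id."""
    response = athena_client.start_query_execution(
        QueryString=f"MSCK REPAIR TABLE {table}",
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


class RecordingAthenaClient:
    """Stand-in for boto3.client("athena"): records calls instead of sending them."""

    def __init__(self):
        self.calls = []

    def start_query_execution(self, **kwargs):
        self.calls.append(kwargs)
        return {"QueryExecutionId": f"fake-execution-{len(self.calls)}"}


client = RecordingAthenaClient()
execution_id = repair_partitions(
    client,
    database="dbt_test_project",          # schema from the profile example
    table="your_delta_table",             # placeholder name used in the text above
    output_location="s3://my-athena-results/",  # invented bucket
)
print(execution_id, client.calls[0]["QueryString"])
```

The call only submits the query. Athena runs it asynchronously, so a real script would poll the execution status before relying on the new partitions.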

Profile config example

test_project:
  target: dev
  outputs:
    dev:
      type: glue
      query-comment: my comment
      role_arn: arn:aws:iam::1234567890:role/GlueInteractiveSessionRole
      region: eu-west-1
      glue_version: "4.0"
      workers: 2
      worker_type: G.1X
      schema: "dbt_test_project"
      session_provisioning_timeout_in_seconds: 120
      location: "s3://aws-dbt-glue-datalake-1234567890-eu-west-1/"
      datalake_formats: delta
      conf: "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
      delta_athena_prefix: "delta"

Source Code example

{{ config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='user_id',
    partition_by=['dt'],
    file_format='delta'
) }}

with new_events as (

    select * from {{ ref('events') }}

    {% if is_incremental() %}
    where date_day >= date_add(current_date, -1)
    {% endif %}

)

select
    user_id,
    max(date_day) as last_seen,
    current_date() as dt

from new_events
group by 1

Iceberg

Usage notes: The merge strategy with Iceberg requires you to:

Add file_format: iceberg in your table configuration
Add datalake_formats in your profile: datalake_formats: iceberg
- Alternatively, if you use Glue 3.0 or later, add connections in your profile: connections: name_of_your_iceberg_connector
  - For Athena version 3:
    - The adapter is compatible with the Iceberg Connector from AWS Marketplace with Glue 3.0 as Fulfillment option and 0.14.0 (Oct 11, 2022) as Software version.
    - The latest Iceberg connector in AWS Marketplace uses version 0.14.0 for Glue 3.0 and version 1.2.1 for Glue 4.0, where Kryo serialization fails when writing Iceberg. In that case, use "org.apache.spark.serializer.JavaSerializer" for spark.serializer instead (more info here).
  - For Athena version 2: The adapter is compatible with the Iceberg Connector from AWS Marketplace with Glue 3.0 as Fulfillment option and 0.12.0-2 (Feb 14, 2022) as Software version.
For Glue 4.0, add the following configurations in your dbt profile:

    --conf spark.sql.catalog.glue_catalog=org.apache.iceberg.spark.SparkCatalog
    --conf spark.sql.catalog.glue_catalog.warehouse=s3://<PATH_TO_YOUR_WAREHOUSE>
    --conf spark.sql.catalog.glue_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog 
    --conf spark.sql.catalog.glue_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO 
    --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions

For Glue 3.0, you need to set up more configurations:

    --conf spark.serializer=org.apache.spark.serializer.KryoSerializer
    --conf spark.sql.warehouse=s3://<your-bucket-name>
    --conf spark.sql.catalog.glue_catalog=org.apache.iceberg.spark.SparkCatalog 
    --conf spark.sql.catalog.glue_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog 
    --conf spark.sql.catalog.glue_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO 
    --conf spark.sql.catalog.glue_catalog.lock-impl=org.apache.iceberg.aws.glue.DynamoDbLockManager
    --conf spark.sql.catalog.glue_catalog.lock.table=myGlueLockTable  
    --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions

Also note that for Glue 4.0, you can choose between Glue Optimistic Locking (enabled by default) and DynamoDB Lock Manager for concurrent updates to a table.
- If you want to activate DynamoDB Lock Manager, set the config below in your profile. A DynamoDB table is created on your behalf (if it does not exist).

    --conf spark.sql.catalog.glue_catalog.lock-impl=org.apache.iceberg.aws.dynamodb.DynamoDbLockManager
    --conf spark.sql.catalog.glue_catalog.lock.table=<DYNAMODB_TABLE_NAME>

You'll also need to grant the dbt-glue execution role the appropriate permissions on DynamoDB:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "CommitLockTable",
            "Effect": "Allow",
            "Action": [
                "dynamodb:CreateTable",
                "dynamodb:BatchGetItem",
                "dynamodb:BatchWriteItem",
                "dynamodb:ConditionCheckItem",
                "dynamodb:PutItem",
                "dynamodb:DescribeTable",
                "dynamodb:DeleteItem",
                "dynamodb:GetItem",
                "dynamodb:Scan",
                "dynamodb:Query",
                "dynamodb:UpdateItem"
            ],
            "Resource": "arn:aws:dynamodb:<AWS_REGION>:<AWS_ACCOUNT_ID>:table/<DYNAMODB_TABLE_NAME>"
        }
    ]
}
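If you manage several lock tables, it may help to generate the policy rather than edit the placeholders by hand. The following is a minimal sketch under that assumption; `build_lock_table_policy` is a hypothetical helper, not part of dbt-glue, and it simply fills the three placeholders of the policy above.

```python
import json


def build_lock_table_policy(region: str, account_id: str, table_name: str) -> dict:
    """Return the IAM policy shown above, scoped to one DynamoDB lock table.

    Hypothetical helper for illustration; dbt-glue does not ship it.
    """
    actions = [
        "dynamodb:CreateTable",
        "dynamodb:BatchGetItem",
        "dynamodb:BatchWriteItem",
        "dynamodb:ConditionCheckItem",
        "dynamodb:PutItem",
        "dynamodb:DescribeTable",
        "dynamodb:DeleteItem",
        "dynamodb:GetItem",
        "dynamodb:Scan",
        "dynamodb:Query",
        "dynamodb:UpdateItem",
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "CommitLockTable",
                "Effect": "Allow",
                "Action": actions,
                # Same ARN pattern as the <AWS_REGION>/<AWS_ACCOUNT_ID>/<DYNAMODB_TABLE_NAME> template.
                "Resource": f"arn:aws:dynamodb:{region}:{account_id}:table/{table_name}",
            }
        ],
    }


if __name__ == "__main__":
    # Example values only; replace them with your own region, account and table.
    print(json.dumps(build_lock_table_policy("eu-west-1", "123456789012", "myGlueLockTable"), indent=4))
```

The printed JSON can then be attached to the execution role with whatever tooling you already use for IAM.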

Note that if you use Glue 3.0, DynamoDB Lock Manager is the only option available, and you need to set org.apache.iceberg.aws.glue.DynamoLockManager instead:

    --conf spark.sql.catalog.glue_catalog.lock-impl=org.apache.iceberg.aws.glue.DynamoLockManager
    --conf spark.sql.catalog.glue_catalog.lock.table=myGlueLockTable

dbt will run an atomic merge statement which looks nearly identical to the default merge behavior on Snowflake and BigQuery. You need to provide a unique_key to perform the merge operation, otherwise it will fail. Provide this key as a Python list; it can contain multiple column names to create a composite unique_key.
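To make the composite key idea concrete, here is a rough sketch of how a list-form unique_key could map onto the join condition of a merge statement. It is an illustration only and not the adapter's actual implementation; `merge_on_clause` and the `src` / `dest` aliases are hypothetical names.

```python
from typing import List


def merge_on_clause(unique_key: List[str], source: str = "src", target: str = "dest") -> str:
    """Build an illustrative ON clause that matches rows on every key column."""
    if not unique_key:
        # Mirrors the rule above: the merge strategy needs at least one key column.
        raise ValueError("unique_key must contain at least one column for the merge strategy")
    return " AND ".join(f"{source}.{col} = {target}.{col}" for col in unique_key)


if __name__ == "__main__":
    print(merge_on_clause(["user_id"]))
    print(merge_on_clause(["user_id", "status"]))
```

With a single column the clause has one equality; with several columns, a row only counts as a match when all of them are equal, which is what makes the key composite.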

Notes

When using a custom_location in Iceberg, avoid a trailing slash. A trailing slash leads to improper handling of the location and to issues when reading the data from query engines like Trino. The issue should be fixed for Iceberg versions > 0.13. The related GitHub issue can be found here.
Iceberg also supports insert_overwrite and append strategies.
The warehouse conf must be provided, but it's overwritten by the adapter location in your profile or custom_location in model configuration.
By default, this materialization has iceberg_expire_snapshots set to 'True'. If you need historical, auditable changes, set iceberg_expire_snapshots='False'.
The custom_iceberg_catalog_namespace parameter configures the namespace for Apache Iceberg catalog integration. It enables the use of Iceberg tables within your Spark application by setting up the necessary catalog configurations. Default value: glue_catalog. When specifying a non-null and non-empty value for custom_iceberg_catalog_namespace, the following Spark configurations must be provided:

--conf spark.sql.catalog.{catalog_namespace}=org.apache.iceberg.spark.SparkCatalog
--conf spark.sql.catalog.{catalog_namespace}.warehouse={warehouse_path}
--conf spark.sql.catalog.{catalog_namespace}.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog
--conf spark.sql.catalog.{catalog_namespace}.io-impl=org.apache.iceberg.aws.s3.S3FileIO

When using the default value, add the following Spark configuration to enable Iceberg:

--conf spark.sql.catalog.glue_catalog=org.apache.iceberg.spark.SparkCatalog
--conf spark.sql.catalog.glue_catalog.warehouse=s3://your-warehouse-path
--conf spark.sql.catalog.glue_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog
--conf spark.sql.catalog.glue_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO

A full reference to table_properties can be found here.
Iceberg Tables are natively supported by Athena. Therefore, you can query tables created and operated with dbt-glue adapter from Athena.
Incremental materialization with the Iceberg file format supports dbt snapshot. You can run a dbt snapshot command that queries an Iceberg table and creates a dbt-style snapshot of it.
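The four catalog settings listed in the notes above follow the same pattern for any namespace, so they can be generated instead of typed out. The sketch below assumes exactly that pattern; `iceberg_catalog_conf` is a hypothetical helper name and is not provided by dbt-glue.

```python
def iceberg_catalog_conf(warehouse_path: str, catalog_namespace: str = "glue_catalog") -> str:
    """Return the --conf flags for one Iceberg catalog namespace as a single string."""
    prefix = f"spark.sql.catalog.{catalog_namespace}"
    settings = [
        (prefix, "org.apache.iceberg.spark.SparkCatalog"),
        (f"{prefix}.warehouse", warehouse_path),
        (f"{prefix}.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog"),
        (f"{prefix}.io-impl", "org.apache.iceberg.aws.s3.S3FileIO"),
    ]
    # One "--conf key=value" pair per setting, space separated as in the profile's conf field.
    return " ".join(f"--conf {key}={value}" for key, value in settings)


if __name__ == "__main__":
    # Default namespace (glue_catalog) and a custom one; both paths are example values.
    print(iceberg_catalog_conf("s3://your-warehouse-path"))
    print(iceberg_catalog_conf("s3://your-warehouse-path", catalog_namespace="my_catalog"))
```

The resulting string is meant to be appended to the conf value of your profile, next to the spark.sql.extensions setting.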

Profile config example

test_project:
  target: dev
  outputs:
    dev:
      type: glue
      query-comment: my comment
      role_arn: arn:aws:iam::1234567890:role/GlueInteractiveSessionRole
      region: eu-west-1
      glue_version: "4.0"
      workers: 2
      worker_type: G.1X
      schema: "dbt_test_project"
      session_provisioning_timeout_in_seconds: 120
      location: "s3://aws-dbt-glue-datalake-1234567890-eu-west-1/"
      datalake_formats: iceberg
      conf: --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions --conf spark.serializer=org.apache.spark.serializer.KryoSerializer --conf spark.sql.warehouse=s3://aws-dbt-glue-datalake-1234567890-eu-west-1/dbt_test_project --conf spark.sql.catalog.glue_catalog=org.apache.iceberg.spark.SparkCatalog --conf spark.sql.catalog.glue_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog --conf spark.sql.catalog.glue_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO --conf spark.sql.catalog.glue_catalog.lock-impl=org.apache.iceberg.aws.dynamodb.DynamoDbLockManager --conf spark.sql.catalog.glue_catalog.lock.table=myGlueLockTable

Source Code example

{{ config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key=['user_id'],
    file_format='iceberg',
    iceberg_expire_snapshots='False', 
    partition_by=['status'],
    table_properties={'write.target-file-size-bytes': '268435456'}
) }}

with new_events as (

    select * from {{ ref('events') }}

    {% if is_incremental() %}
    where date_day >= date_add(current_date, -1)
    {% endif %}

)

select
    user_id,
    max(date_day) as last_seen

from new_events
group by 1

Iceberg Snapshot source code example

{% snapshot demosnapshot %}

{{
    config(
        strategy='timestamp',
        target_schema='jaffle_db',
        updated_at='dt',
        file_format='iceberg'
) }}

select * from {{ ref('customers') }}

{% endsnapshot %}

Monitoring your Glue Interactive Session

Monitoring is an important part of maintaining the reliability, availability, and performance of AWS Glue and your other AWS solutions. AWS provides monitoring tools that you can use to watch AWS Glue, identify the number of workers required for your Glue Interactive Session, report when something is wrong, and take action automatically when appropriate. AWS Glue provides the Spark UI, and CloudWatch logs and metrics, for monitoring your AWS Glue jobs. More information on: Monitoring AWS Glue Spark jobs

Usage notes: Monitoring requires:

To add the following IAM policy to your IAM role:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "CloudwatchMetrics",
            "Effect": "Allow",
            "Action": "cloudwatch:PutMetricData",
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "cloudwatch:namespace": "Glue"
                }
            }
        },
        {
            "Sid": "CloudwatchLogs",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "logs:CreateLogStream",
                "logs:CreateLogGroup",
                "logs:PutLogEvents"
            ],
            "Resource": [
                "arn:aws:logs:*:*:/aws-glue/*",
                "arn:aws:s3:::bucket-to-write-sparkui-logs/*"
            ]
        }
    ]
}

To add monitoring parameters in your Interactive Session Config (in your profile). More information on Job parameters used by AWS Glue

Profile config example

test_project:
  target: dev
  outputs:
    dev:
      type: glue
      query-comment: my comment
      role_arn: arn:aws:iam::1234567890:role/GlueInteractiveSessionRole
      region: eu-west-1
      glue_version: "4.0"
      workers: 2
      worker_type: G.1X
      schema: "dbt_test_project"
      session_provisioning_timeout_in_seconds: 120
      location: "s3://aws-dbt-glue-datalake-1234567890-eu-west-1/"
      default_arguments: "--enable-metrics=true, --enable-continuous-cloudwatch-log=true, --enable-continuous-log-filter=true, --enable-spark-ui=true, --spark-event-logs-path=s3://bucket-to-write-sparkui-logs/dbt/"
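The default_arguments value is a single comma-separated string, which is easy to mistype. As a small sketch, the string can be assembled from a dictionary; `build_default_arguments` is a hypothetical helper, and the only assumption is the "--name=value, --name=value" layout used in the profile above.

```python
def build_default_arguments(params: dict) -> str:
    """Join job parameters into the comma-separated form used by default_arguments."""
    parts = []
    for name, value in params.items():
        if isinstance(value, bool):
            # Glue job parameters use lowercase true/false.
            value = "true" if value else "false"
        parts.append(f"--{name}={value}")
    return ", ".join(parts)


if __name__ == "__main__":
    monitoring = {
        "enable-metrics": True,
        "enable-continuous-cloudwatch-log": True,
        "enable-continuous-log-filter": True,
        "enable-spark-ui": True,
        "spark-event-logs-path": "s3://bucket-to-write-sparkui-logs/dbt/",
    }
    print(build_default_arguments(monitoring))
```

Run as a script, this prints the same default_arguments string as the profile example above.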

If you want to use the Spark UI, you can launch the Spark history server using an AWS CloudFormation template that hosts the server on an EC2 instance, or launch it locally using Docker. More information on Launching the Spark history server

Enabling AWS Glue Auto Scaling

Auto Scaling is available in AWS Glue version 3.0 or later. For more information, see the following AWS blog post: "Introducing AWS Glue Auto Scaling: Automatically resize serverless computing resources for lower cost with optimized Apache Spark"

With Auto Scaling enabled, you will get the following benefits:

AWS Glue automatically adds and removes workers from the cluster depending on the parallelism at each stage or microbatch of the job run.
It removes the need for you to experiment and decide on the number of workers to assign for your AWS Glue Interactive sessions.
Once you choose the maximum number of workers, AWS Glue will choose the right size resources for the workload.
You can see how the size of the cluster changes during the Glue Interactive sessions run by looking at CloudWatch metrics. More information on Monitoring your Glue Interactive Session.

Usage notes: AWS Glue Auto Scaling requires:

To set your AWS Glue version 3.0 or later.
To set the maximum number of workers (if Auto Scaling is enabled, the workers parameter sets the maximum number of workers)
To set the --enable-auto-scaling=true parameter on your Glue Interactive Session Config (in your profile). More information on Job parameters used by AWS Glue

Profile config example

test_project:
  target: dev
  outputs:
    dev:
      type: glue
      query-comment: my comment
      role_arn: arn:aws:iam::1234567890:role/GlueInteractiveSessionRole
      region: eu-west-1
      glue_version: "4.0"
      workers: 2
      worker_type: G.1X
      schema: "dbt_test_project"
      session_provisioning_timeout_in_seconds: 120
      location: "s3://aws-dbt-glue-datalake-1234567890-eu-west-1/"
      default_arguments: "--enable-auto-scaling=true"
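The three requirements listed in the usage notes can be checked before a session is started. The sketch below assumes the profile output has already been loaded into a plain dictionary; `auto_scaling_problems` is a hypothetical helper and not something dbt-glue runs for you.

```python
def auto_scaling_problems(output: dict) -> list:
    """Return a list of reasons why Auto Scaling would not be enabled for this output."""
    problems = []
    try:
        version = float(output.get("glue_version", "0"))
    except (TypeError, ValueError):
        version = 0.0
    if version < 3.0:
        problems.append("glue_version must be 3.0 or later")
    if not output.get("workers"):
        problems.append("workers must be set; it acts as the maximum number of workers")
    if "--enable-auto-scaling=true" not in output.get("default_arguments", ""):
        problems.append("default_arguments must include --enable-auto-scaling=true")
    return problems


if __name__ == "__main__":
    dev = {
        "type": "glue",
        "glue_version": "4.0",
        "workers": 2,
        "default_arguments": "--enable-auto-scaling=true",
    }
    print(auto_scaling_problems(dev) or "Auto Scaling requirements are met")
```

An empty list means all three conditions hold for that output.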

Access Glue catalog in another AWS account

In many cases, you may need your dbt jobs to read from another AWS account.

Review the following link https://repost.aws/knowledge-center/glue-tables-cross-accounts to set up access policies in the source and target accounts.

Add "spark.hadoop.hive.metastore.glue.catalogid=" followed by the target account ID to the conf in your dbt profile. This way, you can define one output for each account that you have access to.

Note: Cross-account access must be within the same AWS Region.

Profile config example

test_project:
  target: dev
  outputs:
    dev:
      type: glue
      query-comment: my comment
      role_arn: arn:aws:iam::1234567890:role/GlueInteractiveSessionRole
      region: eu-west-1
      glue_version: "4.0"
      workers: 2
      worker_type: G.1X
      schema: "dbt_test_project"
      session_provisioning_timeout_in_seconds: 120
      location: "s3://aws-dbt-glue-datalake-1234567890-eu-west-1/"
      conf: "--conf hive.metastore.client.factory.class=com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory 
             --conf spark.hadoop.hive.metastore.glue.catalogid=<TARGET-AWS-ACCOUNT-ID-B>"
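When you read from several accounts, each output needs the same two settings with a different catalog ID. A minimal sketch of generating those conf strings follows; `cross_account_conf` and the target names and account IDs are made up for illustration.

```python
def cross_account_conf(catalog_id: str) -> str:
    """Return the conf string that points the Glue metastore client at another account."""
    return (
        "--conf hive.metastore.client.factory.class="
        "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory "
        f"--conf spark.hadoop.hive.metastore.glue.catalogid={catalog_id}"
    )


if __name__ == "__main__":
    # Hypothetical target names mapped to example account IDs.
    targets = {"account_b": "111122223333", "account_c": "444455556666"}
    for name, account_id in targets.items():
        print(f"{name}: {cross_account_conf(account_id)}")
```

Each printed string can be used as the conf value of the output that reads from that account.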

Persisting model descriptions

Relation-level docs persistence is supported since dbt v0.17.0. For more information on configuring docs persistence, see the docs.

When the persist_docs option is configured appropriately, you'll be able to see model descriptions in the Comment field of describe [table] extended or show table extended in [database] like '*'.

Always `schema`, never `database`

Apache Spark uses the terms "schema" and "database" interchangeably. dbt understands database to exist at a higher level than schema. As such, you should never use or set database as a node config or in the target profile when running dbt-glue.

If you want to control the schema/database in which dbt will materialize models, use the schema config and generate_schema_name macro only. For more information, check the dbt documentation about custom schemas.

AWS Lakeformation integration

The adapter supports AWS Lake Formation tag management, letting you associate existing tags defined outside of dbt-glue with database objects built by dbt-glue (databases, tables, views, snapshots, incremental models, seeds).

You can enable or disable lf-tags management via config, at model and dbt-project level (disabled by default).
If enabled, lf-tags are updated on every dbt run. There are table-level and column-level lf-tags configs.
You can drop existing database, table, and column Lake Formation tags by setting the drop_existing config field to True (False by default, meaning existing tags are kept).
If the tag you want to associate with the table does not exist, the dbt-glue execution throws an error.

The adapter also supports AWS Lakeformation data cell filtering.

You can enable or disable data-cell filtering via config, at model and dbt-project level (disabled by default)
If enabled, data_cell_filters will be updated on every dbt run.
You can specify that you want to drop existing table data-cell filters by setting the drop_existing config field to True (False by default, meaning existing filters are kept)
You can use the excluded_column_names OR column_names config fields to perform column-level security as well. You can use one or the other, but not both.
By default, if you don't specify column_names or excluded_column_names, dbt-glue does not perform column-level filtering and lets the principal access all the columns.

The configuration below lets the specified principal (the lf-data-scientist IAM user) access rows that have a customer_lifetime_value > 15 and only the columns specified ('customer_id', 'first_order', 'most_recent_order', 'number_of_orders'):

lf_grants={
        'data_cell_filters': {
            'enabled': True,
            'drop_existing' : True,
            'filters': {
                'the_name_of_my_filter': {
                    'row_filter': 'customer_lifetime_value>15',
                    'principals': ['arn:aws:iam::123456789:user/lf-data-scientist'], 
                    'column_names': ['customer_id', 'first_order', 'most_recent_order', 'number_of_orders']
                }
            }, 
        }
    }

The configuration below lets the specified principal (the lf-data-scientist IAM user) access rows that have a customer_lifetime_value > 15 and all the columns except the one specified ('first_name'):

lf_grants={
        'data_cell_filters': {
            'enabled': True,
            'drop_existing' : True,
            'filters': {
                'the_name_of_my_filter': {
                    'row_filter': 'customer_lifetime_value>15',
                    'principals': ['arn:aws:iam::123456789:user/lf-data-scientist'], 
                    'excluded_column_names': ['first_name']
                }
            }, 
        }
    }
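The rule that a filter may list columns to include or columns to exclude, but not both, can be checked before running dbt. The sketch below is an assumption about how you might validate your own config dictionaries; `column_mode` is a hypothetical helper and does not reflect how dbt-glue validates filters internally.

```python
def column_mode(filter_cfg: dict) -> str:
    """Classify a data cell filter by how it restricts columns.

    Raises ValueError when both column lists are set, since only one may be used.
    """
    has_included = bool(filter_cfg.get("column_names"))
    has_excluded = bool(filter_cfg.get("excluded_column_names"))
    if has_included and has_excluded:
        raise ValueError("set column_names or excluded_column_names, not both")
    if has_included:
        return "only_listed"
    if has_excluded:
        return "all_except_listed"
    # Neither list is set: no column-level filtering, every column is accessible.
    return "all_columns"


if __name__ == "__main__":
    example = {
        "row_filter": "customer_lifetime_value>15",
        "principals": ["arn:aws:iam::123456789:user/lf-data-scientist"],
        "excluded_column_names": ["first_name"],
    }
    print(column_mode(example))
```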

Below are some examples of how you can integrate LF tags management and data cell filtering into your configurations:

At model level

This way of defining your Lakeformation rules is appropriate if you want to handle the tagging and filtering policy at object level. Remember that it overrides any configuration defined at dbt-project level.

{{ config(
    materialized='incremental',
    unique_key="customer_id",
    incremental_strategy='append',
    lf_tags_config={
          'enabled': true,
          'drop_existing' : False,
          'tags_database': 
          {
            'name_of_my_db_tag': 'value_of_my_db_tag'          
            }, 
          'tags_table': 
          {
            'name_of_my_table_tag': 'value_of_my_table_tag'          
            }, 
          'tags_columns': {
            'name_of_my_lf_tag': {
              'value_of_my_tag': ['customer_id', 'customer_lifetime_value', 'dt']
            }}},
    lf_grants={
        'data_cell_filters': {
            'enabled': True,
            'drop_existing' : True,
            'filters': {
                'the_name_of_my_filter': {
                    'row_filter': 'customer_lifetime_value>15',
                    'principals': ['arn:aws:iam::123456789:user/lf-data-scientist'], 
                    'excluded_column_names': ['first_name']
                }
            }, 
        }
    }
) }}

    select
        customers.customer_id,
        customers.first_name,
        customers.last_name,
        customer_orders.first_order,
        customer_orders.most_recent_order,
        customer_orders.number_of_orders,
        customer_payments.total_amount as customer_lifetime_value,
        current_date() as dt
        
    from customers

    left join customer_orders using (customer_id)

    left join customer_payments using (customer_id)

At dbt-project level

This way you can specify tags and a data filtering policy for a particular path in your dbt project (e.g. models, seeds, models/model_group1). This is especially useful for seeds, for which you can't define configuration in the file directly.

seeds:
  +lf_tags_config:
    enabled: true
    tags_table: 
      name_of_my_table_tag: 'value_of_my_table_tag'  
    tags_database: 
      name_of_my_database_tag: 'value_of_my_database_tag'
models:
  +lf_tags_config:
    enabled: true
    drop_existing: True
    tags_database: 
      name_of_my_database_tag: 'value_of_my_database_tag'
    tags_table: 
      name_of_my_table_tag: 'value_of_my_table_tag'

Building the Package

Use the build script for creating distributions:

# Default: run validation tests + build both sdist and wheel
$ ./scripts/build.py

# Fast build without tests (for CI/production)
$ ./scripts/build.py --skip-tests

# Build specific formats
$ ./scripts/build.py --build-type wheel
$ ./scripts/build.py --build-type sdist

The build script automatically:

Cleans previous build artifacts
Runs duplicate macro validation tests (unless --skip-tests)
Creates source and/or wheel distributions

Tests

To perform a functional test:

Install dev requirements:

$ pip3 install -r dev-requirements.txt

Install dev locally

$ python3 setup.py build && python3 setup.py install_lib

Export variables

$ export DBT_AWS_ACCOUNT=123456789101
$ export DBT_GLUE_REGION=us-east-1
$ export DBT_S3_LOCATION=s3://mybucket/myprefix
$ export DBT_GLUE_ROLE_ARN=arn:aws:iam::1234567890:role/GlueInteractiveSessionRole

Caution: Do not set an S3 path that contains important files. dbt-glue's test suite automatically deletes all existing files under the S3 path specified in DBT_S3_LOCATION.
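Before launching the functional tests, it can be useful to confirm that the four variables above are set and to see which S3 location will be wiped. This is a small sketch; `missing_test_variables` is a hypothetical helper, not part of the test suite.

```python
import os

# The four variables exported in the step above.
REQUIRED = ("DBT_AWS_ACCOUNT", "DBT_GLUE_REGION", "DBT_S3_LOCATION", "DBT_GLUE_ROLE_ARN")


def missing_test_variables(env=None) -> list:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]


if __name__ == "__main__":
    missing = missing_test_variables()
    if missing:
        print("Missing variables:", ", ".join(missing))
    else:
        # Reminder of the caution above: everything under this path is deleted by the tests.
        print("Files under", os.environ["DBT_S3_LOCATION"], "will be deleted by the test suite")
```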

Run the test

$ python3 -m pytest tests/functional

$ python3 -m pytest -s

Testing S3 Tables (Experimental)

Amazon S3 Tables support is currently being tested. To run S3 tables-specific tests:

Set additional environment variables:

$ export DBT_S3_TABLES_BUCKET=123456789012:s3tablescatalog/my-s3-tables-bucket

Run S3 tables tests:

$ python3 -m pytest tests/functional/adapter/s3_tables/ -v

Or using tox:

$ tox -e s3-tables

The S3 tables tests help us understand:

Which dbt features work with S3 tables out-of-the-box
What configurations are required for S3 tables
Which features need adapter modifications

For more information, check the dbt documentation about testing a new adapter.

Caveats

Supported Functionality

Most dbt Core functionality is supported, but some features are only available with Apache Hudi or specific configurations.

Apache Hudi-only features:

Incremental model updates by unique_key instead of partition_by (see merge strategy)

Experimental features:

Python models (requires Iceberg file format and AWS Glue 4.0+)
Amazon S3 Tables (currently in testing phase - see S3 tables tests for current status)

Some dbt features, available on the core adapters, are not yet supported on Glue:

Persisting column-level descriptions as database comments
Snapshots

For more information on dbt:

Read the introduction to dbt.
Read the dbt viewpoint.
Join the dbt community.

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.

Release history

Version	Release date
1.10.19 (this version)	Feb 18, 2026
1.10.15	Nov 26, 2025
1.10.13	Oct 15, 2025
1.10.11	Sep 19, 2025
1.10.9	Aug 20, 2025
1.9.4	Apr 14, 2025
1.9.2	Mar 24, 2025
1.9.0	Dec 12, 2024
1.8.6	Sep 26, 2024
1.8.1	Jun 5, 2024
1.7.2	Feb 9, 2024
1.7.1	Dec 14, 2023
1.7.0	Nov 7, 2023
1.6.6	Oct 25, 2023
1.6.5	Sep 29, 2023
1.6.4	Sep 29, 2023
1.6.3	Sep 28, 2023
1.6.2	Aug 25, 2023
1.6.1	Aug 22, 2023
1.6.0	Aug 1, 2023
1.5.3	Jul 17, 2023
1.5.2	Jul 5, 2023
1.5.1	Jul 3, 2023
1.5.0	Jun 29, 2023
1.4.23	Jun 19, 2023
1.4.22	Jun 19, 2023
1.4.21	Apr 27, 2023
1.4.1	Apr 7, 2023
1.4.0	Apr 7, 2023
1.3.12	Jan 18, 2023
1.3.11	Jan 17, 2023
1.3.10	Jan 17, 2023
1.3.8	Dec 29, 2022
0.3.7	Dec 8, 2022
0.3.6	Nov 24, 2022
0.3.5	Nov 24, 2022
0.3.4	Nov 4, 2022
0.3.3	Nov 3, 2022
0.3.0	Oct 21, 2022
0.2.14	Oct 17, 2022
0.2.13	Oct 14, 2022
0.2.12	Oct 12, 2022
0.2.11	Oct 10, 2022
0.2.10	Sep 27, 2022
0.2.9	Sep 27, 2022
0.2.8	Sep 27, 2022
0.2.7	Sep 27, 2022
0.2.6	Sep 9, 2022
0.2.3	Jul 12, 2022
0.2.2	Jul 11, 2022
0.2.1	Jul 11, 2022
0.2.0	May 12, 2022
0.1.4	Apr 28, 2022
0.1.3	Apr 28, 2022
0.1.2	Apr 27, 2022
0.1.1	Apr 27, 2022
0.1.0	Apr 21, 2022

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dbt_glue-1.10.19.tar.gz (93.9 kB)

Uploaded Feb 18, 2026 Source

Built Distribution

dbt_glue-1.10.19-py3-none-any.whl (75.4 kB)

Uploaded Feb 18, 2026 Python 3

File details

Details for the file dbt_glue-1.10.19.tar.gz.

File metadata

Download URL: dbt_glue-1.10.19.tar.gz
Upload date: Feb 18, 2026
Size: 93.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for dbt_glue-1.10.19.tar.gz
Algorithm	Hash digest
SHA256	`d5a7cad7a2298ac3098d30ca6f8e330d4696494d59470662b4cb5fc2b233376b`
MD5	`0f6730a78f86e8aa9f37149f49d72395`
BLAKE2b-256	`cca3e7b128e2303c610b8ba5716c356caa64eb858f91007e366641a979003a14`

See more details on using hashes here.

File details

Details for the file dbt_glue-1.10.19-py3-none-any.whl.

File metadata

Download URL: dbt_glue-1.10.19-py3-none-any.whl
Upload date: Feb 18, 2026
Size: 75.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for dbt_glue-1.10.19-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b4a6cb42abf4536566980569d6f173ac5215701d317298a9fde6c21599c09841`
MD5	`9b09a1831855db6edcc2d75f0c82c20d`
BLAKE2b-256	`c86e57635a852836d4078e5aea1022ae71266b48ba1ce2e9ef0e74908820be9b`

See more details on using hashes here.
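To compare a downloaded file against the SHA256 digests listed above, a short script using only the standard library is enough. This is a sketch; `sha256_of_stream` and `file_matches` are hypothetical helper names, and the file path in the example is an assumption about where you saved the download.

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a binary stream, read in chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def file_matches(path: str, expected_hex: str) -> bool:
    """Check a file on disk against a published SHA256 digest."""
    with open(path, "rb") as handle:
        return sha256_of_stream(handle) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Sanity check on in-memory bytes; no download is needed for this line.
    print(sha256_of_stream(io.BytesIO(b"")) == hashlib.sha256(b"").hexdigest())
    # Example call, assuming the sdist was saved in the current directory:
    # file_matches("dbt_glue-1.10.19.tar.gz",
    #              "d5a7cad7a2298ac3098d30ca6f8e330d4696494d59470662b4cb5fc2b233376b")
```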
