The Athena adapter plugin for dbt (data build tool)
dbt-athena
- Supports dbt version 1.3.*
- Supports Seeds
- Correctly detects views and their columns
- Supports incremental models:
  - On Iceberg tables:
    - Supports the append strategy
    - Supports the use of unique_key only with query engine v3, using the merge strategy
  - On Hive tables:
    - Supports two incremental update strategies: insert_overwrite and append
    - Does not support the use of unique_key
- Does not support Python models
Installation
- pip install dbt-athena-community
- Or pip install git+https://github.com/dbt-athena/dbt-athena.git
Prerequisites
To start, you will need an S3 bucket, for instance my-staging-bucket and an Athena database:
CREATE DATABASE IF NOT EXISTS analytics_dev
COMMENT 'Analytics models generated by dbt (development)'
LOCATION 's3://my-staging-bucket/'
WITH DBPROPERTIES ('creator'='Foo Bar', 'email'='foo@bar.com');
Notes:
- Take note of your AWS region code (e.g. us-west-2, eu-west-2, etc.).
- You can also use AWS Glue to create and manage Athena databases.
Credentials
This plugin does not accept any credentials directly. Instead, credentials are determined automatically based on AWS CLI/boto3 conventions and stored login info. You can configure the AWS profile name to use via aws_profile_name. See the dbt profile configuration below for details.
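As an illustration of those conventions, credentials can come from a named profile in the AWS shared credentials file; the profile name my-profile matches the example below, while the key values are placeholders:

```
# ~/.aws/credentials
[my-profile]
aws_access_key_id = <your-access-key-id>
aws_secret_access_key = <your-secret-access-key>
```

Any other mechanism boto3 understands (environment variables, instance roles, SSO) works the same way, since the plugin simply delegates credential resolution to boto3.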
Configuring your profile
A dbt profile can be configured to run against AWS Athena using the following configuration:
| Option | Description | Required? | Example |
|---|---|---|---|
| s3_staging_dir | S3 location to store Athena query results and metadata | Required | s3://bucket/dbt/ |
| s3_data_dir | Prefix for storing tables, if different from the connection's s3_staging_dir | Optional | s3://bucket2/dbt/ |
| s3_data_naming | How to generate table paths in s3_data_dir | Optional | schema_table_unique |
| region_name | AWS region of your Athena instance | Required | eu-west-1 |
| schema | Specify the schema (Athena database) to build models into (lowercase only) | Required | dbt |
| database | Specify the database (Data catalog) to build models into (lowercase only) | Required | awsdatacatalog |
| poll_interval | Interval in seconds to use for polling the status of query results in Athena | Optional | 5 |
| aws_profile_name | Profile to use from your AWS shared credentials file. | Optional | my-profile |
| work_group | Identifier of Athena workgroup | Optional | my-custom-workgroup |
| num_retries | Number of times to retry a failing query | Optional | 3 |
Example profiles.yml entry:
athena:
  target: dev
  outputs:
    dev:
      type: athena
      s3_staging_dir: s3://athena-query-results/dbt/
      s3_data_dir: s3://your_s3_bucket/dbt/
      s3_data_naming: schema_table
      region_name: eu-west-1
      schema: dbt
      database: awsdatacatalog
      aws_profile_name: my-profile
      work_group: my-workgroup
Additional information
- threads is supported
- database and catalog can be used interchangeably
Usage notes
Models
Table Configuration
- external_location (default=none)
  - If set, the full S3 path in which the table will be saved.
- partitioned_by (default=none)
  - An array list of columns by which the table will be partitioned
  - Limited to the creation of 100 partitions (currently)
- bucketed_by (default=none)
  - An array list of columns to bucket data
- bucket_count (default=none)
  - The number of buckets for bucketing your data
- format (default='parquet')
  - The data format for the table
  - Supports ORC, PARQUET, AVRO, JSON, or TEXTFILE
- write_compression (default=none)
  - The compression type to use for any storage format that allows compression to be specified. To see which options are available, check out CREATE TABLE AS
- field_delimiter (default=none)
  - Custom field delimiter, for when format is set to TEXTFILE
- table_properties
  - Table properties to add to the table; valid for Iceberg only
- strict_location (default=True)
  - When working with Iceberg it is possible to rename tables; to do so, tables must not share the same location. Setting strict_location to false allows a table to be created at a unique location.
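To illustrate how these options combine, here is a hypothetical Hive-table model using several of the settings above; the column names, the stg_orders model, and the bucket settings are invented for the example:

```
{{ config(
    materialized='table',
    format='parquet',
    write_compression='snappy',
    partitioned_by=['region'],
    bucketed_by=['user_id'],
    bucket_count=10
) }}

select user_id, region, amount
from {{ ref('stg_orders') }}
```

Partitioning and bucketing are independent: partitioned_by controls the S3 directory layout, while bucketed_by/bucket_count split each partition's data into a fixed number of files.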
Table location
The location in which a table is saved is determined by:
- If external_location is defined, that value is used.
- If s3_data_dir is defined, the path is determined by that and s3_data_naming.
- If s3_data_dir is not defined, data is stored under s3_staging_dir/tables/.

Here are all the options available for s3_data_naming:
- uuid: {s3_data_dir}/{uuid4()}/
- table: {s3_data_dir}/{table}/
- table_unique: {s3_data_dir}/{table}/{uuid4()}/
- schema_table: {s3_data_dir}/{schema}/{table}/
- schema_table_unique: {s3_data_dir}/{schema}/{table}/{uuid4()}/
It is possible to set s3_data_naming globally in the target profile, override it in an individual table's config, or set it for groups of models in dbt_project.yml.
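As a sketch of the dbt_project.yml approach, the project and folder names below (my_dbt_project, staging) are hypothetical:

```
# dbt_project.yml (fragment)
models:
  my_dbt_project:
    staging:
      # every model under models/staging/ gets this table path scheme
      +s3_data_naming: schema_table_unique
```

A value set this way applies to all models in the folder, and can still be overridden per model in that model's config block.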
Incremental models
This adapter supports incremental models.
These strategies are supported:
- insert_overwrite
- append

Note: unique_key is not supported.
on_schema_change is an option to reflect changes of schema in incremental models.
The following options are supported:
- ignore (default)
- fail
- append_new_columns
- sync_all_columns

For details, please refer to the dbt docs.
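Putting the strategy and on_schema_change options together, a minimal incremental model might look like the following sketch; the source name, table, and dt column are assumptions for illustration:

```
{{ config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    partitioned_by=['dt'],
    on_schema_change='append_new_columns'
) }}

select user_id, amount, dt
from {{ source('my_source', 'events') }}

{% if is_incremental() %}
-- on incremental runs, only process partitions newer than what is already built
where dt > (select max(dt) from {{ this }})
{% endif %}
```

With insert_overwrite, any partitions produced by the query replace the matching partitions in the existing table, while untouched partitions are left alone.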
Iceberg
The adapter supports table materialization for Iceberg.
To get started just add this as your model:
{{ config(
materialized='table',
format='iceberg',
partitioned_by=['bucket(5, user_id)'],
table_properties={
'optimize_rewrite_delete_file_threshold': '2'
}
) }}
SELECT
'A' AS user_id,
'pi' AS name,
'active' AS status,
17.89 AS cost,
1 AS quantity,
100000000 AS quantity_big,
current_date AS my_date
Iceberg supports bucketing as hidden partitions; therefore, use the partitioned_by config to add specific bucketing conditions.
Unsupported functionality
Due to the nature of AWS Athena, not all core dbt functionality is supported. The following features of dbt are not implemented on Athena:
- Snapshots
Known issues
- Quoting is not currently supported. If you need to quote your sources, escape the quote characters in your source definitions:

  version: 2
  sources:
    - name: my_source
      tables:
        - name: first_table
          identifier: "first table"          # Not like that
        - name: second_table
          identifier: "\"second table\""     # Like this

- Tables, schemas and databases should only be lowercase.
Contributing
This connector works with Python from 3.7 to 3.10.
Getting started
To start developing on this adapter, clone the repo and run this make command (see the Makefile):
make setup
It will:
- Install all dependencies.
- Install pre-commit hooks.
- Generate your .env file.
Next, adjust the .env file by configuring the environment variables to match your Athena development environment.
Running tests
You must have an AWS account with Athena set up in order to launch the tests. You can run the tests using make:
make run_tests
File details
Details for the file dbt-athena-community-1.3.2.tar.gz.
File metadata
- Download URL: dbt-athena-community-1.3.2.tar.gz
- Upload date:
- Size: 28.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 05cd26e8e0f9686d69e84462ee82dfb34990e0e1f03e12ee03e0738e17114766 |
| MD5 | 6b3ea682456e000c79029cc675c305f6 |
| BLAKE2b-256 | 3665b3e5be8ca7588a90da7865910729666e0510b5dffee29946a9d4c9329efb |
File details
Details for the file dbt_athena_community-1.3.2-py3-none-any.whl.
File metadata
- Download URL: dbt_athena_community-1.3.2-py3-none-any.whl
- Upload date:
- Size: 36.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5e819de9a17e2828e62d7829e65036deab2d88647b5181ae98917e65b33b2cc5 |
| MD5 | 7d9540c55cfb4e7efc32841e72079ed1 |
| BLAKE2b-256 | 3f4482c153d791b89cfc4b81d01a8e39cf8a5370dc5163f7ff3e80f678f22400 |