Athena adapter for dbt platform
dbt-athena2
This adapter is forked from this repo to better serve our custom needs. It supports additional capabilities as below:
- Run on dbt-core version 1.1.x
- Support a boto3 session that takes credentials from an AWS profile name
- Support on_schema_change for newly added columns
- Allow specifying an s3 bucket for storing table data instead of writing to random locations under the staging dir
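The aws_profile_name capability reads credentials from the standard AWS shared credentials file. For illustration, a profile entry in ~/.aws/credentials typically looks like the sketch below (the profile name matches the example used later in this README; the key values are placeholders):

```ini
; ~/.aws/credentials (placeholder values)
[dl-dev-process]
aws_access_key_id = AKIAXXXXXXXXXXXXXXXX
aws_secret_access_key = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```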
Quick start
Within your python environment, proceed with the steps below to initiate a first project. There will be some prompts during the initial steps; refer to the Configuring your profile
section below to properly set it up.
pip install dbt-athena2
dbt init my_dbt_project
export DBT_PROFILES_DIR=`pwd`
cd my_dbt_project
dbt debug # to test connection
dbt seed # to dump your seed data
dbt run # to run all models
dbt run --select model_name # to run specific model
#...and more...
Basic usage
Model configuration
Below is an example of how a model can be configured.
- There are 4 supported materialized modes: view, table, incremental and ephemeral. Details here.
- incremental_strategy supports insert_overwrite and append. If a partition is specified, it only overwrites the partitions present in the source data.
- on_schema_change supports fail, ignore and append_new_columns only, and only for incremental materialization. Details here.
- There are some useful macros, such as run_started_at, that can be referred to from here to enhance the flexibility of the model.
{{ config(
materialized="incremental",
partitioned_by=["report_month"],
incremental_strategy="insert_overwrite",
on_schema_change="append_new_columns"
) }}
{% set run_date = run_started_at.astimezone(modules.pytz.timezone("Asia/Saigon")).strftime("%Y-%m-%d") %}
select cast(working_day as timestamp) working_day,
sum(spend_balance) spend_balance,
sum(td_balance) td_balance,
sum(gs_balance) gs_balance,
cast(date_format(date_parse('{{ run_date }}', '%Y-%m-%d') - interval '1' month, '%Y%m') as int) report_month
from {{ source('analytics', 'eod_balance') }}
where cast(working_day as date) >= date_trunc('month', cast(date_parse('{{ run_date }}', '%Y-%m-%d') as date) - interval '2' month)
and cast(working_day as date) < date_trunc('month', cast(date_parse('{{ run_date }}', '%Y-%m-%d') as date) - interval '1' month)
group by working_day
order by working_day desc
Seed
Under the folder seeds, place the csv seed file (e.g. c_ecom_rate.csv) and the yaml config (e.g. c_ecom_rate.yml) as in the example below. Then run dbt seed
version: 2
seeds:
- name: c_ecom_rate
config:
enabled: true
quote_columns: true
tags: accounting | report
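The yaml config above pairs with a csv file of the same name. As an illustration, a hypothetical c_ecom_rate.csv (column names and values invented for this sketch) could look like:

```csv
channel,rate
shopee,0.025
lazada,0.030
```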
Further notes
- If the workgroup is specified in profiles.yml without s3_staging_dir, it will extract the default s3 output location attached to that work_group when Override client-side settings is enabled.
- The boto3 session inherits from the development environment; once deployed, permissions should be obtained via a role such as an EC2 instance profile or a K8S service account role.
- Athena limits ALTER TABLE ADD COLUMNS for the data type date; it is recommended to cast it to timestamp or string when implementing the model. Details here.
- Athena does not accept comments like /* ... */; to suppress these auto-generated comments from dbt, place query-comment: null in the dbt_project.yml file.
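For the last note above, query-comment is a top-level key in dbt_project.yml; a minimal sketch (project and profile names are placeholders):

```yaml
# dbt_project.yml (fragment)
name: my_dbt_project
profile: athena
query-comment: null  # suppress dbt's auto-generated /* ... */ comments, which Athena rejects
```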
Configuring your profile
A dbt profile can be configured to run against AWS Athena using the following configuration:
Option | Description | Required? | Example |
---|---|---|---|
s3_staging_dir | S3 location to store Athena query results and metadata | Required | s3://athena-output-bucket/data_services/ |
region_name | AWS region of your Athena instance | Required | ap-southeast-1 |
schema | Specify the schema (Athena database) to build models into (lowercase only) | Required | dbt |
database | Specify the database (Data catalog) to build models into (lowercase only) | Required | awsdatacatalog |
poll_interval | Interval in seconds to use for polling the status of query results in Athena | Optional | 5 |
aws_profile_name | Profile to use from your AWS shared credentials file | Optional | my-profile |
work_group | Identifier of Athena workgroup | Optional | my-custom-workgroup |
num_retries | Number of times to retry a failing query | Optional | 3 |
s3_data_dir | Prefix for storing tables, if different from the connection's s3_staging_dir | Optional | s3://athena-data-bucket/{schema_name}/{table_name}/ |
Example profiles.yml entry:
athena:
target: dev
outputs:
dev:
database: awsdatacatalog
region_name: ap-southeast-1
aws_profile_name: dl-dev-process
s3_staging_dir: s3://athena-output-bucket/data_services/
s3_data_dir: s3://athena-data-bucket/{schema_name}/{table_name}/
schema: accounting
type: athena
Additional information
- threads is supported
- database and catalog can be used interchangeably
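For example, threads can be set per output in profiles.yml alongside the other connection options; a sketch reusing the earlier example values:

```yaml
athena:
  target: dev
  outputs:
    dev:
      type: athena
      threads: 4  # run up to 4 models concurrently
      region_name: ap-southeast-1
      s3_staging_dir: s3://athena-output-bucket/data_services/
      schema: accounting
      database: awsdatacatalog
```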
Running tests
First, install the adapter and its dependencies using make (see Makefile):
make install_deps
Next, configure the environment variables in dev.env to match your Athena development environment. Finally, run the tests using make:
make run_tests