Skip to main content

Athena adapter for dbt platform

Project description

dbt-athena2

This is a adapter leveraged from this repo to better serve our custom needs. It supports addtional capabilitis as below:

  • Run on dbt-core version 1.0.x
  • Support boto3 session to take the credential from from aws profile name
  • On schema change support for new columns added
  • Add s3 bucket for storing data instead of randomly writing on staging dir

Quick started

Within your python environment, proceed below step to initate a first project. There will be some prompts at during inital steps, refer Configuring your profile section below to properly set it up.

pip install dbt-athena2
dbt init my_dbt_project
export DBT_PROFILES_DIR=`pwd`
cd my_dbt_project
dbt debug # to test connection
dbt seed # to dump your seed data
dbt run # to run all models
dbt run --select model_name # to run specific model

#...and more...

Basic usage

Model configuration

Below show an example how we configure how our model be configured.

  • There are 4 supported materialized modes: view, table, incremental and esphemeral. Details here.
  • incremental_strategy supports insert_overwrite and append. If partition is specified, it only overwrites partions available from source data.
  • on_schema_change support fail, ignore and append_new_columns only and for only incremental materialization. Details here.
  • There are some usefule macro such as run_started_at can be referred from here to enhance the flexibility on the model.
{{ config(
    materialized="incremental",
    partitioned_by=["report_month"],
    incremental_strategy="insert_overwrite",
    on_schema_change="append_new_columns"
) }}

{% set run_date = run_started_at.astimezone(modules.pytz.timezone("Asia/Saigon")).strftime("%Y-%m-%d") %}

select cast(working_day as timestamp) working_day,
sum(spend_balance) spend_balance,
sum(td_balance) td_balance,
sum(gs_balance) gs_balance,
cast(date_format(date_parse('{{ run_date }}', '%Y-%m-%d') - interval '1' month, '%Y%m') as int) report_month
from {{ source('analytics', 'eod_balance') }}
where cast(working_day as date) >= date_trunc('month', cast(date_parse('{{ run_date }}', '%Y-%m-%d')  as date)-interval'2'month)
and cast(working_day as date) < date_trunc('month', cast(date_parse('{{ run_date }}', '%Y-%m-%d')  as date)-interval'1'month)
group by working_day
order by working_day desc

Seed

Under folder seeds, place csv seed file ( eg. c_ecom_rate.csv) and the yaml config (eg. c_ecom_rate.yml) as below example. Then run dbt seed

version: 2

seeds:
  - name: c_ecom_rate
    config:
      enabled: true
      quote_columns: true
      tags: accounting | report

Further notes

  • If the workgroup is specified in the profile.yml without s3_staging_dir, it will extract the default s3 ouput attached with that work_group when Override client-side settings enabled.

  • The boto3 session inherit from devlopment environment; once deployed, it should be obtained permission as role such as EC2 profile instance or K8S service account role.

  • Athena limit ALTER ADD COLUMNS with data type date, recommend to parse it to timestamp or string during implementing the model. Details here.

  • Athena not accept the comment like /*, to ignore these auto generated comment from dbt, place this query-comment: null in dbt_project.yml file.

Configuring your profile

A dbt profile can be configured to run against AWS Athena using the following configuration:

Option Description Required? Example
s3_staging_dir S3 location to store Athena query results and metadata Required s3://athena-output-bucket/data_services/
region_name AWS region of your Athena instance Required ap-southeast-1
schema Specify the schema (Athena database) to build models into (lowercase only) Required dbt
database Specify the database (Data catalog) to build models into (lowercase only) Required awsdatacatalog
poll_interval Interval in seconds to use for polling the status of query results in Athena Optional 5
aws_profile_name Profile to use from your AWS shared credentials file. Optional my-profile
work_group Identifier of Athena workgroup Optional my-custom-workgroup
num_retries Number of times to retry a failing query Optional 3
s3_data_dir Prefix for storing tables, if different from the connection's s3_staging_dir Optional s3://athena-data-bucket/{schema_name}/{table_name}/

Example profiles.yml entry:

athena:
  target: dev
  outputs:
    dev:
      database: awsdatacatalog
      region_name: ap-southeast-1
      aws_profile_name: dl-dev-process
      s3_staging_dir: s3://athena-output-bucket/data_services/
      s3_data_dir: s3://athena-data-bucket/{schema_name}/{table_name}/
      schema: accounting
      type: athena

Additional information

  • threads is supported
  • database and catalog can be used interchangeably

Running tests

First, install the adapter and its dependencies using make (see Makefile):

make install_deps

Next, configure the environment variables in dev.env to match your Athena development environment. Finally, run the tests using make:

make run_tests

References

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dbt-athena2-1.1.8.tar.gz (21.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

dbt_athena2-1.1.8-py3-none-any.whl (25.9 kB view details)

Uploaded Python 3

File details

Details for the file dbt-athena2-1.1.8.tar.gz.

File metadata

  • Download URL: dbt-athena2-1.1.8.tar.gz
  • Upload date:
  • Size: 21.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/32.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.62.3 importlib-metadata/4.11.0 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.9.10

File hashes

Hashes for dbt-athena2-1.1.8.tar.gz
Algorithm Hash digest
SHA256 e14a6ac5e687bbcafe4f8b2a1be5420f4548b01e9a0e909248bd3373cc18e5b2
MD5 8ac91978028ee993f52027f0065a0849
BLAKE2b-256 f96a889d30d0d68f9f410ed187a975fca7eafda45a1285647f0d64d46f79ed78

See more details on using hashes here.

File details

Details for the file dbt_athena2-1.1.8-py3-none-any.whl.

File metadata

  • Download URL: dbt_athena2-1.1.8-py3-none-any.whl
  • Upload date:
  • Size: 25.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/32.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.62.3 importlib-metadata/4.11.0 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.9.10

File hashes

Hashes for dbt_athena2-1.1.8-py3-none-any.whl
Algorithm Hash digest
SHA256 d12e1f78b7a487967aa748e8c39c24b77923418356fe12272e4b0dcb0964be5b
MD5 472d75059582e7c543a66f355d31945a
BLAKE2b-256 0e7fe7a67fd3cc8207a705ff8aa206c4cce241ea82b031392fd298599a6de603

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page