
The Athena adapter plugin for dbt (data build tool)


dbt-athena

Installation

  • pip install dbt-athena-community
  • Or pip install git+https://github.com/dbt-athena/dbt-athena.git

Prerequisites

To start, you will need an S3 bucket (for instance, my-staging-bucket) and an Athena database:

CREATE DATABASE IF NOT EXISTS analytics_dev
COMMENT 'Analytics models generated by dbt (development)'
LOCATION 's3://my-staging-bucket/'
WITH DBPROPERTIES ('creator'='Foo Bar', 'email'='foo@bar.com');
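If you provision several environments, you may prefer to generate this statement rather than edit it by hand. The sketch below uses a hypothetical helper, create_database_ddl (not part of dbt-athena), that rebuilds the statement above from parameters and enforces the lowercase-name rule and the s3:// location. Because escaping rules for quotes inside these DDL literals are not covered here, it rejects values containing single quotes instead of guessing an escape.

```python
def _literal(value: str) -> str:
    """Wrap a value in single quotes for use as a DDL string literal."""
    # Quote escaping inside DDL literals is out of scope for this sketch,
    # so values containing a single quote are rejected rather than escaped.
    if "'" in value:
        raise ValueError(f"single quotes are not supported in this sketch: {value!r}")
    return f"'{value}'"


def create_database_ddl(name, location, comment=None, properties=None):
    """Build a CREATE DATABASE IF NOT EXISTS statement shaped like the example above."""
    # dbt-athena expects lowercase schema (Athena database) names.
    if name != name.lower():
        raise ValueError(f"database name must be lowercase: {name!r}")
    if not location.startswith("s3://"):
        raise ValueError(f"location must be an s3:// URI: {location!r}")
    lines = [f"CREATE DATABASE IF NOT EXISTS {name}"]
    if comment is not None:
        lines.append(f"COMMENT {_literal(comment)}")
    lines.append(f"LOCATION {_literal(location)}")
    if properties:
        pairs = ", ".join(f"{_literal(k)}={_literal(v)}" for k, v in properties.items())
        lines.append(f"WITH DBPROPERTIES ({pairs})")
    return "\n".join(lines) + ";"
```

Calling create_database_ddl("analytics_dev", "s3://my-staging-bucket/", "Analytics models generated by dbt (development)", {"creator": "Foo Bar", "email": "foo@bar.com"}) reproduces the statement above. You could then run it in the Athena console or, for example, submit it with the boto3 Athena client's start_query_execution.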

Notes:

  • Take note of your AWS region code (e.g. us-west-2 or eu-west-2).
  • You can also use AWS Glue to create and manage Athena databases.

Credentials

This plugin does not accept any credentials directly. Instead, credentials are resolved automatically following AWS CLI/boto3 conventions and your stored login information. You can select the AWS profile to use with aws_profile_name; check out the dbt profile configuration below for details.
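Before setting aws_profile_name, it can help to confirm that the profile actually exists in your shared credentials file. The sketch below is a diagnostic only, not how the adapter resolves credentials: it assumes the file lives at ~/.aws/credentials unless the AWS_SHARED_CREDENTIALS_FILE environment variable points elsewhere, and it does not see profiles that are defined only in ~/.aws/config.

```python
import configparser
import os
from pathlib import Path
from typing import List


def shared_credentials_path() -> Path:
    """Location of the AWS shared credentials file."""
    # botocore honours AWS_SHARED_CREDENTIALS_FILE; otherwise ~/.aws/credentials is used.
    override = os.environ.get("AWS_SHARED_CREDENTIALS_FILE")
    return Path(override).expanduser() if override else Path.home() / ".aws" / "credentials"


def list_profiles(path: Path) -> List[str]:
    """Return the profile (section) names defined in a credentials file."""
    # RawConfigParser avoids interpolation of '%' characters in secret keys.
    parser = configparser.RawConfigParser()
    parser.read(path)  # a missing file is silently skipped and yields no sections
    return parser.sections()
```

For example, "my-profile" in list_profiles(shared_credentials_path()) should be True before you reference my-profile in profiles.yml.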

Configuring your profile

A dbt profile can be configured to run against AWS Athena using the following configuration:

| Option           | Description                                                                 | Required? | Example             |
|------------------|-----------------------------------------------------------------------------|-----------|---------------------|
| s3_staging_dir   | S3 location to store Athena query results and metadata                      | Required  | s3://bucket/dbt/    |
| region_name      | AWS region of your Athena instance                                          | Required  | eu-west-1           |
| schema           | Specify the schema (Athena database) to build models into (lowercase only)  | Required  | dbt                 |
| database         | Specify the database (Data catalog) to build models into (lowercase only)   | Required  | awsdatacatalog      |
| poll_interval    | Interval in seconds to use for polling the status of query results in Athena | Optional  | 5                   |
| aws_profile_name | Profile to use from your AWS shared credentials file                        | Optional  | my-profile          |
| work_group       | Identifier of the Athena workgroup                                          | Optional  | my-custom-workgroup |
| num_retries      | Number of times to retry a failing query                                    | Optional  | 3                   |
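To make the interplay of poll_interval and num_retries concrete, here is a minimal sketch of a poll-then-retry loop. It is not the adapter's code: it assumes a query is polled until it reaches one of Athena's terminal query states (SUCCEEDED, FAILED, CANCELLED), that num_retries counts resubmissions after the first attempt, and that only FAILED triggers a retry. The real adapter may retry on different conditions. The submit callable is a hypothetical stand-in for starting a query execution.

```python
import time
from typing import Callable

# Terminal states of an Athena query execution.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def wait_for_query(get_state: Callable[[], str], poll_interval: float) -> str:
    """Call get_state() every poll_interval seconds until the query finishes."""
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_interval)


def run_with_retries(
    submit: Callable[[], Callable[[], str]], poll_interval: float, num_retries: int
) -> str:
    """Submit a query, resubmitting it up to num_retries times while it ends in FAILED.

    submit() starts a new execution and returns a function reporting its current state.
    """
    retries = 0
    while True:
        state = wait_for_query(submit(), poll_interval)
        if state != "FAILED" or retries >= num_retries:
            return state
        retries += 1
```

A larger poll_interval means fewer status calls but slower detection of finished queries; with num_retries: 3, a query that keeps failing would be submitted at most four times under this sketch's assumptions.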

Example profiles.yml entry:

athena:
  target: dev
  outputs:
    dev:
      type: athena
      s3_staging_dir: s3://athena-query-results/dbt/
      region_name: eu-west-1
      schema: dbt
      database: awsdatacatalog
      aws_profile_name: my-profile
      work_group: my-workgroup
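A quick sanity check of an output entry against the option table can catch typos before dbt connects to AWS. The sketch below uses a hypothetical helper, validate_athena_output, and assumes the YAML has already been parsed into a dict (for example with PyYAML's yaml.safe_load); dbt performs its own validation, and this only mirrors the rules stated in this README.

```python
REQUIRED_OPTIONS = ("s3_staging_dir", "region_name", "schema", "database")


def validate_athena_output(output: dict) -> list:
    """Return the problems found in one profiles.yml output (empty list if none)."""
    cfg = dict(output)
    # database and catalog can be used interchangeably.
    if "database" not in cfg and "catalog" in cfg:
        cfg["database"] = cfg["catalog"]
    problems = []
    if cfg.get("type") != "athena":
        problems.append("type must be 'athena'")
    for key in REQUIRED_OPTIONS:
        if not cfg.get(key):
            problems.append(f"missing required option: {key}")
    # schema and database are lowercase only.
    for key in ("schema", "database"):
        value = cfg.get(key)
        if value and value != value.lower():
            problems.append(f"{key} must be lowercase: {value!r}")
    staging = cfg.get("s3_staging_dir")
    if staging and not staging.startswith("s3://"):
        problems.append("s3_staging_dir must be an s3:// URI")
    return problems


# The dev output from the example above, as a parsed dict.
dev = {
    "type": "athena",
    "s3_staging_dir": "s3://athena-query-results/dbt/",
    "region_name": "eu-west-1",
    "schema": "dbt",
    "database": "awsdatacatalog",
    "aws_profile_name": "my-profile",
    "work_group": "my-workgroup",
}
```

validate_athena_output(dev) returns an empty list; an entry with schema: DBT would be reported as not lowercase.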

Additional information

  • threads is supported
  • database and catalog can be used interchangeably

Usage notes

Models

Table Configuration

  • external_location (default=none)
    • The location where Athena saves your table in Amazon S3
    • If none, it defaults to {s3_staging_dir}/tables
    • If you use a static value, the underlying data is cleaned up and overwritten with new data whenever the table or partition is recreated
  • partitioned_by (default=none)
    • An array of columns by which the table will be partitioned
    • Currently limited to creating 100 partitions
  • bucketed_by (default=none)
    • An array of columns by which the data will be bucketed
  • bucket_count (default=none)
    • The number of buckets for bucketing your data
  • format (default='parquet')
    • The data format for the table
    • Supports ORC, PARQUET, AVRO, JSON, or TEXTFILE
  • write_compression (default=none)
    • The compression type to use for any storage format that allows compression to be specified. To see which options are available, check out CREATE TABLE AS.
  • field_delimiter (default=none)
    • Custom field delimiter, for when format is set to TEXTFILE

More information: CREATE TABLE AS
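These options correspond to properties in the WITH clause of an Athena CREATE TABLE AS statement. The sketch below, a hypothetical render_ctas helper rather than the adapter's actual macro, shows one plausible mapping. It takes the documented {s3_staging_dir}/tables default literally; since Athena requires a CTAS location to be empty, the adapter's real default presumably adds a per-table path segment, so inspect the SQL dbt actually generates for the exact layout.

```python
SUPPORTED_FORMATS = {"ORC", "PARQUET", "AVRO", "JSON", "TEXTFILE"}


def _array(columns):
    """Render a list of column names as an Athena ARRAY literal."""
    return "ARRAY[" + ", ".join(f"'{c}'" for c in columns) + "]"


def render_ctas(relation, select_sql, s3_staging_dir, config):
    """Render an Athena CTAS statement from dbt-athena style table config."""
    fmt = config.get("format", "parquet").upper()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    # Literal reading of the documented default (see the note above).
    location = config.get("external_location") or s3_staging_dir.rstrip("/") + "/tables"
    props = [f"format = '{fmt}'", f"external_location = '{location}'"]
    if config.get("partitioned_by"):
        # Athena expects partition columns to come last in the SELECT list.
        props.append(f"partitioned_by = {_array(config['partitioned_by'])}")
    if config.get("bucketed_by"):
        if not config.get("bucket_count"):
            raise ValueError("bucketed_by requires bucket_count")
        props.append(f"bucketed_by = {_array(config['bucketed_by'])}")
        props.append(f"bucket_count = {int(config['bucket_count'])}")
    if config.get("write_compression"):
        props.append(f"write_compression = '{config['write_compression']}'")
    if config.get("field_delimiter"):
        if fmt != "TEXTFILE":
            raise ValueError("field_delimiter only applies when format is TEXTFILE")
        props.append(f"field_delimiter = '{config['field_delimiter']}'")
    return f"CREATE TABLE {relation}\nWITH (\n  " + ",\n  ".join(props) + f"\n)\nAS\n{select_sql}"
```

For instance, partitioned_by=['dt'] with write_compression='SNAPPY' yields a WITH clause containing format = 'PARQUET', the external location, partitioned_by = ARRAY['dt'] and write_compression = 'SNAPPY'.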

Supported functionality

Support for incremental models:

  • Supports two incremental update strategies for partitioned tables: insert_overwrite and append
  • Does not support the use of unique_key
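The difference between the two strategies can be illustrated with an in-memory model of a partitioned table (partition value mapped to a list of rows). This sketch shows only the semantics; the adapter itself works through Athena queries and S3, and the helper name apply_incremental is hypothetical. Note that, because unique_key is not supported, neither strategy deduplicates rows, so re-running an append over the same batch produces duplicates.

```python
from collections import defaultdict
from typing import Dict, List

Row = dict
Table = Dict[str, List[Row]]


def apply_incremental(existing: Table, new_rows: List[Row], partition_key: str, strategy: str) -> Table:
    """Return the table (partition value -> rows) after one incremental run."""
    if strategy not in ("insert_overwrite", "append"):
        raise ValueError(f"unsupported incremental strategy: {strategy}")
    # Copy so the caller's table is left untouched.
    result = {partition: list(rows) for partition, rows in existing.items()}
    # Group the new batch by partition value.
    batch = defaultdict(list)
    for row in new_rows:
        batch[row[partition_key]].append(row)
    for partition, rows in batch.items():
        if strategy == "insert_overwrite":
            # Partitions present in the batch are replaced wholesale; others are kept.
            result[partition] = rows
        else:
            # append: new rows are added to the partition; nothing is deduplicated.
            result.setdefault(partition, []).extend(rows)
    return result
```

With insert_overwrite, a partition that appears in the new batch ends up containing only the new rows; with append, it keeps its old rows and gains the new ones. Partitions absent from the batch are untouched in both cases.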

Due to the nature of AWS Athena, not all core dbt functionality is supported. The following features of dbt are not implemented on Athena:

  • Snapshots

Known issues

  • Quoting is not currently supported

    • If you need to quote your sources, escape the quote characters in your source definitions:
    version: 2
    
    sources:
      - name: my_source
        tables:
          - name: first_table
            identifier: "first table"       # Not like that
          - name: second_table
            identifier: "\"second table\""  # Like this
    
  • Table, schema, and database names should be lowercase only

  • Only supports Athena engine 2
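The two workarounds above can be checked mechanically. The sketch below uses hypothetical helpers, not part of dbt-athena: quoted_identifier_yaml produces the escaped identifier value shown in the sources example, relying on the fact that a JSON string is also a valid YAML double-quoted scalar, and non_lowercase flags table, schema, or database names that break the lowercase rule.

```python
import json
from typing import Iterable, List


def quoted_identifier_yaml(name: str) -> str:
    """Render name, wrapped in SQL double quotes, as a YAML double-quoted scalar."""
    if '"' in name:
        raise ValueError("identifiers that already contain double quotes are not handled")
    # json.dumps escapes the inner quotes as \" and JSON strings are valid YAML
    # double-quoted scalars, so the result can be pasted after `identifier:`.
    return json.dumps(f'"{name}"')


def non_lowercase(names: Iterable[str]) -> List[str]:
    """Return the table/schema/database names that are not entirely lowercase."""
    return [name for name in names if name != name.lower()]
```

quoted_identifier_yaml("second table") returns the text "\"second table\"", matching the "Like this" line in the example, and non_lowercase(["analytics_dev", "Orders"]) returns ["Orders"].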

Running tests

First, install the adapter and its dependencies using make (see Makefile):

make install_deps

Next, configure the environment variables in dev.env to match your Athena development environment. Finally, run the tests using make:

make run_tests


Download files

Source Distribution

dbt-athena-community-1.3.0.tar.gz (21.5 kB)

Uploaded Source

Built Distribution

dbt_athena_community-1.3.0-py3-none-any.whl (25.7 kB)

Uploaded Python 3

File details

Details for the file dbt-athena-community-1.3.0.tar.gz.

File metadata

  • Download URL: dbt-athena-community-1.3.0.tar.gz
  • Size: 21.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.15

File hashes

Hashes for dbt-athena-community-1.3.0.tar.gz
Algorithm Hash digest
SHA256 aae1bc3fcba97d5c9d32706dc1f1deece91a22247adde4db11b74127ac37685a
MD5 cbc259547e4ee66fb196caba02fce86c
BLAKE2b-256 d5909671485b1998e36d66822670b6da47b6de0244b4cbdd29c405dbece75f5f

File details

Details for the file dbt_athena_community-1.3.0-py3-none-any.whl.

File hashes

Hashes for dbt_athena_community-1.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 f2bcf7cdb51d8f201a0046c451aa4d6d209a0aea95ff6ee69358d4aca78dcf88
MD5 2a8089391ea91ecba9e0e9f17bca6bf8
BLAKE2b-256 c2ba388e6401bf2e67a6c5ff0fe5d5779c6046e8323d7b452bb47850b59a6c69
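To check a downloaded file against the SHA256 digests listed above, you can hash it locally. The sketch below uses only Python's hashlib; the helper names sha256_of and verify are illustrative. pip can also enforce published digests for you in its hash-checking mode (--require-hashes).

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_sha256):
    """True if the file's SHA256 digest matches the expected hex string."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For example, verify("dbt_athena_community-1.3.0-py3-none-any.whl", "f2bcf7cdb51d8f201a0046c451aa4d6d209a0aea95ff6ee69358d4aca78dcf88") should return True for an intact download of the wheel.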

