Skip to main content

The Yellowbrick adapter plugin for dbt (data build tool)

Project description

dbt-yellowbrick

The dbt-yellowbrick adapter allows dbt to work with Yellowbrick Data Warehouse and leverage the powerful capabilities of both platforms to build data analysis workflows. This adapter is based on the postgres-adapter with extensions to support Yellowbrick specific features.

The dbt-yellowbrick adapter has been developed for projects that implement dbt-core through the command-line interface (CLI) which is available as an open source project.

Installation

This project is hosted on PyPI, so you should be able to install dbt-yellowbrick and the necessary dependencies via:

pip install dbt-yellowbrick

The latest supported version targets dbt-core 1.7.x .

dbt Profile Configuration

Here is a basic example of a profile configuration (profiles.yml) to use with dbt-yellowbrick.

yb_test_models:
  target: dev
  outputs:
    dev:
      type: yellowbrick
      host: <host>
      user: <user_name>
      password: <password>
      port: 5432
      dbname: <database_name>
      schema: <schema_name>
      threads: 1
      connect_timeout: 30 # seconds  
  
    prod:
      type: yellowbrick
      host: <host>
      user: <user_name>
      password: <password>
      port: 5432
      dbname: <database_name>
      schema: <schema_name>
      threads: 1
      connect_timeout: 30 # seconds  

Features

dbt-yellowbrick supports the following Yellowbrick specific features:

  • distribution
  • clustering
  • sort
  • materialisations based on cross-database queries
  • incremental strategies "append" and "delete+insert"

Refer to the official Yellowbrick documentation for detailed explanation of all of these.

Some example model configurations

  • DISTRIBUTE REPLICATE with a SORT column...
{{
  config(
    materialized = "table",
    dist = "replicate",
    sort_col = "stadium_capacity"
  )
}}

select
    hash(stg.name) as team_key
    , stg.name as team_name
    , stg.nickname as team_nickname
    , stg.city as home_city
    , stg.stadium as stadium_name
    , stg.capacity as stadium_capacity
    , stg.avg_att as average_game_attendance
    , current_timestamp as md_create_timestamp
from
    {{ source('premdb_public','team') }} stg
where
    stg.name is not null

gives the following model output:

create table if not exists marts.dim_team as (
select
    hash(stg.name) as team_key
    , stg.name as team_name
    , stg.nickname as team_nickname
    , stg.city as home_city
    , stg.stadium as stadium_name
    , stg.capacity as stadium_capacity
    , stg.avg_att as average_game_attendance
    , current_timestamp as md_create_timestamp
from
    premdb.public.team stg
where
    stg.name is not null
)
distribute REPLICATE
sort on (stadium_capacity);

  • DISTRIBUTE on a single column and define up to four CLUSTER columns...
{{
  config(
    materialized = 'table',
    dist = 'match_key',
    cluster_cols = ['season_key', 'match_date_key', 'home_team_key', 'away_team_key']
  )
}}

select
	hash(concat_ws('||',
	    lower(trim(s.season_name)),
		translate(left(m.match_ts,10), '-', ''),
	    lower(trim(h."name")),
		lower(trim(a."name")))) as match_key
	, hash(lower(trim(s.season_name))) as season_key
	, cast(translate(left(m.match_ts,10), '-', '') as integer) as match_date_key
	, hash(lower(trim(h."name"))) as home_team_key
	, hash(lower(trim(a."name"))) as away_team_key
	, m.htscore
	, split_part(m.htscore, '-', 1)  as home_team_goals_half_time
	, split_part(m.htscore , '-', 2)  as away_team_goals_half_time
	, m.ftscore
	, split_part(m.ftscore, '-', 1)  as home_team_goals_full_time
	, split_part(m.ftscore, '-', 2)  as away_team_goals_full_time
from
	{{ source('premdb_public','match') }} m
		inner join {{ source('premdb_public','team') }} h on (m.htid = h.htid)
		inner join {{ source('premdb_public','team') }} a on (m.atid = a.atid)
		inner join {{ source('premdb_public','season') }} s on (m.seasonid = s.seasonid)

gives the following model output:

create  table if not exists marts.fact_match as (
select
    hash(concat_ws('||',
        lower(trim(s.season_name)),
        translate(left(m.match_ts,10), '-', ''),
        lower(trim(h."name")),
        lower(trim(a."name")))) as match_key
    , hash(lower(trim(s.season_name))) as season_key
    , cast(translate(left(m.match_ts,10), '-', '') as integer) as match_date_key
    , hash(lower(trim(h."name"))) as home_team_key
    , hash(lower(trim(a."name"))) as away_team_key
    , m.htscore
    , split_part(m.htscore, '-', 1)  as home_team_goals_half_time
    , split_part(m.htscore , '-', 2)  as away_team_goals_half_time
    , m.ftscore
    , split_part(m.ftscore, '-', 1)  as home_team_goals_full_time
    , split_part(m.ftscore, '-', 2)  as away_team_goals_full_time
from
    premdb.public.match m
        inner join premdb.public.team h on (m.htid = h.htid)
        inner join premdb.public.team a on (m.atid = a.atid)
        inner join premdb.public.season s on (m.seasonid = s.seasonid)
)
distribute on (match_key)
cluster on (season_key, match_date_key, home_team_key, away_team_key);

Limitations

This is an initial implementation of the dbt adapter for Yellowbrick Data Warehouse and may not support some use cases. We strongly advise validating all records or transformations resulting from the adapter output.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dbt_yellowbrick-1.7.1.tar.gz (14.6 kB view details)

Uploaded Source

Built Distribution

dbt_yellowbrick-1.7.1-py3-none-any.whl (14.5 kB view details)

Uploaded Python 3

File details

Details for the file dbt_yellowbrick-1.7.1.tar.gz.

File metadata

  • Download URL: dbt_yellowbrick-1.7.1.tar.gz
  • Upload date:
  • Size: 14.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.7

File hashes

Hashes for dbt_yellowbrick-1.7.1.tar.gz
Algorithm Hash digest
SHA256 d1d2e1927f892d1c91ceee6299e971ba73c6838e6e31c53f937a9a69d1ca591e
MD5 b05fe8e1275e4897b4d256f1ceb2564d
BLAKE2b-256 48d3d1f2e0cdfb4f35696aa14cca5399baf3d55c67a81df527ac0aef909dc5dd

See more details on using hashes here.

File details

Details for the file dbt_yellowbrick-1.7.1-py3-none-any.whl.

File metadata

File hashes

Hashes for dbt_yellowbrick-1.7.1-py3-none-any.whl
Algorithm Hash digest
SHA256 56cf7769076b159fb51923a0a03a0887171083513201950662244972d34a5e18
MD5 91a3ca676d459a58f889a203c3e4c8f8
BLAKE2b-256 219826105dfa06e3b2eac041f3649530ce28d60a8bee941738ca12de72ca3201

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page