Skip to main content

CLI tool for dbt users adopting analytics engineering best practices.

Project description

dbt-coves

Maintenance PyPI version fury.io Code Style Imports: isort Imports: python Build

Maintainability Downloads

What is dbt-coves?

dbt-coves is a CLI tool that automates certain tasks for dbt making life simpler for the dbt user.

dbt-coves generates dbt soruces and staging models and property(yml) files by analyzing information from the data warehouse and creating the necessary files (sql and yml).

Finally, dbt-coves includes functionality to bootstrap a dbt project and to extract and load configurations from Airbyte.

Supported dbt versions

Version Status
< 1.0 ❌ Not supported
>= 1.0 ✅ Tested

Supported adapters

Feature Snowflake Redshift
dbt project setup ✅ Tested 🕥 In progress
source model (sql) generation ✅ Tested 🕥 In progress
model properties (yml) generation ✅ Tested 🕥 In progress

NOTE: Other database adapters may work, we have just not tested them. Feed free to try them and let us know if you test them we can update the table above.

Here's the tool in action

image

Installation

pip install dbt-coves

We recommend using python virtualenvs and create one separate environment per project.

Command Reference

For a complete list of options, please run:

dbt-coves -h
dbt-coves <command> -h

Environment setup

Setting up your environment can be done in two different ways:

Runs a set of scripts in your local environment to configure your project components: ssh keys, git and dbt

dbt-coves setup all

You can configure individual components:

Set up git repository of dbt-coves project

dbt-coves setup git

Setup dbt within the project (delegates to dbt init)

dbt-coves setup dbt

Set up SSH Keys for dbt project. Supports the argument --open_ssl_public_key which generates an extra Public Key in Open SSL format, useful for configuring certain providers (i.e. Snowflake authentication)

dbt-coves setup ssh

Models generation

dbt-coves generate <resource>

Where <resource> could be sources or properties.

dbt-coves generate sources

This command will generate the dbt source configuration as well as the initial dbt staging model(s). It will look in the database defined in your profiles.yml file or you can pass the --database argument or set up default configuration options (see below)

dbt-coves generate sources --database raw

Supports Jinja templates to adjust how the resources are generated. See below for examples.

Source Generation Arguments

dbt-coves can be used to create the initial staging models. It will do the following:

  1. Create / Update the source yml file
  2. Create the initial staging model(sql) file and offer to flatten VARIANT(JSON) fields
  3. Create the staging model's property(yml) file.

dbt-coves generate sources supports the following args:

See full list in help

dbt-coves generate sources -h
--database
# Database to inspect
--schema
# Schema to inspect
--sources-destination
# Where sources yml files will be generated, default: 'models/staging/{{schema}}/sources.yml'
--sources-destination
# Where sources yml files will be generated, default: 'models/staging/{{schema}}/{{schema}}.yml'
--models-destination
# Where models sql files will be generated, default: 'models/staging/{{schema}}/{{relation}}.sql'
--model-props-destination
# Where models yml files will be generated, default: 'models/staging/{{schema}}/{{relation}}.yml'
--update-strategy
# Action to perform when a property file already exists: 'update', 'recreate', 'fail', 'ask' (per file)

Properties Generation Arguments

You can use dbt-coves to generate and update the properties(yml) file for a given dbt model(sql) file.

dbt-coves generate properties supports the following args:

--destination
# Where models yml files will be generated, default: '{{model_folder_path}}/{{model_file_name}}.yml'
--update-strategy
# Action to perform when a property file already exists: 'update', 'recreate', 'fail', 'ask' (per file)
--model
# Model(s) path where 'dbt ls' will look for models for generation, i.e: 'models/staging' or 'models/staging/my_model.sql'

Source Metadata

dbt-coves supports the argument --metadata which allows users to specify a csv file containing field types and descriptions to be used when creating the staging models and property files.

dbt-coves generate sources --metadata metadata.csv

Metadata format: You can download a sample csv file as reference

database schema relation column key type description
raw master person name (empty) varchar The full name
raw master person name groupName varchar The group name

Extract configuration from Airbyte

dbt-coves extract airbyte

Extracts the configuration from your Airbyte sources, connections and destinations (excluding credentials) and stores it in the specified folder. The main goal of this feature is to keep track of the configuration changes in your git repo, and rollback to a specific version when needed.

Full usage example:

dbt-coves extract airbyte --host http://airbyte-server --port 8001 --path /config/workspace/load

Load configuration to Airbyte

dbt-coves load airbyte

Loads the Airbyte configuration generated with dbt-coves extract airbyte on an Airbyte server. Secrets folder needs to be specified separately. You can use git-secret to encrypt secrets and make them part of your git repo.

Loading secrets

Secret credentials can be approached in two different ways: locally or remotely (through a provider/manager).

In order to load encrypted fields locally:

dbt-coves load airbyte --secrets-path /path/to/secret/directory

# This directory must have 'sources', 'destinations' and 'connections' folders nested inside, and inside them the respective JSON files with unencrypted fields.
# Naming convention: JSON unencrypted secret files must be named exactly as the extracted ones.

To load encrypted fields through a manager (in this case we are connecting to Datacoves' Service Credentials):

--secrets-manager datacoves
--secrets-url https://api.datacoves.localhost/service-credentials/airbyte
--secrets-token <secret token>

Full usage example:

dbt-coves load airbyte --host http://airbyte-server --port 8001 --path /config/workspace/load --secrets-path /config/workspace/secrets

dbt-coves Settings

dbt-coves could optionally read settings from .dbt_coves.yml or .dbt_coves/config.yml. A standard settings files could look like this:

generate:
  sources:
    database: RAW # Database where to look for source tables
    schemas: # List of schema names where to look for source tables
      - RAW
    sources_destination: "models/staging/{{schema}}/{{schema}}.yml" # Where sources yml files will be generated
    models_destination: "models/staging/{{schema}}/{{relation}}.sql" # Where models sql files will be generated
    model_props_destination: "models/staging/{{schema}}/{{relation}}.yml" # Where models yml files will be generated
    update_strategy: ask # Action to perform when a property file already exists. Options: update, recreate, fail, ask (per file)
    templates_folder: ".dbt_coves/templates" # Folder where source generation jinja templates are located. Override default templates creating  source_props.yml, source_model_props.yml, and source_model.sql under this folder

  properties:
    destination: "{{model_folder_path}}/{{model_file_name}}.yml" # Where models yml files will be generated
    # You can specify a different path by declaring it explicitly, i.e.: "models/staging/{{model_file_name}}.yml"
    update-strategy: ask # Action to perform when a property file already exists. Options: update, recreate, fail, ask (per file)
    models: "models/" # Model(s) path where 'generate properties' will look for models for generation

extract:
  airbyte:
    path: /config/workspace/load # Where json files will be generated
    host: http://airbyte-server # Airbyte's API hostname
    port: 8001 # Airbyte's API port
    dbt_list_args: --exclude source:dbt_artifacts # Extra dbt arguments: selectors, modifiers, etc

load:
  airbyte:
    path: /config/workspace/load
    host: http://airbyte-server
    port: 8001
    dbt_list_args: --exclude source:dbt_artifacts
    secrets_path: /config/workspace/secrets # Secret files location for Airbyte configuration
    secrets_manager: datacoves # Secret credentials provider (secrets_path OR secrets_manager should be used, can't load secrets locally and remotely at the same time)
    secrets_url: https://api.datacoves.localhost/service-credentials/airbyte # Secret credentials provider url
    secrets_token: AbCdEf123456 # Secret credentials provider token

Override source generation templates

Customizing generated models and model properties requires placing template files under the .dbt-coves/templates folder as follows:

source_props.yml

This file is used to create the sources yml file

version: 2

sources:
  - name: {{ relation.schema.lower() }}
{%- if source_database %}
    database: {{ source_database }}
{%- endif %}
    tables:
      - name: {{ relation.name.lower() }}

source_model.sql

This file is used to create the staging model(sql) file

with raw_source as (

    select
        parse_json(replace(_airbyte_data::string,'"NaN"', 'null')) as airbyte_data_clean,
        *
    from {% raw %}{{{% endraw %} source('{{ relation.schema.lower() }}', '{{ relation.name.lower() }}') {% raw %}}}{% endraw %}

),

final as (

    select
{%- if adapter_name == 'SnowflakeAdapter' %}
{%- for key, cols in nested.items() %}
  {%- for col in cols %}
        {{ key if key != '_airbyte_data' else 'airbyte_data_clean' }}:{{ '"' + col + '"' }}::varchar as {{ col.lower().replace(" ","_").replace(":","_").replace("(","_").replace(")","_").replace("-","_").replace("/","_") }}{% if not loop.last or columns %},{% endif %}
  {%- endfor %}
{%- endfor %}
{%- elif adapter_name == 'BigQueryAdapter' %}
{%- for key, cols in nested.items() %}
  {%- for col in cols %}
        cast({{ key }}.{{ col }} as string) as {{ col.lower().replace(" ","_").replace(":","_").replace("(","_").replace(")","_") }}{% if not loop.last or columns %},{% endif %}
  {%- endfor %}
{%- endfor %}
{%- elif adapter_name == 'RedshiftAdapter' %}
{%- for key, cols in nested.items() %}
  {%- for col in cols %}
        {{ key }}.{{ col }}::varchar as {{ col.lower().replace(" ","_").replace(":","_").replace("(","_").replace(")","_") }}{% if not loop.last or columns %},{% endif %}
  {%- endfor %}
{%- endfor %}
{%- endif %}
{%- for col in columns %}
        {{ '"' + col.name + '"' }} as {{ col.name.lower() }}{% if not loop.last %},{% endif %}
{%- endfor %}

    from raw_source

)

select * from final

source_model_props.yml

This file is used to create the staging properties(yml) file

version: 2

models:
  - name: {{ model.lower() }}
    columns:
{%- for cols in nested.values() %}
  {%- for col in cols %}
      - name: {{ col.lower().replace(" ","_").replace(":","_").replace("(","_").replace(")","_").replace("-","_").replace("/","_") }}
  {%- endfor %}
{%- endfor %}
{%- for col in columns %}
      - name: {{ col.name.lower() }}
{%- endfor %}

model_props.yml

This file is used to create the properties(yml) files for non-staging models

version: 2

models:
  - name: {{ model.lower() }}
    columns:
{%- for col in columns %}
      - name: {{ col['id'] }}
      {%- if col['description'] %}
        description: "{{ col['description'] }}"
      {%- endif %}
{%- endfor %}

Thanks

The project main structure was inspired by dbt-sugar. Special thanks to Bastien Boutonnet for the great work done.

Authors

About

Learn more about Datacoves.

⚠️ dbt-coves is still in development, make sure to test it for your dbt project version and DW before using in production and please submit any issues you find. We also welcome any contributions from the community

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dbt_coves-1.1.1a19.tar.gz (45.8 kB view details)

Uploaded Source

Built Distribution

dbt_coves-1.1.1a19-py3-none-any.whl (52.5 kB view details)

Uploaded Python 3

File details

Details for the file dbt_coves-1.1.1a19.tar.gz.

File metadata

  • Download URL: dbt_coves-1.1.1a19.tar.gz
  • Upload date:
  • Size: 45.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.2.2 CPython/3.8.14 Linux/5.15.0-1022-azure

File hashes

Hashes for dbt_coves-1.1.1a19.tar.gz
Algorithm Hash digest
SHA256 d6666fbc4f1e861fab751a93a4073955367c3b85b20c52a8af41c9048829d072
MD5 7560fff7b7665639d6bf40271ac06a4a
BLAKE2b-256 0e7e4815736a317c7a57b7d06c11a24f415bf12ec398d5365c964850e5048408

See more details on using hashes here.

File details

Details for the file dbt_coves-1.1.1a19-py3-none-any.whl.

File metadata

  • Download URL: dbt_coves-1.1.1a19-py3-none-any.whl
  • Upload date:
  • Size: 52.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.2.2 CPython/3.8.14 Linux/5.15.0-1022-azure

File hashes

Hashes for dbt_coves-1.1.1a19-py3-none-any.whl
Algorithm Hash digest
SHA256 3fa13338d1631a78b499e6679fe982867185b6def4f0f4e0a12516dbb532520b
MD5 a3d3e1954a2103b6c4e19398228d148a
BLAKE2b-256 3bf1959ee4123eed0705c68edc069ea2f88651974fbe34e3bfe951cc2044a395

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page