Skip to main content

CLI tool for dbt users adopting analytics engineering best practices.

Project description

dbt-coves

Maintenance PyPI version fury.io Code Style Checked with mypy Imports: isort Imports: python Build pre-commit.ci status codecov Maintainability Downloads

What is dbt-coves?

dbt-coves is a complimentary CLI tool for dbt that allows users to quickly apply Analytics Engineering best practices.

dbt-coves helps with the generation of scaffold for dbt by analyzing your data warehouse schema in Redshift, Snowflake, or Big Query and creating the necessary configuration files (sql and yml).

⚠️ dbt-coves is in alpha, make sure to test it for your dbt project version and DW before using in production

Here's the tool in action

image

Supported dbt versions

Version Status
< 1.0 ❌ Not supported
>= 1.0 ✅ Tested

Supported adapters

Feature Snowflake Redshift BigQuery
dbt project setup ✅ Tested 🕥 In progress ❌ Not tested
source model (sql) generation ✅ Tested 🕥 In progress ❌ Not tested
model properties (yml) generation ✅ Tested 🕥 In progress ❌ Not tested

Installation

pip install dbt-coves

We recommend using python virtualenvs and create one separate environment per project.

Main Features

For a complete detail of usage, please run:

dbt-coves -h
dbt-coves <command> -h

Environment setup

Setting up your environment can be done in two different ways:

dbt-coves setup all

Runs a set of checks in your local environment and helps you configure every project component properly: ssh keys, git and dbt

You can also configure individual components:

dbt-coves setup git

Set up git repository of dbt-coves project

dbt-coves setup dbt

Setup dbt within the project (delegates to dbt init)

dbt-coves setup ssh

Set up SSH Keys for dbt-coves project. Supports the argument --open_ssl_public_key which generates an extra Public Key in Open SSL format, useful for configuring certain providers (i.e. Snowflake authentication)

Models generation

dbt-coves generate <resource>

Where <resource> could be sources or properties.

Code generation tool to easily generate models and model properties based on configuration and existing data.

Supports Jinja templates to adjust how the resources are generated.

Arguments

dbt-coves generate sources supports the following args:

--sources-destination
# Where sources yml files will be generated, default: 'models/staging/{{schema}}/sources.yml'
--models-destination
# Where models sql files will be generated, default: 'models/staging/{{schema}}/{{relation}}.sql'
--model-props-destination
# Where models yml files will be generated, default: 'models/staging/{{schema}}/{{relation}}.yml'
--update-strategy
# Action to perform when a property file already exists: 'update', 'recreate', 'fail', 'ask' (per file)

Metadata

Supports the argument --metadata which allows to specify a csv file containing field types and descriptions to be inserted into the model property files.

dbt-coves generate sources --metadata metadata.csv

Metadata format:

database schema relation column key type description
raw master person name (empty) varchar The full name
raw master person name groupName varchar The group name

Extract configuration from Airbyte

dbt-coves extract airbyte

Extracts the configuration from your Airbyte sources, connections and destinations (excluding credentials) and stores it in the specified folder. The main goal of this feature is to keep track of the configuration changes in your git repo, and rollback to a specific version when needed.

Full usage example:

dbt-coves extract airbyte --host http://airbyte-server --port 8001 --path /config/workspace/load

Load configuration to Airbyte

dbt-coves load airbyte

Loads the Airbyte configuration generated with dbt-coves extract airbyte on an Airbyte server. Secrets folder needs to be specified separatedly. You can use git-secret to encrypt them and make them part of your git repo.

Loading secrets

Secret credentials can be approached in two different ways: locally or remotely (through a provider/manager).

In order to load encrypted fields locally:

dbt-coves load airbyte --secrets-path /path/to/secret/directory

# This directory must have 'sources', 'destinations' and 'connections' folders nested inside, and inside them the respective JSON files with unencrypted fields.
# Naming convention: JSON unencrypted secret files must be named exactly as the extracted ones.

To load encrypted fields through a manager (in this case we are connecting to Datacoves' Service Credentials):

--secrets-manager datacoves
--secrets-url https://api.datacoves.localhost/service-credentials/airbyte
--secrets-token AbCdEf123456

Full usage example:

dbt-coves load airbyte --host http://airbyte-server --port 8001 --path /config/workspace/load --secrets-path /config/workspace/secrets

Settings

Dbt-coves could optionally read settings from .dbt_coves.yml or .dbt_coves/config.yml. A standard settings files could looke like this:

generate:
  sources:
    database: RAW # Database where to look for source tables
    schemas: # List of schema names where to look for source tables
      - RAW
    sources_destination: "models/staging/{{schema}}/sources.yml" # Where sources yml files will be generated
    models_destination: "models/staging/{{schema}}/{{relation}}.sql" # Where models sql files will be generated
    model_props_destination: "models/staging/{{schema}}/{{relation}}.yml" # Where models yml files will be generated
    update_strategy: ask # Action to perform when a property file already exists. Options: update, recreate, fail, ask (per file)
    templates_folder: ".dbt_coves/templates" # Folder where source generation jinja templates are located. Override default templates creating source_model_props.yml, source_props.yml and source_model.sql under this folder

extract:
  airbyte:
    path: /config/workspace/load # Where json files will be generated
    host: http://airbyte-server # Airbyte's API hostname
    port: 8001 # Airbyte's API port
    dbt_list_args: --exclude source:dbt_artifacts # Extra dbt arguments: selectors, modifiers, etc

load:
  airbyte:
    path: /config/workspace/load
    host: http://airbyte-server
    port: 8001
    dbt_list_args: --exclude source:dbt_artifacts
    secrets_path: /config/workspace/secrets # Secret files location for Airbyte configuration
    secrets_manager: datacoves # Secret credentials provider (secrets_path OR secrets_manager should be used, can't load secrets locally and remotely at the same time)
    secrets_url: https://api.datacoves.localhost/service-credentials/airbyte # Secret credentials provider url
    secrets_token: AbCdEf123456 # Secret credentials provider token

Override source generation templates

Customizing generated models and model properties requires placing specific files under the templates_folder folder like these:

source_model.sql

with raw_source as (

    select *
    from {% raw %}{{{% endraw %} source('{{ relation.schema.lower() }}', '{{ relation.name.lower() }}') {% raw %}}}{% endraw %}

),

final as (

    select
{%- if adapter_name == 'SnowflakeAdapter' %}
{%- for key, cols in nested.items() %}
  {%- for col in cols %}
        {{ key }}:{{ '"' + col + '"' }}::{{ cols[col]["type"] }} as {{ cols[col]["id"] }}{% if not loop.last or columns %},{% endif %}
  {%- endfor %}
{%- endfor %}
{%- elif adapter_name == 'BigQueryAdapter' %}
{%- for key, cols in nested.items() %}
  {%- for col in cols %}
        cast({{ key }}.{{ col }} as {{ cols[col]["type"].replace("varchar", "string") }}) as {{ cols[col]["id"] }}{% if not loop.last or columns %},{% endif %}
  {%- endfor %}
{%- endfor %}
{%- elif adapter_name == 'RedshiftAdapter' %}
{%- for key, cols in nested.items() %}
  {%- for col in cols %}
        {{ key }}.{{ col }}::{{ cols[col]["type"] }} as {{ cols[col]["id"] }}{% if not loop.last or columns %},{% endif %}
  {%- endfor %}
{%- endfor %}
{%- endif %}
{%- for col in columns %}
        {{ '"' + col['name'] + '"' }} as {{ col['id'] }}{% if not loop.last %},{% endif %}
{%- endfor %}

    from raw_source

)

select * from final

source_props.yml

version: 2

sources:
  - name: {{ relation.schema.lower() }}
{%- if source_database %}
    database: {{ source_database }}
{%- endif %}
    tables:
      - name: {{ relation.name.lower() }}

source_model_props.yml

version: 2

models:
  - name: {{ model.lower() }}
    columns:
{%- for cols in nested.values() %}
  {%- for col in cols %}
      - name: {{ cols[col]["id"] }}
      {%- if cols[col]["description"] %}
        description: "{{ cols[col]['description'] }}"
      {%- endif %}
  {%- endfor %}
{%- endfor %}
{%- for col in columns %}
      - name: {{ col['id'] }}
      {%- if col['description'] %}
        description: "{{ col['description'] }}"
      {%- endif %}
{%- endfor %}

model_props.yml

version: 2

models:
  - name: {{ model.lower() }}
    columns:
{%- for col in columns %}
      - name: {{ col['id'] }}
      {%- if col['description'] %}
        description: "{{ col['description'] }}"
      {%- endif %}
{%- endfor %}

Thanks

The project main structure was inspired by dbt-sugar. Special thanks to Bastien Boutonnet for the great work done.

Authors

About

Learn more about Datacoves.

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dbt_coves-1.1.1a12.tar.gz (44.4 kB view hashes)

Uploaded source

Built Distribution

dbt_coves-1.1.1a12-py3-none-any.whl (51.4 kB view hashes)

Uploaded py3

Supported by

AWS AWS Cloud computing Datadog Datadog Monitoring Facebook / Instagram Facebook / Instagram PSF Sponsor Fastly Fastly CDN Google Google Object Storage and Download Analytics Huawei Huawei PSF Sponsor Microsoft Microsoft PSF Sponsor NVIDIA NVIDIA PSF Sponsor Pingdom Pingdom Monitoring Salesforce Salesforce PSF Sponsor Sentry Sentry Error logging StatusPage StatusPage Status page