Distributed dbt runs on Apache Airflow

dbt-af: distributed run of dbt models using Airflow

Overview

dbt-af is a tool that allows you to run dbt models in a distributed manner using Airflow. It wraps your dbt project in Airflow DAGs, so that each model runs as an independent task while the dependencies between models are preserved.
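
To make the idea concrete, below is a minimal, hand-rolled sketch (not dbt-af's actual implementation) of how a dbt manifest can be mirrored into Airflow: one task per model, with task dependencies copied from the manifest's depends_on edges. dbt-af automates this mapping and adds domain splitting, scheduling, and target selection on top.

# A conceptual sketch only: mirror dbt model dependencies as Airflow task
# dependencies by reading the manifest that dbt generates.
import json

import pendulum
from airflow import DAG
from airflow.operators.bash import BashOperator

with open('/path/to/my_dbt_project/target/manifest.json') as f:
    manifest = json.load(f)

with DAG(dag_id='dbt_models_sketch', start_date=pendulum.datetime(2024, 1, 1), schedule=None) as dag:
    # one Airflow task per dbt model
    tasks = {
        node_id: BashOperator(task_id=node['name'], bash_command=f"dbt run --select {node['name']}")
        for node_id, node in manifest['nodes'].items()
        if node['resource_type'] == 'model'
    }
    # copy model-to-model dependencies from the manifest
    for node_id, task in tasks.items():
        for upstream_id in manifest['nodes'][node_id]['depends_on']['nodes']:
            if upstream_id in tasks:
                tasks[upstream_id] >> task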

Why?

  1. dbt-af is domain-driven. It is designed to separate models from different domains into different DAGs, so models from different domains can run in parallel.
  2. dbt-af brings scheduling to dbt. You can schedule your dbt models to run at specific times.
  3. dbt-af is an ETL-driven tool. You can separate your models into tiers or ETL stages and build graphs showing the dependencies between models within each tier or stage.
  4. dbt-af adds features for using different dbt targets simultaneously, running different test scenarios, and performing maintenance tasks.

Installation

To install dbt-af, run pip install dbt-af.

To contribute, we recommend using poetry to install the package dependencies. Run poetry install --with=dev to install all dependencies.

dbt-af by Example

All tutorials and examples are located in the examples folder.

To get basic Airflow DAGs for your dbt project, you need to put the following code into your dags folder:

# LABELS: dag, airflow (required by the airflow dag-processor)
from dbt_af.dags import compile_dbt_af_dags
from dbt_af.conf import Config, DbtDefaultTargetsConfig, DbtProjectConfig

# specify here all settings for your dbt project
config = Config(
    dbt_project=DbtProjectConfig(
        dbt_project_name='my_dbt_project',
        dbt_project_path='/path/to/my_dbt_project',
        dbt_models_path='/path/to/my_dbt_project/models',
        dbt_profiles_path='/path/to/my_dbt_project',
        dbt_target_path='/path/to/my_dbt_project/target',
        dbt_log_path='/path/to/my_dbt_project/logs',
        dbt_schema='my_dbt_schema',
    ),
    dbt_default_targets=DbtDefaultTargetsConfig(default_target='dev'),
    is_dev=False,  # set to True if you want to turn on dry-run mode
)

dags = compile_dbt_af_dags(manifest_path='/path/to/my_dbt_project/target/manifest.json', config=config)
for dag_name, dag in dags.items():
    globals()[dag_name] = dag
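
Note that compile_dbt_af_dags reads the manifest that dbt generates, so the manifest must exist before the Airflow dag-processor imports this file. One way to produce it (using the placeholder paths from the snippet above) is:

dbt compile --project-dir /path/to/my_dbt_project --profiles-dir /path/to/my_dbt_project

Once the dag-processor has picked the file up, airflow dags list should show the generated DAGs.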

In dbt_project.yml, you need to set up default targets for all nodes in your project (see example):

sql_cluster: "dev"
daily_sql_cluster: "dev"
py_cluster: "dev"
bf_cluster: "dev"

This will create Airflow DAGs for your dbt project.
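
The default target ('dev' above) and any other targets you reference must be defined in the profiles.yml located at dbt_profiles_path. A minimal sketch of such a profile, assuming a Postgres warehouse (the type and connection fields depend on your adapter):

my_dbt_project:
  target: dev
  outputs:
    dev:
      type: postgres
      host: localhost
      port: 5432
      user: my_user
      password: my_password
      dbname: my_db
      schema: my_dbt_schema
      threads: 4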
