Distributed dbt runs on Apache Airflow
dbt-af: distributed run of dbt models using Airflow
Overview
dbt-af is a tool for running dbt models in a distributed manner with Airflow. It acts as a wrapper that turns your dbt project into Airflow DAGs, letting you run models independently while preserving the dependencies between them.
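dbt records each model's upstream dependencies in `manifest.json`, under `nodes[<unique_id>]["depends_on"]["nodes"]`. The sketch below is not dbt-af's implementation; it only illustrates, with Python's standard `graphlib`, how those dependencies let independent models run in parallel while dependent ones wait. The function name `parallel_batches` is hypothetical.

```python
from graphlib import TopologicalSorter


def parallel_batches(manifest: dict) -> list[list[str]]:
    """Split dbt models into batches that may run in parallel (illustrative sketch).

    Every model in a batch depends only on models from earlier batches.
    """
    # Keep only model nodes; tests, seeds and sources are ignored here.
    models = {
        uid: node
        for uid, node in manifest["nodes"].items()
        if node["resource_type"] == "model"
    }
    sorter = TopologicalSorter()
    for uid, node in models.items():
        # Upstream dependencies that are not models (e.g. sources) are dropped.
        upstream = [dep for dep in node["depends_on"]["nodes"] if dep in models]
        sorter.add(uid, *upstream)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all models whose upstreams are done
        batches.append(ready)
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    manifest = {
        "nodes": {
            "model.shop.stg_orders": {"resource_type": "model", "depends_on": {"nodes": ["source.shop.raw.orders"]}},
            "model.shop.stg_customers": {"resource_type": "model", "depends_on": {"nodes": []}},
            "model.shop.orders_mart": {
                "resource_type": "model",
                "depends_on": {"nodes": ["model.shop.stg_orders", "model.shop.stg_customers"]},
            },
        }
    }
    print(parallel_batches(manifest))
```

Here both staging models land in the first batch and `orders_mart` waits for them in the second, which is the "independent but dependency-preserving" behaviour described above.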
Why?
- dbt-af is domain-driven. It is designed to separate models from different domains into different DAGs. This allows you to run models from different domains in parallel.
- dbt-af brings scheduling to dbt. You can schedule your dbt models to run at a specific time.
- dbt-af is an ETL-driven tool. You can separate your models into tiers or ETL stages and build graphs showing the dependencies between models within each tier or stage.
- dbt-af provides additional features for using several dbt targets simultaneously, running different test scenarios, and performing maintenance tasks.
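How dbt-af decides which domain a model belongs to is not described here. As a minimal sketch, assuming the first folder under `models/` names the domain, models can be bucketed per domain from the `fqn` list (`[project_name, folder, ..., model_name]`) that dbt stores for each node in `manifest.json`. The function `models_by_domain` and the `"default"` bucket are hypothetical.

```python
from collections import defaultdict


def models_by_domain(manifest: dict) -> dict[str, list[str]]:
    """Group model ids by domain, ASSUMING domain = first folder under models/."""
    domains: dict[str, list[str]] = defaultdict(list)
    for uid, node in manifest["nodes"].items():
        if node["resource_type"] != "model":
            continue
        fqn = node["fqn"]  # e.g. ["my_dbt_project", "sales", "orders"]
        # Models placed directly in models/ have no folder; use a fallback bucket.
        domain = fqn[1] if len(fqn) > 2 else "default"
        domains[domain].append(uid)
    return {domain: sorted(uids) for domain, uids in domains.items()}
```

Each resulting bucket would then correspond to one DAG, so models from different domains can be scheduled and run in parallel.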
Installation
To install dbt-af, run `pip install dbt-af`.
To contribute, we recommend using poetry to install package dependencies. Run `poetry install --with=dev` to install all dependencies.
dbt-af by Example
All tutorials and examples are located in the examples folder.
To get basic Airflow DAGs for your dbt project, put the following code into your `dags` folder:
```python
# LABELS: dag, airflow (it's required for airflow dag-processor)
from dbt_af.dags import compile_dbt_af_dags
from dbt_af.conf import Config, DbtDefaultTargetsConfig, DbtProjectConfig

# specify here all settings for your dbt project
config = Config(
    dbt_project=DbtProjectConfig(
        dbt_project_name='my_dbt_project',
        dbt_project_path='/path/to/my_dbt_project',
        dbt_models_path='/path/to/my_dbt_project/models',
        dbt_profiles_path='/path/to/my_dbt_project',
        dbt_target_path='/path/to/my_dbt_project/target',
        dbt_log_path='/path/to/my_dbt_project/logs',
        dbt_schema='my_dbt_schema',
    ),
    dbt_default_targets=DbtDefaultTargetsConfig(default_target='dev'),
    is_dev=False,  # set to True if you want to turn on dry-run mode
)

# build all DAGs from the compiled dbt manifest
dags = compile_dbt_af_dags(manifest_path='/path/to/my_dbt_project/target/manifest.json', config=config)

# expose each DAG as a module-level name so Airflow discovers it when parsing this file
for dag_name, dag in dags.items():
    globals()[dag_name] = dag
```
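`compile_dbt_af_dags` reads the `manifest.json` that dbt writes into the project's target directory (for example after `dbt parse` or `dbt compile`). Before wiring the file above into the `dags` folder, it can help to confirm that the manifest exists and lists the models you expect. A minimal standard-library sketch; `list_models` is a hypothetical helper and not part of dbt-af.

```python
import json
from pathlib import Path


def list_models(manifest_path: str) -> list[str]:
    """Return the unique ids of all models in a dbt manifest.json (hypothetical helper)."""
    path = Path(manifest_path)
    if not path.exists():
        # The manifest must be generated by dbt before dbt-af can build DAGs from it.
        raise FileNotFoundError(f"{path} not found; run `dbt parse` or `dbt compile` first")
    manifest = json.loads(path.read_text())
    return sorted(
        uid
        for uid, node in manifest["nodes"].items()
        if node["resource_type"] == "model"
    )
```

For instance, `list_models('/path/to/my_dbt_project/target/manifest.json')` should return every model id that you expect to appear as a task in the generated DAGs.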
In `dbt_project.yml`, you need to set up default targets for all nodes in your project (see example):

```yaml
sql_cluster: "dev"
daily_sql_cluster: "dev"
py_cluster: "dev"
bf_cluster: "dev"
```
This will create Airflow DAGs for your dbt project.
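Because all four target keys from the `dbt_project.yml` example are expected, a quick check can catch a missing one before Airflow parses the DAG file. A minimal sketch, assuming you have already loaded the mapping that holds these keys (for example with a YAML parser); `REQUIRED_TARGET_KEYS` and `missing_target_keys` are hypothetical names, not part of dbt-af.

```python
# The four default-target keys shown in the dbt_project.yml example.
REQUIRED_TARGET_KEYS = ("sql_cluster", "daily_sql_cluster", "py_cluster", "bf_cluster")


def missing_target_keys(project_config: dict) -> list[str]:
    """Return the default-target keys absent from an already-parsed config mapping."""
    return [key for key in REQUIRED_TARGET_KEYS if key not in project_config]
```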
File details
Details for the file dbt_af-0.4.1.tar.gz.
File metadata
- Download URL: dbt_af-0.4.1.tar.gz
- Size: 36.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.0.0 CPython/3.12.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | d90f5a2e99d3560f9889566f73c01bd87212b8fdf2a0d952624dc8d78d5dde64
MD5 | f176b374c3fa229d2f1e85578eb29f95
BLAKE2b-256 | 55ed1c538c5a7198f74a7a1d8b036e08b73ac43b2fdd67c21b0a535327e875a6
File details
Details for the file dbt_af-0.4.1-py3-none-any.whl.
File metadata
- Download URL: dbt_af-0.4.1-py3-none-any.whl
- Size: 49.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.0.0 CPython/3.12.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | cb5cb08e958488e22a2b16fc76dab903699fe0f17504dfbb18a95516d7ff1a62
MD5 | 9e129d8fa819750fdb474abeb5f230f6
BLAKE2b-256 | f988b1317d4049129a60a1070d7317aac708cdcb97fd71ab94b4e9bf5038b955