
Distributed dbt runs on Apache Airflow



dbt-af: distributed run of dbt models using Airflow

Overview

dbt-af is a tool for running dbt models in a distributed manner using Airflow. It acts as a wrapper around Airflow DAGs, allowing you to run models independently while preserving the dependencies between them.
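To illustrate what "run models independently while preserving their dependencies" means, here is a minimal sketch, not dbt-af's implementation. It reads model-to-model edges from the `nodes` / `depends_on` structure of a dbt `manifest.json` and derives an order in which one-task-per-model execution could proceed. The manifest is a tiny hand-written stand-in with hypothetical model names; sources and tests are ignored as edges.

```python
from graphlib import TopologicalSorter

# Hand-written stand-in for target/manifest.json (hypothetical model names).
manifest = {
    "nodes": {
        "model.my_dbt_project.stg_orders": {
            "resource_type": "model",
            "depends_on": {"nodes": ["source.my_dbt_project.raw.orders"]},
        },
        "model.my_dbt_project.stg_customers": {
            "resource_type": "model",
            "depends_on": {"nodes": []},
        },
        "model.my_dbt_project.orders_enriched": {
            "resource_type": "model",
            "depends_on": {
                "nodes": [
                    "model.my_dbt_project.stg_orders",
                    "model.my_dbt_project.stg_customers",
                ]
            },
        },
        "test.my_dbt_project.not_null_orders_id": {
            "resource_type": "test",
            "depends_on": {"nodes": ["model.my_dbt_project.stg_orders"]},
        },
    }
}


def model_dependencies(manifest: dict) -> dict[str, set[str]]:
    """Map each model to the set of upstream *models* it depends on."""
    models = {
        uid for uid, node in manifest["nodes"].items()
        if node["resource_type"] == "model"
    }
    return {
        uid: {dep for dep in manifest["nodes"][uid]["depends_on"]["nodes"] if dep in models}
        for uid in models
    }


deps = model_dependencies(manifest)
# Each model would map to one task; a task may start once all its upstream models are done.
order = list(TopologicalSorter(deps).static_order())
print(order)
```

Models with no path between them (here the two staging models) may appear in either order, which is exactly what lets them run in parallel.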


Why?

  1. dbt-af is domain-driven. It is designed to separate models from different domains into different DAGs. This allows you to run models from different domains in parallel.
  2. dbt-af is a dbt-first solution. It is designed to make analysts' lives easier: end users may not even know that Airflow is used to schedule their models. A dbt model's config is the entry point for all your settings and customizations.
  3. dbt-af brings scheduling to dbt, from @monthly to @hourly and beyond.
  4. dbt-af is an ETL-driven tool. You can separate your models into tiers or ETL stages and build graphs showing the dependencies between models within each tier or stage.
  5. dbt-af provides additional features for using different dbt targets simultaneously, running different test scenarios, and performing maintenance tasks.
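The domain-driven split in point 1 can be sketched as follows, assuming (hypothetically) that a model's domain is the first directory under `models/`, i.e. the second element of its `fqn` in the manifest. Each domain's models could then form a separate DAG, and edges that cross domains mark where one DAG would have to wait for another. This is an illustration only; how dbt-af actually assigns domains and links DAGs is not shown here.

```python
from collections import defaultdict

# Hypothetical, trimmed-down manifest nodes: only the fields this sketch reads.
nodes = {
    "model.proj.orders": {"fqn": ["proj", "sales", "orders"], "depends_on": {"nodes": []}},
    "model.proj.revenue": {
        "fqn": ["proj", "finance", "revenue"],
        "depends_on": {"nodes": ["model.proj.orders"]},
    },
    "model.proj.costs": {"fqn": ["proj", "finance", "costs"], "depends_on": {"nodes": []}},
}


def domain_of(node: dict) -> str:
    # ASSUMPTION: the domain is the first directory under models/ (fqn[1]).
    fqn = node["fqn"]
    return fqn[1] if len(fqn) > 2 else "default"


def split_by_domain(nodes: dict) -> tuple[dict[str, list[str]], list[tuple[str, str]]]:
    """Group models per domain and collect (upstream, downstream) edges that cross domains."""
    domains: dict[str, list[str]] = defaultdict(list)
    cross_domain: list[tuple[str, str]] = []
    for uid, node in nodes.items():
        domains[domain_of(node)].append(uid)
        for upstream in node["depends_on"]["nodes"]:
            if upstream in nodes and domain_of(nodes[upstream]) != domain_of(node):
                cross_domain.append((upstream, uid))
    return dict(domains), cross_domain


domains, cross = split_by_domain(nodes)
print(sorted(domains))  # ['finance', 'sales']
print(cross)            # [('model.proj.orders', 'model.proj.revenue')]
```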

Installation

To install dbt-af, run pip install dbt-af.

To contribute, we recommend using uv to install the package dependencies. Run uv sync --all-packages --all-groups --all-extras to install all dependencies.

dbt-af by Example

All tutorials and examples are located in the examples folder.

To generate basic Airflow DAGs for your dbt project, put the following code into a file in your Airflow dags folder:

```python
# LABELS: dag, airflow (both words must appear in this file for Airflow's dag-processor to parse it in safe mode)
from dbt_af.dags import compile_dbt_af_dags
from dbt_af.conf import Config, DbtDefaultTargetsConfig, DbtProjectConfig

# specify all settings for your dbt project here
config = Config(
    dbt_project=DbtProjectConfig(
        dbt_project_name='my_dbt_project',
        dbt_project_path='/path/to/my_dbt_project',
        dbt_models_path='/path/to/my_dbt_project/models',
        dbt_profiles_path='/path/to/my_dbt_project',
        dbt_target_path='/path/to/my_dbt_project/target',
        dbt_log_path='/path/to/my_dbt_project/logs',
        dbt_schema='my_dbt_schema',
    ),
    dbt_default_targets=DbtDefaultTargetsConfig(default_target='dev'),
    dry_run=False,  # set to True if you want to turn on dry-run mode
)

# build Airflow DAGs from the compiled dbt manifest
dags = compile_dbt_af_dags(
    manifest_path='/path/to/my_dbt_project/target/manifest.json',
    config=config,
)
# expose every DAG as a module-level variable so Airflow can discover it
for dag_name, dag in dags.items():
    globals()[dag_name] = dag
```
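Why the LABELS comment matters: with Airflow's `[core] dag_discovery_safe_mode` enabled (the default), a file in the dags folder is only parsed if its content contains both "airflow" and "dag", case-insensitively. The import line already contains "dag" (in `dbt_af.dags`) but not "airflow", so without the comment the file could be skipped. Below is a simplified re-implementation of that check for illustration; it is not Airflow's own code, and the function name is hypothetical.

```python
def might_contain_dag(source: str) -> bool:
    """Simplified take on Airflow's safe-mode heuristic: both words must appear (case-insensitive)."""
    content = source.lower()
    return all(word in content for word in ("dag", "airflow"))


with_label = "# LABELS: dag, airflow\nfrom dbt_af.dags import compile_dbt_af_dags\n"
without_label = "from dbt_af.dags import compile_dbt_af_dags\n"

print(might_contain_dag(with_label))     # True
print(might_contain_dag(without_label))  # False: 'dag' appears, 'airflow' does not
```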

In dbt_project.yml, set up default targets for all nodes in your project (see example):

```yaml
sql_cluster: "dev"
daily_sql_cluster: "dev"
py_cluster: "dev"
bf_cluster: "dev"
```
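These project-level values act as fallbacks that an individual model's config can override, following dbt's usual precedence (model-level config wins over dbt_project.yml). A minimal sketch of that resolution follows. The mapping from node kind to key (`sql_cluster` for SQL models, `py_cluster` for Python models) is a hypothetical reading of the key names, not dbt-af's documented logic, and `daily_sql_cluster` / `bf_cluster` are left out for brevity.

```python
# Project-wide defaults, mirroring the dbt_project.yml snippet above.
PROJECT_DEFAULTS = {
    "sql_cluster": "dev",
    "daily_sql_cluster": "dev",
    "py_cluster": "dev",
    "bf_cluster": "dev",
}


def effective_config(project_defaults: dict, model_config: dict) -> dict:
    """Model-level settings override project-level defaults (dbt-style precedence)."""
    return {**project_defaults, **model_config}


def target_for(model_config: dict, language: str) -> str:
    # HYPOTHETICAL mapping: Python models use py_cluster, everything else uses sql_cluster.
    key = "py_cluster" if language == "python" else "sql_cluster"
    return effective_config(PROJECT_DEFAULTS, model_config)[key]


print(target_for({}, "sql"))                          # dev
print(target_for({"sql_cluster": "prod"}, "sql"))     # prod
print(target_for({"sql_cluster": "prod"}, "python"))  # dev
```

Because every key has a project-level default, a model that sets nothing still resolves to a valid target.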

This will create Airflow DAGs for your dbt project.

See the documentation here for more details.

Features

  1. dbt-af is designed primarily for large projects (1000+ models). When a project has a significant number of dbt objects across different domains, it becomes crucial to have all DAGs auto-generated. dbt-af takes care of this by generating all the necessary DAGs for your dbt project and structuring them by domain.
  2. Each dbt run is executed as a separate Airflow task. Every task receives a date interval from the Airflow DAG context. By filtering on this interval in your dbt models, you make your dbt runs idempotent: rerunning a task for the same interval produces the same result.
  3. dbt-af lowers the barrier to entry for people outside the infrastructure team. Analysts, data scientists, and data engineers can focus on their dbt models and business logic rather than spending time on Airflow DAGs.
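The idempotency argument in point 2 can be illustrated in plain Python. Airflow exposes each run's interval as `data_interval_start` and `data_interval_end` in the task context; how dbt-af forwards them into dbt models is not shown here. The sketch, with a hypothetical in-memory table, only demonstrates the principle: a run that deletes and re-inserts exactly the rows in its half-open interval `[start, end)` leaves the table unchanged when retried, whereas a plain append would duplicate rows.

```python
from datetime import datetime, timedelta

# In-memory stand-in for a target table: (event_time, value) rows.
table: list[tuple[datetime, int]] = []


def source_rows(start: datetime, end: datetime) -> list[tuple[datetime, int]]:
    """Hypothetical upstream data: one row per hour in the half-open interval [start, end)."""
    rows, ts = [], start
    while ts < end:
        rows.append((ts, ts.hour))
        ts += timedelta(hours=1)
    return rows


def run_for_interval(start: datetime, end: datetime) -> None:
    """Replace only the rows inside [start, end): delete them, then insert fresh ones."""
    table[:] = [row for row in table if not (start <= row[0] < end)]
    table.extend(source_rows(start, end))


start = datetime(2024, 1, 1)
end = start + timedelta(days=1)  # e.g. one daily run's data interval
run_for_interval(start, end)
first_run = list(table)
run_for_interval(start, end)  # a retry or manual rerun of the same interval
print(len(table), table == first_run)  # 24 True
```

Since intervals are half-open, consecutive runs tile time without overlap, so the next day's run touches none of these rows.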

Requirements

dbt-af is tested with:

| Airflow version | Python versions | dbt-core versions |
|---|---|---|
| 2.6.3 | >=3.10,<3.12 | >=1.7,<=1.10 |
| 2.7.3 | >=3.10,<3.12 | >=1.7,<=1.10 |
| 2.8.4 | >=3.10,<3.12 | >=1.7,<=1.10 |
| 2.9.3 | >=3.10,<3.13 | >=1.7,<=1.10 |
| 2.10.5 | >=3.10,<3.13 | >=1.7,<=1.10 |
| 2.11.0 | >=3.10,<3.13 | >=1.7,<=1.10 |
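A small sketch for checking whether an environment falls inside the tested matrix above. It treats each Airflow version as an exact tested release and, as an assumption, reads the dbt-core bound `<=1.10` as covering any 1.10.x patch (versions are compared on major.minor only). "Not tested" here does not mean "unsupported"; the helper names are hypothetical.

```python
# Tested matrix from the table: Airflow version -> (min Python, max Python exclusive).
TESTED_PYTHON = {
    "2.6.3": ((3, 10), (3, 12)),
    "2.7.3": ((3, 10), (3, 12)),
    "2.8.4": ((3, 10), (3, 12)),
    "2.9.3": ((3, 10), (3, 13)),
    "2.10.5": ((3, 10), (3, 13)),
    "2.11.0": ((3, 10), (3, 13)),
}
# ASSUMPTION: both bounds inclusive, compared on (major, minor).
DBT_CORE_RANGE = ((1, 7), (1, 10))


def parse(version: str) -> tuple[int, ...]:
    """'3.12.4' -> (3, 12, 4); only plain numeric versions are handled."""
    return tuple(int(part) for part in version.split("."))


def is_tested(airflow: str, python: str, dbt_core: str) -> bool:
    if airflow not in TESTED_PYTHON:
        return False
    py_lo, py_hi = TESTED_PYTHON[airflow]
    dbt_lo, dbt_hi = DBT_CORE_RANGE
    return py_lo <= parse(python)[:2] < py_hi and dbt_lo <= parse(dbt_core)[:2] <= dbt_hi


print(is_tested("2.10.5", "3.12.4", "1.9.1"))  # True
print(is_tested("2.8.4", "3.12.1", "1.9.1"))   # False: 3.12 is only tested with Airflow 2.9.3+
```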


Download files


Source Distribution

dbt_af-0.14.6.tar.gz (41.8 kB)

Built Distribution


dbt_af-0.14.6-py3-none-any.whl (54.8 kB)

File details

Details for the file dbt_af-0.14.6.tar.gz.

File metadata

  • Download URL: dbt_af-0.14.6.tar.gz
  • Size: 41.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for dbt_af-0.14.6.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | ed010d68ead5e9e139b0e9c7936d214768f440f44b41e2e5d69bdb756563fd9e |
| MD5 | 4f8b8e190ecfb7a2a100afbe2c7881c7 |
| BLAKE2b-256 | 8ef10d9daf09386ef3f2357aa53d26d9c17a830171849b66b1d10a61ff274e78 |
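To check a downloaded archive against the SHA256 digest listed above, a minimal sketch using only the standard library: it streams the file through `hashlib.sha256` and compares hex digests. The download location `dbt_af-0.14.6.tar.gz` in the current directory is an assumption.

```python
import hashlib
from pathlib import Path

# SHA256 digest of dbt_af-0.14.6.tar.gz, copied from the table above.
EXPECTED_SHA256 = "ed010d68ead5e9e139b0e9c7936d214768f440f44b41e2e5d69bdb756563fd9e"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    return sha256_of(path) == expected.lower()


archive = Path("dbt_af-0.14.6.tar.gz")  # assumed download location
if archive.exists():
    print("OK" if verify(archive) else "MISMATCH")
```

For installs, pip can enforce the same check itself via its hash-checking mode (a requirements line such as `dbt-af==0.14.6 --hash=sha256:<digest>`).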


Provenance

The following attestation bundles were made for dbt_af-0.14.6.tar.gz:

Publisher: release.yml on Toloka/dbt-af

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file dbt_af-0.14.6-py3-none-any.whl.

File metadata

  • Download URL: dbt_af-0.14.6-py3-none-any.whl
  • Size: 54.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for dbt_af-0.14.6-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | a406ec2c4af7ce1f55cddb78e08f954209c0365e6181a9d68225d140a1aae68a |
| MD5 | 3267a3a137abb35c2178c1d47666ccb9 |
| BLAKE2b-256 | 38a752994bde0569291111332a26f4511e59342887bc9a8732aad2d91097a011 |


Provenance

The following attestation bundles were made for dbt_af-0.14.6-py3-none-any.whl:

Publisher: release.yml on Toloka/dbt-af

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
