Library to convert DBT manifest metadata to Airflow tasks
DBT Airflow Factory
Library to convert DBT manifest metadata to Airflow tasks using Astronomer Cosmos
What's New in v1.0.0
Version 1.0.0 replaces custom task builders with Astronomer Cosmos integration. This change maintains 100% backward compatibility - no configuration or code changes are required.
Compatibility:
- Apache Airflow 2.5 - 2.11
- dbt-core 1.7 - 1.10 (peer dependency via Cosmos)
- Python 3.9 - 3.11
Key changes:
- Uses Cosmos DbtTaskGroup for model-level task granularity
- Seeds handled automatically from manifest (no config needed)
- Transparent pass-through of all Kubernetes configuration
See MIGRATION.md for upgrade details.
Documentation
Read the full documentation at https://dbt-airflow-factory.readthedocs.io/
Installation
Use the package manager pip to install the library:
pip install dbt-airflow-factory
Usage
The library is expected to be used inside an Airflow environment with a Kubernetes image referencing dbt.
dbt-airflow-factory's main task is to parse manifest.json and create an Airflow DAG from it. It also reads config
files from the config directory, which makes it highly customizable (e.g., the user can set the path to manifest.json).
To start, create a directory with the following structure, where manifest.json is a file generated by dbt:
.
├── config
│ ├── base
│ │ ├── airflow.yml
│ │ ├── dbt.yml
│ │ └── k8s.yml
│ └── dev
│ └── dbt.yml
├── dag.py
└── manifest.json
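Before wiring up the DAG, it can help to confirm that the directory matches the layout above. The following is a minimal sketch using only the standard library; the helper name `missing_project_files` and the `EXPECTED_FILES` tuple are hypothetical and not part of dbt-airflow-factory, and the list simply mirrors the tree shown here (whether every file is strictly required is not stated in this README).

```python
from pathlib import Path

# Relative paths copied from the directory tree above.
# Assumption: "dev" stands in for whichever environment name you use.
EXPECTED_FILES = (
    "manifest.json",
    "dag.py",
    "config/base/airflow.yml",
    "config/base/dbt.yml",
    "config/base/k8s.yml",
    "config/dev/dbt.yml",
)


def missing_project_files(project_dir):
    """Return the expected relative paths that do not exist under project_dir."""
    root = Path(project_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

Calling `missing_project_files(".")` from the project root returns an empty list when every file in the tree is present.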
Then, put the following code into dag.py:
from dbt_airflow_factory.airflow_dag_factory import AirflowDagFactory
from os import path
dag = AirflowDagFactory(path.dirname(path.abspath(__file__)), "dev").create()
Once dag.py is uploaded to the Airflow DAGs directory, Airflow picks it up; the factory then parses manifest.json and prepares a DAG to run.
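To get a feel for what the DAG will contain, you can inspect manifest.json yourself. The sketch below is not the library's parsing code (v1.0.0 delegates that to Cosmos); it only assumes the `nodes`, `resource_type`, and `depends_on.nodes` keys of the dbt manifest, and the function names `load_manifest` and `model_dependencies` are hypothetical.

```python
import json


def load_manifest(path):
    """Read a dbt manifest.json file into a dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def model_dependencies(manifest):
    """Map each dbt model's unique_id to the upstream node ids it depends on."""
    deps = {}
    for unique_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") != "model":
            continue  # skip seeds, tests, snapshots and other node types
        deps[unique_id] = list(node.get("depends_on", {}).get("nodes", []))
    return deps
```

For example, `model_dependencies(load_manifest("manifest.json"))` gives a rough picture of the model-level dependency graph that the generated tasks are expected to follow.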
Configuration files
The example configuration files in the tests directory are the best reference for what correct configs look like.
You can use Airflow template variables in your dbt.yml and k8s.yml files, as long as they are inside
quotation marks:
target: "{{ var.value.env }}"
some_other_field: "{{ ds_nodash }}"
Similarly, you can use "{{ var.value.VARIABLE_NAME }}" in airflow.yml, but only this Airflow variable getter is supported there.
Any other Airflow template variable will not work in airflow.yml.
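The quotation marks matter because an unquoted `{{` starts a flow mapping in YAML rather than a string. As a rough aid, the sketch below flags lines in a config file whose value contains `{{` without surrounding quotes. It is a heuristic for simple single-line `key: value` entries only, the function name `unquoted_template_lines` is hypothetical, and it is not something dbt-airflow-factory provides.

```python
def unquoted_template_lines(yaml_text):
    """Return 1-based line numbers whose value contains '{{' but is not quoted.

    Heuristic only: handles simple single-line `key: value` and `- item` entries.
    """
    flagged = []
    for number, line in enumerate(yaml_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#") or "{{" not in stripped:
            continue
        if stripped.startswith("- "):
            stripped = stripped[2:].strip()  # drop the list-item marker
        _, sep, value = stripped.partition(":")
        value = value.strip() if sep else stripped
        quoted = value[:1] in ("'", '"') and value[-1:] == value[:1]
        if not quoted:
            flagged.append(number)
    return flagged
```

Running it over the text of dbt.yml or k8s.yml before deployment points at the lines that probably need quoting.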
Creation of the directory with data-pipelines-cli
DBT Airflow Factory works best in tandem with the data-pipelines-cli tool. dp not only prepares the directory for the library to digest, but also automates Docker image building and pushes the generated directory to the cloud storage of your choice.
Development
Running Tests
# Install with test dependencies
pip install -e ".[tests]"
# Run tests
pytest tests/
# Run with coverage
pytest tests/ --cov=dbt_airflow_factory --cov-report=term-missing
Known Issue: Installing on Some Systems
If you encounter compilation errors related to google-re2 (an Airflow dependency), use one of these solutions:
Option 1 (Recommended) - Use pre-compiled binaries:
pip install dbt-airflow-factory --only-binary=google-re2
This tells pip to use pre-compiled binary wheels instead of compiling from source.
Option 2 - Install system dependencies:
If binary wheels aren't available for your platform, install the system-level RE2 library:
# Ubuntu/Debian
sudo apt-get install -y libre2-dev
# macOS
brew install re2
# Alpine Linux
apk add --no-cache re2-dev
Then retry: pip install dbt-airflow-factory
Source Distribution
File details
Details for the file dbt_airflow_factory-1.0.0.tar.gz.
File metadata
- Download URL: dbt_airflow_factory-1.0.0.tar.gz
- Upload date:
- Size: 29.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3c39808ac4af341755bb175d8fcc32e91587d775e00e617bdb512a543e7a1e8b |
| MD5 | c715d1067ea5b56e56488604823b555c |
| BLAKE2b-256 | b8402747d96379b1492d749df8d09b24399490ac44fe9d67dabd28f8983c9705 |