Airflow utilities for configuration of many DAGs and DAG environments
airflow-config
Apache Airflow utilities for configuration of many DAGs and DAG environments
Overview
This library allows for YAML-driven configuration of Airflow, including DAGs, Operators, and declaratively defined DAGs (à la dag-factory). It is built with Pydantic, Hydra, and OmegaConf.
Consider the following basic DAG:
from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime, timedelta

with DAG(
    dag_id="test-dag",
    default_args={
        "depends_on_past": False,
        "email": ["my.email@myemail.com"],
        "email_on_failure": False,
        "email_on_retry": False,
        "retries": 0,
    },
    description="test that dag is working properly",
    schedule=timedelta(minutes=1),
    start_date=datetime(2024, 1, 1),
    catchup=False,
    tags=["utility", "test"],
):
    BashOperator(
        task_id="test-task",
        bash_command="echo 'test'",
    )
We can already see many options that we might want to drive centrally via config, perhaps based on some notion of environment (e.g. dev, prod, etc.):
"email": ["my.email@myemail.com"]
"email_on_failure": False
"email_on_retry": False
"retries": 0
schedule=timedelta(minutes=1)
tags=["utility", "test"]
If we want to change these in our DAG, we need to modify code. With hundreds of DAGs, this can quickly get out of hand, especially since Airflow DAGs are Python code and it is easy to introduce a syntax error, a stray trailing comma, or another common mistake.
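As a small illustration of the trailing-comma hazard mentioned above, here is a plain Python sketch (no Airflow required; the variable names are only for this example). A stray comma after a scalar is valid syntax, so nothing fails at parse time, but the value silently becomes a tuple:

```python
# A stray trailing comma silently turns a scalar into a one-element tuple.
retries = 0,  # intended: retries = 0

# The mistake propagates into the dictionary without any error.
default_args = {"retries": retries}

print(type(default_args["retries"]).__name__)  # prints "tuple", not "int"
```

Mistakes like this tend to surface only when the value is used, which is part of the motivation for validating settings centrally.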
Now consider the alternative, config-driven approach:
config/config.yaml
# @package _global_
_target_: airflow_config.Configuration
default_args:
  _target_: airflow_config.DefaultArgs
  owner: test
  email: [myemail@myemail.com]
  email_on_failure: false
  email_on_retry: false
  retries: 0
  depends_on_past: false
all_dags:
  _target_: airflow_config.DagArgs
  schedule: "01:00"
  start_date: "2024-01-01"
  catchup: false
  tags: ["utility", "test"]
from airflow.operators.bash import BashOperator
# airflow_config.DAG is used in place of airflow.DAG and accepts a config argument
from airflow_config import DAG, load_config

# Load the YAML-driven configuration
config = load_config()

with DAG(dag_id="test-dag", description="test that dag is working properly", config=config):
    BashOperator(
        task_id="test-task",
        bash_command="echo 'test'",
    )
This has a number of benefits:
- Change settings without touching DAG code, with static type validation
- Make changes across any number of DAGs without having to copy-paste
- Organize collections of DAGs into groups, e.g. via environment like dev, prod, etc.
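To make the ideas of type validation and per-environment grouping concrete, here is a minimal standard-library sketch. It is not how airflow-config works internally (the library is built on Pydantic, Hydra, and OmegaConf); `DagDefaults`, `BASE`, `OVERRIDES`, and `settings_for` are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DagDefaults:
    """Hypothetical shared DAG settings; not an airflow_config class."""

    retries: int = 0
    email_on_failure: bool = False
    tags: tuple = ("utility", "test")

    def __post_init__(self):
        # Minimal type check: each value must be an instance of its annotated type.
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, f.type):
                raise TypeError(
                    f"{f.name} must be {f.type.__name__}, got {type(value).__name__}"
                )


# Settings shared by every environment (assumed values, for illustration only).
BASE = {"retries": 0, "email_on_failure": False, "tags": ("utility", "test")}

# Per-environment overrides layered on top of BASE.
OVERRIDES = {
    "dev": {"tags": ("utility", "test", "dev")},
    "prod": {"retries": 3, "email_on_failure": True},
}


def settings_for(env: str) -> DagDefaults:
    """Merge BASE with the overrides for env and validate the result."""
    merged = {**BASE, **OVERRIDES.get(env, {})}
    return DagDefaults(**merged)


print(settings_for("prod").retries)  # prints 3
print(settings_for("dev").retries)   # prints 0
```

In this sketch a bad value such as `DagDefaults(retries="three")` raises a `TypeError` when the settings are built, rather than failing later inside a DAG.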
Configuration
More documentation coming soon!
License
This software is licensed under the Apache 2.0 license. See the LICENSE file for details.