Airflow utilities for configuration of many DAGs and DAG environments

Project description

airflow-config

Apache Airflow utilities for configuration of many DAGs and DAG environments

Overview

This library allows for YAML-driven configuration of Airflow, including DAGs, Operators, and declaratively defined DAGs (à la dag-factory). It is built with Pydantic, Hydra, and OmegaConf.

Consider the following basic DAG:

from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime, timedelta

with DAG(
    dag_id="test-dag",
    default_args={
        "depends_on_past": False,
        "email": ["my.email@myemail.com"],
        "email_on_failure": False,
        "email_on_retry": False,
        "retries": 0,
    },
    description="test that dag is working properly",
    schedule=timedelta(minutes=1),
    start_date=datetime(2024, 1, 1),
    catchup=False,
    tags=["utility", "test"],
):
    BashOperator(
        task_id="test-task",
        bash_command="echo 'test'",
    )

We can already see several options that we might want to drive centrally via config, perhaps keyed on environment (dev, prod, etc.):

  • "email": ["my.email@myemail.com"]
  • "email_on_failure": False
  • "email_on_retry": False
  • "retries": 0
  • schedule=timedelta(minutes=1)
  • tags=["utility", "test"]

If we want to change any of these, we have to modify code. Now imagine we have hundreds of DAGs: this quickly gets out of hand, especially since Airflow DAGs are Python code, where a stray trailing comma or other syntax error is easy to introduce.
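A common halfway measure is a shared Python module of defaults that every DAG file imports. This is a hypothetical sketch (the module and variable names are ours, not part of airflow-config), and it illustrates the limitation: every change is still a code change that must be reviewed and redeployed.

```python
# shared_defaults.py -- centralized defaults, still plain Python code.
from datetime import timedelta

SHARED_DEFAULT_ARGS = {
    "depends_on_past": False,
    "email": ["my.email@myemail.com"],
    "email_on_failure": False,
    "email_on_retry": False,
    "retries": 0,
}

SHARED_DAG_KWARGS = {
    "schedule": timedelta(minutes=1),
    "catchup": False,
    "tags": ["utility", "test"],
}

# Each DAG file would then do:
#   with DAG(dag_id="...", default_args=SHARED_DEFAULT_ARGS, **SHARED_DAG_KWARGS):
#       ...
```

Even with this module, varying the values per environment (dev vs. prod) means branching logic in Python rather than declarative configuration.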

Now consider the alternative, config-driven approach:

config/dev.yaml

# @package _global_
_target_: airflow_config.Configuration
default_args:
  _target_: airflow_config.DefaultArgs
  owner: test
  email: [myemail@myemail.com]
  email_on_failure: false
  email_on_retry: false
  retries: 0
  depends_on_past: false
all_dags:
  _target_: airflow_config.DagArgs
  schedule: "01:00"
  start_date: "2024-01-01"
  catchup: false
  tags: ["utility", "test"]

from airflow.operators.bash import BashOperator
from airflow_config import DAG, load_config

config = load_config(config_name="dev")

with DAG(
    dag_id="test-dag",
    description="test that dag is working properly",
    config=config,  # schedule, start_date, catchup, tags, etc. come from config/dev.yaml
):
    BashOperator(
        task_id="test-task",
        bash_command="echo 'test'",
    )

This has a number of benefits:

  • Make changes without code changes, with static type validation
  • Make changes across any number of DAGs without having to copy-paste
  • Organize collections of DAGs into groups, e.g. by environment (dev, prod, etc.)
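Selecting the environment then becomes a one-line change, for example by driving load_config's config_name from an environment variable. The AIRFLOW_ENV variable name below is our own convention for the sketch, not something the library mandates:

```python
import os


def pick_config_name(default: str = "dev") -> str:
    """Choose which config/<name>.yaml to load for this deployment."""
    return os.environ.get("AIRFLOW_ENV", default)


# In the DAG file:
#   config = load_config(config_name=pick_config_name())
```

With this pattern, the same DAG code runs against config/dev.yaml locally and config/prod.yaml in production, with no edits to the DAG files themselves.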

Configuration

More documentation coming soon!

Integrations

Configuration can be arbitrarily extended via the extensions key. Support is built in for airflow-priority, and it can be extended to any arbitrary Pydantic model, as shown in the README of airflow-supervisor.
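As a rough illustration, an extension would be attached under the extensions key of the configuration YAML. The fragment below is hypothetical: the exact key names and _target_ path depend on the extension, so consult the airflow-priority and airflow-supervisor READMEs for the real schema.

```yaml
# config/dev.yaml (illustrative fragment; schema varies by extension)
extensions:
  priority:
    # _target_ points at the extension's Pydantic model, as with the
    # Configuration and DefaultArgs targets shown earlier.
    _target_: airflow_priority.models.PriorityConfig  # hypothetical path
```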

License

This software is licensed under the Apache 2.0 license. See the LICENSE file for details.

Download files

Download the file for your platform.

Source Distribution

airflow_config-0.2.0.tar.gz (23.8 kB)

Uploaded Source

Built Distribution

airflow_config-0.2.0-py3-none-any.whl (33.7 kB)

Uploaded Python 3

File details

Details for the file airflow_config-0.2.0.tar.gz.

File metadata

  • Download URL: airflow_config-0.2.0.tar.gz
  • Upload date:
  • Size: 23.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.10

File hashes

Hashes for airflow_config-0.2.0.tar.gz
Algorithm Hash digest
SHA256 6233da3b3a863b0a93f8a3252a0068f257742ad5ae0c402deba232e27164301a
MD5 22e8ec5619930820cee93553e3c3e571
BLAKE2b-256 e4738a0fffa1e0d841987e1479d31270e051e32f47ffbc0625b044748e8a3e79

File details

Details for the file airflow_config-0.2.0-py3-none-any.whl.

File hashes

Hashes for airflow_config-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 41549573f8672edc3e09b342ede875e256d0a120e984014fa63114d2c056c315
MD5 c4c79ca2708fb9e982bb54442db66f3a
BLAKE2b-256 6f8910629f45d97bcfd2fb33860d2c4b5b7caca3352e4807880650055db8c6ff
