Airflow utilities for configuration of many DAGs and DAG environments

airflow-config

Apache Airflow utilities for configuration of many DAGs and DAG environments

Overview

This library allows for YAML-driven configuration of Airflow, including DAGs, Operators, and declaratively defined DAGs (à la dag-factory). It is built with Pydantic, Hydra, and OmegaConf.

Consider the following basic DAG:

from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime, timedelta

with DAG(
    dag_id="test-dag",
    default_args={
        "depends_on_past": False,
        "email": ["my.email@myemail.com"],
        "email_on_failure": False,
        "email_on_retry": False,
        "retries": 0,
    },
    description="test that dag is working properly",
    schedule=timedelta(minutes=1),
    start_date=datetime(2024, 1, 1),
    catchup=False,
    tags=["utility", "test"],
):
    BashOperator(
        task_id="test-task",
        bash_command="echo 'test'",
    )

Several of these options are candidates for central configuration, possibly varying by environment (e.g. dev, prod):

  • "email": ["my.email@myemail.com"]
  • "email_on_failure": False
  • "email_on_retry": False
  • "retries": 0
  • schedule=timedelta(minutes=1)
  • tags=["utility", "test"]

Changing any of these means modifying DAG code. With hundreds of DAGs, this quickly gets out of hand: Airflow DAGs are Python code, so every edit risks introducing a syntax error, a stray trailing comma, or a similar mistake.
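To make the problem concrete, here is a minimal sketch of the hand-rolled alternative: hoisting the options listed above into a shared mapping keyed by environment. The names `SETTINGS` and `dag_kwargs`, and the `prod` values, are hypothetical and exist only for illustration; none of this is part of airflow-config.

```python
from datetime import datetime, timedelta

# Hypothetical per-environment settings. The "dev" values mirror the DAG above;
# the "prod" values are invented for illustration.
SETTINGS = {
    "dev": {
        "email": ["my.email@myemail.com"],
        "retries": 0,
        "schedule": timedelta(minutes=1),
        "tags": ["utility", "test"],
    },
    "prod": {
        "email": ["my.email@myemail.com"],
        "retries": 3,
        "schedule": timedelta(hours=1),
        "tags": ["utility"],
    },
}


def dag_kwargs(env: str) -> dict:
    """Build keyword arguments for a DAG from the shared settings of one environment."""
    settings = SETTINGS[env]
    return {
        "default_args": {
            "depends_on_past": False,
            "email": settings["email"],
            "email_on_failure": False,
            "email_on_retry": False,
            "retries": settings["retries"],
        },
        "schedule": settings["schedule"],
        "start_date": datetime(2024, 1, 1),
        "catchup": False,
        "tags": settings["tags"],
    }


prod_kwargs = dag_kwargs("prod")
```

Each DAG file could then call `DAG(dag_id=..., **dag_kwargs(env))`. This removes the copy-paste, but the settings still live in Python and nothing checks their types, which is the gap a config-driven approach aims to close.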

Now consider the alternative, config-driven approach:

config/config.yaml

# @package _global_
_target_: airflow_config.Configuration
default_args:
  _target_: airflow_config.DefaultArgs
  owner: test
  email: [my.email@myemail.com]
  email_on_failure: false
  email_on_retry: false
  retries: 0
  depends_on_past: false
all_dags:
  _target_: airflow_config.DagArgs
  schedule: "01:00"
  start_date: "2024-01-01"
  catchup: false
  tags: ["utility", "test"]

The DAG definition then reduces to:

from airflow.operators.bash import BashOperator
from airflow_config import DAG, load_config

config = load_config()

with DAG(dag_id="test-dag", description="test that dag is working properly", config=config):
    BashOperator(
        task_id="test-task",
        bash_command="echo 'test'",
    )
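Conceptually, the `config=config` argument lets centrally defined values fill in whatever the DAG file does not set itself. The sketch below illustrates that idea with plain dictionaries. The helper `resolve_dag_kwargs` is hypothetical, and the rule that explicitly passed arguments win over configured ones is an assumption made for this example, not a statement about how airflow-config resolves conflicts.

```python
from typing import Any

# Plain-dict stand-in for some of the values in config/config.yaml above.
CONFIG: dict[str, Any] = {
    "default_args": {"owner": "test", "retries": 0, "depends_on_past": False},
    "all_dags": {
        "schedule": "01:00",
        "start_date": "2024-01-01",
        "catchup": False,
        "tags": ["utility", "test"],
    },
}


def resolve_dag_kwargs(config: dict[str, Any], **explicit: Any) -> dict[str, Any]:
    """Hypothetical helper: combine central config with per-DAG arguments."""
    # Start from the settings shared by all DAGs.
    resolved = dict(config.get("all_dags", {}))
    # Attach the shared default_args as a nested dictionary.
    resolved["default_args"] = dict(config.get("default_args", {}))
    # Assumption for this sketch: arguments given in the DAG file take precedence.
    # An explicit default_args would replace the configured one wholesale.
    resolved.update(explicit)
    return resolved


kwargs = resolve_dag_kwargs(CONFIG, dag_id="test-dag", catchup=True)
```

Here `kwargs` keeps the configured `schedule`, `tags`, and `default_args`, while `catchup` reflects the value passed explicitly.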

This has a number of benefits:

  • Change settings in YAML rather than in DAG code, with type validation of the configured values
  • Apply a change across any number of DAGs without copy-pasting
  • Organize collections of DAGs into groups, e.g. by environment (dev, prod, etc.)
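The validation benefit can be illustrated with a small standard-library sketch: a configured value of the wrong type is rejected when the config is loaded, before any DAG is built. The library itself is built on Pydantic, which does this far more thoroughly; `EXPECTED_TYPES` and `validate_section` below are hypothetical stand-ins, not part of airflow-config.

```python
# Hypothetical schema for a few default_args fields from the config above.
EXPECTED_TYPES = {
    "owner": str,
    "retries": int,
    "email_on_failure": bool,
    "depends_on_past": bool,
}


def validate_section(section: dict) -> dict:
    """Return the section unchanged if every value has the expected type."""
    for key, value in section.items():
        if key not in EXPECTED_TYPES:
            raise KeyError(f"unknown option: {key!r}")
        expected = EXPECTED_TYPES[key]
        # bool is a subclass of int, so reject booleans where an int is expected.
        wrong_bool = expected is int and isinstance(value, bool)
        if wrong_bool or not isinstance(value, expected):
            raise TypeError(
                f"{key!r} should be {expected.__name__}, got {type(value).__name__}"
            )
    return section


valid = validate_section({"owner": "test", "retries": 0, "email_on_failure": False})

try:
    validate_section({"retries": "three"})
    error_message = None
except TypeError as exc:
    error_message = str(exc)
```

The second call fails with a message naming the offending key, so a typo in a shared config surfaces as one clear error rather than as a broken DAG somewhere downstream.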

Configuration

More documentation coming soon!

License

This software is licensed under the Apache 2.0 license. See the LICENSE file for details.
