Airflow utilities for configuration of many DAGs and DAG environments

airflow-config

Apache Airflow utilities for configuration of many DAGs and DAG environments

Overview

This library allows for YAML-driven configuration of Airflow, including DAGs, Operators, and declaratively defined DAGs (à la dag-factory). It is built with Pydantic, Hydra, and OmegaConf.

Consider the following basic DAG:

from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime, timedelta

with DAG(
    dag_id="test-dag",
    default_args={
        "depends_on_past": False,
        "email": ["my.email@myemail.com"],
        "email_on_failure": False,
        "email_on_retry": False,
        "retries": 0,
    },
    description="test that dag is working properly",
    schedule=timedelta(minutes=1),
    start_date=datetime(2024, 1, 1),
    catchup=False,
    tags=["utility", "test"],
):
    BashOperator(
        task_id="test-task",
        bash_command="echo 'test'",
    )

We can already see many options that we might want to drive centrally via config, perhaps based on some notion of environment (e.g. dev, prod):

  • "email": ["my.email@myemail.com"]
  • "email_on_failure": False
  • "email_on_retry": False
  • "retries": 0
  • schedule=timedelta(minutes=1)
  • tags=["utility", "test"]

If we want to change these in our DAG, we need to modify code. Now imagine we have hundreds of DAGs: this can quickly get out of hand, especially since Airflow DAGs are Python code, where a hand edit can easily introduce a syntax error, a stray trailing comma, or another common mistake.

Now consider the alternative, config-driven approach:

config/config.yaml

# @package _global_
_target_: airflow_config.Configuration
default_args:
  _target_: airflow_config.DefaultArgs
  owner: test
  email: [myemail@myemail.com]
  email_on_failure: false
  email_on_retry: false
  retries: 0
  depends_on_past: false
all_dags:
  _target_: airflow_config.DagArgs
  schedule: "01:00"
  start_date: "2024-01-01"
  catchup: false
  tags: ["utility", "test"]

from airflow.operators.bash import BashOperator
# Use the DAG class from airflow_config, which accepts the `config` argument used below
from airflow_config import DAG, load_config

config = load_config()

with DAG(dag_id="test-dag", description="test that dag is working properly", config=config):
    BashOperator(
        task_id="test-task",
        bash_command="echo 'test'",
    )
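
The `_target_` keys in the YAML above follow the Hydra convention of naming the Python class that a config node should be turned into. As a rough illustration of that idea only, here is a minimal standard-library sketch. The `instantiate` helper is hypothetical, it is not Hydra's or airflow-config's implementation, and it points `_target_` at standard-library classes so that it runs without Airflow installed.

```python
import importlib
from typing import Any


def instantiate(node: Any) -> Any:
    """Recursively build objects from dicts that carry a `_target_` key.

    Hypothetical helper for illustration; the real work in airflow-config
    is delegated to Hydra and Pydantic.
    """
    if isinstance(node, list):
        return [instantiate(item) for item in node]
    if not isinstance(node, dict):
        return node  # plain scalar: nothing to build

    # Build children first so nested `_target_` nodes become objects too.
    kwargs = {key: instantiate(value) for key, value in node.items() if key != "_target_"}
    target = node.get("_target_")
    if target is None:
        return kwargs  # ordinary mapping without a target class

    # Split "package.module.ClassName" into module path and attribute name.
    module_name, _, attr = target.rpartition(".")
    cls = getattr(importlib.import_module(module_name), attr)
    return cls(**kwargs)


# Stand-in config: same shape as config.yaml, but with stdlib targets (assumption).
config = {
    "_target_": "types.SimpleNamespace",
    "default_args": {"_target_": "types.SimpleNamespace", "owner": "test", "retries": 0},
    "all_dags": {
        "_target_": "types.SimpleNamespace",
        "schedule": {"_target_": "datetime.timedelta", "minutes": 1},
        "catchup": False,
        "tags": ["utility", "test"],
    },
}

built = instantiate(config)
print(built.default_args.retries)  # 0
print(built.all_dags.schedule)     # 0:01:00
```

Because each node names its own class, the same loader can build any config shape without a hand-written mapping from section names to classes.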

This has a number of benefits:

  • Change settings without touching DAG code, with type validation of the configured values
  • Make changes across any number of DAGs without having to copy-paste
  • Organize collections of DAGs into groups, e.g. by environment (dev, prod, etc.)
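
To make the last two points concrete, here is a minimal sketch of shared defaults combined with per-environment overrides, using plain dictionaries. The names `BASE`, `ENVIRONMENTS`, `merge`, and `config_for`, and the override values, are hypothetical and chosen for illustration. airflow-config itself relies on Hydra and OmegaConf for composition, so this shows the idea rather than the library's API.

```python
from copy import deepcopy
from typing import Any


def merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict where values in `override` win, merging nested dicts."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Settings shared by every DAG (values taken from the example config).
BASE = {
    "default_args": {"owner": "test", "retries": 0, "email_on_failure": False},
    "all_dags": {"catchup": False, "tags": ["utility", "test"]},
}

# Hypothetical per-environment overrides; only the differences are listed.
ENVIRONMENTS = {
    "dev": {"all_dags": {"tags": ["utility", "test", "dev"]}},
    "prod": {"default_args": {"retries": 3, "email_on_failure": True}},
}


def config_for(environment: str) -> dict[str, Any]:
    """Compose the shared defaults with one environment's overrides."""
    return merge(BASE, ENVIRONMENTS[environment])


prod = config_for("prod")
print(prod["default_args"]["retries"])  # 3
print(prod["default_args"]["owner"])    # test
```

Changing a value in `BASE` affects every environment, while each environment lists only what differs from the shared defaults.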

Configuration

More documentation coming soon!

License

This software is licensed under the Apache 2.0 license. See the LICENSE file for details.
