Static budget checks and hash stability checks for Apache Airflow DAG files.

airflow-dag-audit

airflow-dag-audit provides static checks for DAG files and an optional reparse hash check. It is designed for local development, pytest, and command-line use.

The package does not start scheduler, triggerer, webserver, or database services. It focuses on:

  • AST-based counts such as imports, Variable.get(...) calls, SQL-like string literals, and top-level calls
  • a hash stability check that reparses the same DAG file twice
  • a pytest plugin for project defaults and per-test overrides
  • a CLI that can be installed as a tool and executed with uvx

Trademark notice

Apache Airflow, Apache, and related marks belong to The Apache Software Foundation. This project is not affiliated with, endorsed by, or sponsored by The Apache Software Foundation. It is a third-party helper package for DAG repositories.

Installation

Library dependency

uv add airflow-dag-audit

If the environment that runs the hash check does not already have Apache Airflow installed, you can install the optional extra:

uv add 'airflow-dag-audit[airflow]'

CLI tool

Because the package is published on PyPI, the CLI can be run with uvx without creating a project environment:

uvx airflow-dag-audit --help

What is checked

Static metrics

The AST analysis currently reports:

  • import_count
  • variable_get_count for Variable.get(...)
  • sql_query_count for string literals that look like SQL
  • top_level_call_count
  • detected_dag_decorators
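To make the idea of AST-based counting concrete, here is a minimal sketch using only the standard library. It is not the package's implementation: the function name count_static_metrics, the SQL_PATTERN heuristic, and the definition of a "top-level call" (a module-level statement whose value is a call) are assumptions chosen for illustration.

```python
import ast
import re

# Hypothetical heuristic: a string "looks like SQL" if it starts with a common SQL verb.
SQL_PATTERN = re.compile(
    r"^\s*(SELECT|INSERT|UPDATE|DELETE|CREATE|DROP|ALTER|WITH)\b", re.IGNORECASE
)


def count_static_metrics(source: str) -> dict[str, int]:
    """Count a few parse-time cost signals in DAG source without executing it."""
    tree = ast.parse(source)
    metrics = {
        "import_count": 0,
        "variable_get_count": 0,
        "sql_query_count": 0,
        "top_level_call_count": 0,
    }
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            metrics["import_count"] += 1
        elif isinstance(node, ast.Call):
            func = node.func
            # Match only the syntactic shape Variable.get(...).
            if (
                isinstance(func, ast.Attribute)
                and func.attr == "get"
                and isinstance(func.value, ast.Name)
                and func.value.id == "Variable"
            ):
                metrics["variable_get_count"] += 1
        elif isinstance(node, ast.Constant) and isinstance(node.value, str):
            if SQL_PATTERN.match(node.value):
                metrics["sql_query_count"] += 1
    # Assumed definition: module-level expression statements or assignments
    # whose value is a call.
    for stmt in tree.body:
        value = getattr(stmt, "value", None)
        if isinstance(stmt, (ast.Expr, ast.Assign)) and isinstance(value, ast.Call):
            metrics["top_level_call_count"] += 1
    return metrics


EXAMPLE = '''
import os
from airflow.models import Variable

env = Variable.get("env")
QUERY = "SELECT id FROM users"
print(env)
'''

print(count_static_metrics(EXAMPLE))
```

Because the source is only parsed, never imported, the sketch runs even when Apache Airflow is not installed.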

Stable hash

With require_stable_hash=True, the package reparses a DAG file twice and compares canonical serialized payloads.

  • If Apache Airflow is importable, the worker tries to serialize matching DAG objects with SerializedDAG.to_dict(...).
  • Otherwise, it falls back to a generic serializer for DAG-like Python objects.

The check is useful for detecting DAG definitions that mutate during import or serialization.
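The reparse-and-compare idea can be illustrated with a simplified, in-process sketch. It is an assumption-laden stand-in, not the package's worker: payload_hash and is_hash_stable are hypothetical names, the "payload" here is just the JSON-serializable module globals, and no Airflow serialization is involved.

```python
import hashlib
import json


def payload_hash(source: str) -> str:
    """Execute the source in a fresh namespace and hash its JSON-serializable globals."""
    namespace: dict = {}
    exec(compile(source, "<dag>", "exec"), namespace)
    payload = {}
    for name, value in namespace.items():
        if name.startswith("_"):
            continue  # skip __builtins__ and private names
        try:
            json.dumps(value)
        except (TypeError, ValueError):
            continue  # skip modules, functions, and other non-serializable objects
        payload[name] = value
    # Canonical form: sorted keys and fixed separators, so equal payloads hash equally.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_hash_stable(source: str) -> bool:
    """Parse the same source twice and compare the two payload hashes."""
    return payload_hash(source) == payload_hash(source)


STABLE = 'dag_id = "daily_report"\nschedule = "@daily"\n'
# A random suffix is generated on every parse, so the payload changes each time.
UNSTABLE = 'import uuid\ndag_id = "report_" + uuid.uuid4().hex\n'

print(is_hash_stable(STABLE), is_hash_stable(UNSTABLE))
```

Typical sources of instability in real DAG files are values computed at import time, such as timestamps, random identifiers, or iteration over unordered collections.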

Python API

Basic assertion

from pathlib import Path

from airflow_dag_audit import DagAuditConfig, assert_dag_budget

config = DagAuditConfig(
    max_imports=20,
    max_variable_gets=2,
    max_sql_queries=3,
    max_top_level_calls=10,
    require_stable_hash=True,
)

assert_dag_budget(Path("dags/example_dag.py"), config=config)

Non-raising inspection

from airflow_dag_audit import DagAuditConfig, audit_dag_file

result = audit_dag_file(
    "dags/example_dag.py",
    config=DagAuditConfig(max_imports=20, require_stable_hash=True),
)

print(result.ok)
print(result.metrics.as_dict())
if result.hash_result:
    print(result.hash_result.first_hashes)
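The difference between the raising and non-raising entry points can be pictured with a small sketch. The helper names find_violations and assert_within_budget, and the plain dictionaries used for metrics and limits, are hypothetical; the package's own result objects may be shaped differently.

```python
def find_violations(metrics: dict[str, int], limits: dict[str, int]) -> list[str]:
    """Return one message per metric above its limit; an empty list means the file passes."""
    return [
        f"{name}: {metrics[name]} > {limit}"
        for name, limit in sorted(limits.items())
        if metrics.get(name, 0) > limit
    ]


def assert_within_budget(metrics: dict[str, int], limits: dict[str, int]) -> None:
    """Raising variant: turn any violation into an AssertionError."""
    violations = find_violations(metrics, limits)
    if violations:
        raise AssertionError("DAG budget exceeded: " + "; ".join(violations))


metrics = {"import_count": 25, "variable_get_count": 1}
limits = {"import_count": 20, "variable_get_count": 2}
print(find_violations(metrics, limits))
```

The non-raising form suits reports and dashboards, while the raising form fits naturally into a test body.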

Pytest usage

The package exposes a pytest plugin through the pytest11 entry point.

Project defaults in pyproject.toml

The package supports layered defaults in [tool.airflow-dag-audit]. Use [tool.airflow-dag-audit.budget] for global limits and [[tool.airflow-dag-audit.overrides]] for per-glob or per-file exceptions. Later matching overrides win.

[tool.airflow-dag-audit]
dag_folder = "dags"
include = ["**/*.py"]
exclude = ["**/__pycache__/**", "**/tests/**"]
check_hash_stability = true
hash_parse_repeats = 2

[tool.airflow-dag-audit.budget]
imports = 40
import_froms = 25
variable_get_calls = 10
connection_get_calls = 6
airflow_query_calls = 8
operators = 80
tasks = 150

[[tool.airflow-dag-audit.overrides]]
match = "dags/legacy/*.py"

[tool.airflow-dag-audit.overrides.budget]
imports = 80
variable_get_calls = 25
connection_get_calls = 20
airflow_query_calls = 20
tasks = 300

[[tool.airflow-dag-audit.overrides]]
match = "dags/legacy/specific_bad_but_known.py"
check_hash_stability = false

[tool.airflow-dag-audit.overrides.budget]
imports = 160
import_froms = 90
variable_get_calls = 40
connection_get_calls = 35
airflow_query_calls = 30
operators = 250
tasks = 700
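The layering rule (global budget first, then every matching override in order, with later matches winning) might be resolved along these lines. This is a sketch under assumptions: resolve_budget is a hypothetical helper, fnmatch-style matching is assumed for the match key, and limits not mentioned by an override are assumed to fall through from the earlier layers.

```python
from fnmatch import fnmatch


def resolve_budget(path: str, base: dict, overrides: list[dict]) -> dict:
    """Merge the global budget with every matching override, in declaration order."""
    budget = dict(base)
    for override in overrides:
        if fnmatch(path, override["match"]):
            # A later matching override replaces only the keys it sets.
            budget.update(override.get("budget", {}))
    return budget


BASE = {"imports": 40, "variable_get_calls": 10, "tasks": 150}
OVERRIDES = [
    {"match": "dags/legacy/*.py", "budget": {"imports": 80, "tasks": 300}},
    {"match": "dags/legacy/specific_bad_but_known.py", "budget": {"imports": 160}},
]

print(resolve_budget("dags/legacy/specific_bad_but_known.py", BASE, OVERRIDES))
print(resolve_budget("dags/new/etl.py", BASE, OVERRIDES))
```

In this sketch the known-bad legacy file picks up tasks = 300 from the broader legacy glob and imports = 160 from its own, more specific entry.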

If files is omitted, the package scans dag_folder using include and exclude.
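A folder scan driven by include and exclude patterns could look roughly like this. The function discover_dag_files is hypothetical, and the matching semantics (pathlib globbing for include, fnmatch on the relative POSIX path for exclude) are assumptions rather than documented behavior of the package.

```python
from fnmatch import fnmatch
from pathlib import Path


def discover_dag_files(dag_folder: Path, include: list[str], exclude: list[str]) -> list[str]:
    """Return sorted POSIX-style paths, relative to dag_folder, that pass include/exclude."""
    found = set()
    for pattern in include:
        for path in dag_folder.glob(pattern):
            if not path.is_file():
                continue
            relative = path.relative_to(dag_folder).as_posix()
            # Drop the file if any exclude pattern matches its relative path.
            if any(fnmatch(relative, skip) for skip in exclude):
                continue
            found.add(relative)
    return sorted(found)


if __name__ == "__main__":
    print(discover_dag_files(Path("dags"), ["**/*.py"], ["**/__pycache__/**", "**/tests/**"]))
```

One caveat of this simplification: with fnmatch, a pattern such as **/tests/** requires at least one directory above tests, so a tests directory directly under dag_folder would need its own pattern.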

The pytest plugin reads its own options from [tool.pytest.ini_options]:

[tool.pytest.ini_options]
airflow_dag_audit_dag_folder = "dags"
airflow_dag_audit_dag_files = [
  "dags/example_good.py",
  "dags/example_unstable.py",
]
airflow_dag_audit_max_imports = "20"
airflow_dag_audit_max_variable_gets = "2"
airflow_dag_audit_max_sql_queries = "3"
airflow_dag_audit_max_top_level_calls = "10"
airflow_dag_audit_require_stable_hash = "true"

Test code

from airflow_dag_audit import assert_dag_budget


def test_dag_budget(dag_file, dag_audit_config) -> None:
    assert_dag_budget(dag_file, config=dag_audit_config)

Per-test overrides

import pytest

from airflow_dag_audit import assert_dag_budget


@pytest.mark.airflow_dag_budget(max_imports=8, require_stable_hash=False)
def test_small_dag(dag_file, dag_audit_config) -> None:
    assert_dag_budget(dag_file, config=dag_audit_config)

Command-line overrides

uv run pytest \
  --airflow-dag-file dags/example_good.py \
  --airflow-dag-folder dags \
  --airflow-dag-max-imports 20 \
  --airflow-dag-max-variable-gets 2 \
  --airflow-dag-max-sql-queries 3 \
  --airflow-dag-max-top-level-calls 10 \
  --airflow-dag-require-stable-hash

Why --airflow-dag-folder exists

When you point to a single DAG file, the package tries to infer a useful DAG folder automatically. That works for common layouts, especially when the file lives under a dags/ directory or inside a Python package.

If the file relies on sibling modules, package-relative imports, or a non-standard repository layout, pass --airflow-dag-folder explicitly. The same applies to DagAuditConfig(dag_folder=...) in Python code.
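The inference described above can be approximated with a short heuristic. This is only a sketch of the two cases the text mentions (a dags/ ancestor, or a file inside a Python package); infer_dag_folder is a hypothetical name and the package's actual rules may differ.

```python
from pathlib import Path


def infer_dag_folder(dag_file: Path) -> Path:
    """Guess the folder that should be importable when parsing dag_file."""
    dag_file = dag_file.resolve()
    # Case 1: prefer the nearest ancestor directory literally named "dags".
    for parent in dag_file.parents:
        if parent.name == "dags":
            return parent
    # Case 2: climb out of any enclosing Python package (directories with __init__.py).
    folder = dag_file.parent
    while (folder / "__init__.py").exists() and folder.parent != folder:
        folder = folder.parent
    return folder


if __name__ == "__main__":
    print(infer_dag_folder(Path("dags/team/etl.py")))
```

A heuristic like this cannot know about sibling modules or custom sys.path entries, which is why an explicit folder is the safer choice for unusual layouts.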

CLI usage

Scan without failing the process

uvx airflow-dag-audit scan dags \
  --max-imports 20 \
  --max-variable-gets 2 \
  --max-sql-queries 3 \
  --max-top-level-calls 10

Fail on budget violations

uvx airflow-dag-audit assert dags \
  --max-imports 20 \
  --max-variable-gets 2 \
  --max-sql-queries 3 \
  --max-top-level-calls 10 \
  --require-stable-hash

Check only the reparse hash

uvx airflow-dag-audit hash dags/example_unstable.py --dag-folder dags --show-diff

JSON output

uvx airflow-dag-audit scan dags --json

Use only pyproject.toml

uvx airflow-dag-audit assert

Development

Install dependencies

uv sync --group dev

Run tests

uv run pytest

Build distributions

uv run python -m build

Publishing from GitHub tags

The repository includes two workflows:

  • .github/workflows/ci.yml for tests and package build
  • .github/workflows/publish.yml for PyPI publishing on version tags

The publishing workflow is written for PyPI Trusted Publishing, which must be configured on both the PyPI project and the GitHub repository before tagged releases can be published.

Examples

The examples/ directory contains:

  • a stable DAG-like file
  • an unstable DAG-like file that changes hash across reparses
  • a pytest example
