Airflow UI Plugin for monitoring DAG failures and SLA misses

Airflow Watcher 👁️

An Airflow UI plugin for monitoring DAG failures and SLA misses/delays.

Demo

Airflow Watcher Demo

Features

  • 🚨 DAG Failure Monitoring: Real-time tracking of DAG and task failures
  • ⏰ SLA Miss Detection: Alerts when DAGs miss their SLA deadlines
  • 📊 Dashboard View: Custom Airflow UI view for monitoring status
  • 🔔 Multi-channel Notifications: Slack, Email, and PagerDuty alerts
  • 📈 Trend Analysis: Historical failure and SLA miss trends
  • 📡 Metrics Export: StatsD/Datadog and Prometheus support
  • ⚙️ Flexible Alert Rules: Pre-defined templates or custom rules

Installation

📖 See INSTALL.md for detailed installation and configuration instructions.

Alerting & Monitoring

📖 See ALERTING.md for complete alerting configuration:

  • Slack - Rich notifications with blocks
  • Email - SMTP-based alerts
  • PagerDuty - Incident management with deduplication
  • StatsD/Datadog - Real-time metrics
  • Prometheus - /metrics endpoint for scraping
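
To illustrate what deduplication means for PagerDuty: in the Events API v2, events that share a `dedup_key` are grouped into a single incident. The sketch below builds a trigger event for a failed DAG run. The key scheme (a hash of the DAG ID and run ID) and the helper name `build_pagerduty_event` are assumptions made for this example, not the plugin's actual implementation; ALERTING.md remains the reference for real configuration.

```python
import hashlib
import json


def build_pagerduty_event(routing_key: str, dag_id: str, run_id: str) -> dict:
    """Build an Events API v2 'trigger' event for a failed DAG run (illustrative)."""
    # Hypothetical key scheme: one incident per (dag_id, run_id) pair.
    dedup_key = hashlib.sha256(f"{dag_id}:{run_id}".encode()).hexdigest()
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "dedup_key": dedup_key,
        "payload": {
            "summary": f"DAG {dag_id} failed (run {run_id})",
            "source": "airflow",
            "severity": "error",
        },
    }


if __name__ == "__main__":
    event = build_pagerduty_event("your-key", "weather_data_pipeline", "run_1")
    print(json.dumps(event, indent=2))
```

Sending the same event twice for one run yields the same key, so repeated alerts would update one incident rather than open several.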

Quick Setup

# Slack alerts
export AIRFLOW_WATCHER_SLACK_WEBHOOK_URL="https://hooks.slack.com/..."

# PagerDuty (optional)
export AIRFLOW_WATCHER_PAGERDUTY_ROUTING_KEY="your-key"

# Choose alert template
export AIRFLOW_WATCHER_ALERT_TEMPLATE="production_balanced"
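
Before restarting the webserver, it can help to confirm which of these variables are actually set. A minimal sketch using only the three variable names from the Quick Setup snippet; the helper `configured_settings` is hypothetical and not part of the plugin.

```python
import os
from typing import Mapping

# Variable names taken from the Quick Setup snippet.
WATCHER_VARS = (
    "AIRFLOW_WATCHER_SLACK_WEBHOOK_URL",
    "AIRFLOW_WATCHER_PAGERDUTY_ROUTING_KEY",
    "AIRFLOW_WATCHER_ALERT_TEMPLATE",
)


def configured_settings(env: Mapping[str, str]) -> dict:
    """Map each variable name to True if it is set and non-empty, else False."""
    return {name: bool(env.get(name, "").strip()) for name in WATCHER_VARS}


if __name__ == "__main__":
    for name, is_set in configured_settings(os.environ).items():
        print(f"{name}: {'set' if is_set else 'missing'}")
```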

Usage

Once installed, the plugin will automatically:

  1. Register with Airflow's plugin system
  2. Add a "Watcher" menu item to the Airflow UI
  3. Start monitoring DAG failures and SLA misses

Watcher Menu

Select Watcher in the Airflow UI navigation bar to access:

  • Airflow Dashboard - Overview metrics
  • Airflow Health - DAG health status (success/failed/delayed/stale)
  • DAG Scheduling - Queue and pool utilization
  • DAG Failures - Recent failures with details
  • SLA Tracker - SLA misses and delays
  • Task Health - Long-running and zombie tasks
  • Dependencies - Cross-DAG dependency tracking

Role-Based Access Control (RBAC)

Airflow Watcher integrates with Airflow's built-in FAB security manager to enforce DAG-level access control. No separate configuration is needed: it reads directly from Airflow's role and permission system.

How It Works

  • Admin / Op roles see all DAGs across every Watcher page and API endpoint
  • Custom roles see only the DAGs on which they have can_read permission
  • Filtering is mandatory and applied server-side, so restricted users cannot bypass it
  • Aggregate stats (failure counts, SLA misses, health scores) are recomputed per-user so no global data leaks
  • A 🔒 badge appears in the filter bar for non-admin users
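
The filtering rules above can be sketched in plain Python. This is a simplified model, not the plugin's code: the `Failure` record, the set of roles treated as unrestricted, and the function names are assumptions for illustration. The real plugin reads permissions from Airflow's FAB security manager.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set

# Roles assumed to see every DAG, per the list above.
UNRESTRICTED_ROLES = {"Admin", "Op"}


@dataclass(frozen=True)
class Failure:
    dag_id: str
    task_id: str


def visible_failures(
    failures: Iterable[Failure], roles: Set[str], readable_dags: Set[str]
) -> List[Failure]:
    """Server-side filter: unrestricted roles see all rows, others only readable DAGs."""
    if roles & UNRESTRICTED_ROLES:
        return list(failures)
    return [f for f in failures if f.dag_id in readable_dags]


def failure_count(
    failures: Iterable[Failure], roles: Set[str], readable_dags: Set[str]
) -> int:
    """Recompute the aggregate from the filtered rows so no global total leaks."""
    return len(visible_failures(failures, roles, readable_dags))


if __name__ == "__main__":
    rows = [
        Failure("weather_data_pipeline", "fetch"),
        Failure("ecommerce_sales_etl", "load"),
        Failure("ecommerce_sales_etl", "transform"),
    ]
    print(failure_count(rows, {"Admin"}, set()))
    print(failure_count(rows, {"team_weather"}, {"weather_data_pipeline"}))
```

Computing the count from the already-filtered rows, rather than filtering a precomputed global total, is what keeps aggregate numbers consistent with what each user is allowed to see.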

Setting Up DAG-Level Permissions

Add access_control to your DAG definitions to grant team-specific access:

from airflow import DAG

dag = DAG(
    dag_id="weather_data_pipeline",
    schedule_interval="@hourly",
    # Map each role name to the DAG-level permissions it is granted.
    access_control={
        "team_weather": {"can_read", "can_edit"},
    },
)

Then create matching roles in Airflow (Admin → Security → List Roles) and assign users to them. The Watcher plugin will automatically pick up the permissions.

What Gets Filtered

| Area | Filtering |
|---|---|
| Dashboard stats | Failure count, SLA misses, health score (all scoped to the user's DAGs) |
| Failures page | Only failures from accessible DAGs |
| SLA page | Only SLA misses from accessible DAGs |
| Health page | Health status, stale DAGs, scheduling lag (filtered) |
| Task health | Long-running tasks, zombies, retries (filtered) |
| Scheduling | Concurrent runs, delayed DAGs (filtered) |
| Dependencies | Cross-DAG deps, correlations (filtered) |
| All API endpoints | Same RBAC enforcement as UI pages |

Demo Users

The demo environment includes pre-configured RBAC users:

| User | Role | Visible DAGs |
|---|---|---|
| admin | Admin | All 8 DAGs |
| weather_user | team_weather | weather_data_pipeline, stock_market_collector |
| ecommerce_user | team_ecommerce | ecommerce_sales_etl, data_quality_checks |

Passwords are configured in demo/docker-compose.yml. Change them before any shared deployment.

RBAC Demo

Admin user (sees all DAGs and full aggregate stats):

Admin RBAC Demo

Weather team user (only sees weather_data_pipeline and stock_market_collector):

Weather User RBAC Demo

Ecommerce team user (only sees ecommerce_sales_etl and data_quality_checks):

Ecommerce User RBAC Demo

cd demo
docker-compose up -d
# Visit http://localhost:8080 and log in as any user above

Architecture

+--------------------------------------------------------------+
|                   Airflow Webserver                          |
|                                                              |
|  +--------------------------------------------------------+  |
|  |              Airflow Watcher Plugin                    |  |
|  |                                                        |  |
|  |  +--------------+    +------------------------------+  |  |
|  |  | Flask Views  |    |        Monitors (6)          |  |  |
|  |  | (Dashboard)  |<---|  - DAG Failure Monitor       |  |  |
|  |  |              |    |  - SLA Monitor               |  |  |
|  |  | REST API     |    |  - Task Health Monitor       |  |  |
|  |  | /api/watcher |    |  - Scheduling Monitor        |  |  |
|  |  +--------------+    |  - Dependency Monitor        |  |  |
|  |         |            |  - DAG Health Monitor        |  |  |
|  |         |            +----------+-------------------+  |  |
|  |         |                       |                      |  |
|  |         |            +----------v-------------------+  |  |
|  |         |            |    Metrics Collector         |  |  |
|  |         |            |    (WatcherMetrics)          |  |  |
|  |         |            +----------+-------------------+  |  |
|  |         |                       |                      |  |
|  |         v                       v                      |  |
|  |  +--------------+    +------------------------------+  |  |
|  |  |  Notifiers   |    |        Emitters              |  |  |
|  |  |  - Slack     |    |  - StatsD / Datadog (UDP)    |  |  |
|  |  |  - Email     |    |  - Prometheus (/metrics)     |  |  |
|  |  |  - PagerDuty |    |                              |  |  |
|  |  +--------------+    +------------------------------+  |  |
|  +--------------------------------------------------------+  |
|                          |                                   |
|                          v                                   |
|              +-----------------------+                       |
|              |  Airflow Metadata DB  |                       |
|              |  (PostgreSQL/MySQL)   |                       |
|              +-----------------------+                       |
+--------------------------------------------------------------+

Everything runs inside the Airflow webserver process. No separate workers, no message queues, no external databases. The plugin reads from the same metadata DB that Airflow already maintains.
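
As an illustration of the emitter stage in the diagram, StatsD metrics are plain-text datagrams of the form `name:value|type` sent over UDP. The sketch below formats and sends a counter. The metric name `airflow_watcher.dag_failures` and the default host and port are assumptions for this example, not names confirmed by the project.

```python
import socket


def format_counter(name: str, value: int = 1) -> bytes:
    """Encode a StatsD counter datagram, e.g. b'my.metric:1|c'."""
    return f"{name}:{value}|c".encode("ascii")


def emit_counter(
    name: str, value: int = 1, host: str = "127.0.0.1", port: int = 8125
) -> None:
    """Fire-and-forget UDP send; the sender does not wait for a reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(format_counter(name, value), (host, port))


if __name__ == "__main__":
    # Hypothetical metric name used only for this example.
    emit_counter("airflow_watcher.dag_failures", 3)
```

Because UDP is connectionless, emitting a metric of this kind does not block the webserver process if no collector is listening.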

Project Structure

airflow-watcher/
├── src/
│   └── airflow_watcher/
│       ├── __init__.py
│       ├── plugins/           # Airflow plugin definitions
│       ├── views/             # Flask Blueprint views
│       ├── monitors/          # DAG & SLA monitoring logic
│       ├── notifiers/         # Slack, email notifications
│       └── templates/         # Jinja2 templates
├── demo/                      # Local demo Airflow environment
│   ├── dags/                  # Sample DAGs for testing
│   ├── plugins/               # Plugin copy for demo
│   └── docker-compose.yml     # Docker setup
├── tests/
└── pyproject.toml

Demo Environment

To test the plugin locally with sample DAGs:

cd demo
docker-compose up -d

Then visit http://localhost:8080 and navigate to the Watcher menu.

See demo/README.md for more details.

MWAA Integration

Setup

  1. Add airflow-watcher to your MWAA requirements.txt:

airflow-watcher==0.1.2

For Prometheus metrics support:

airflow-watcher[all]==0.1.2

  2. Upload requirements.txt to your MWAA S3 bucket:

aws s3 cp requirements.txt s3://<your-mwaa-bucket>/requirements.txt

  3. Update your MWAA environment to pick up the new requirements (via AWS Console or CLI):

aws mwaa update-environment \
  --name <your-environment-name> \
  --requirements-s3-path requirements.txt \
  --requirements-s3-object-version <version-id>

Note: No plugins.zip is needed. Airflow auto-discovers airflow-watcher via the airflow.plugins entry point when installed via pip (Airflow 2.7+).

  4. Wait for the environment to finish updating (takes a few minutes).

  5. Verify at:

https://<your-mwaa-url>/api/watcher/health

Environment Variables (optional)

Configure via MWAA Airflow configuration overrides:

| Variable | Purpose |
|---|---|
| AIRFLOW_WATCHER__SLACK_WEBHOOK_URL | Slack notifications |
| AIRFLOW_WATCHER__PAGERDUTY_API_KEY | PagerDuty alerts |
| AIRFLOW_WATCHER__ENABLE_PROMETHEUS | Prometheus metrics |

Testing Locally with MWAA Local Runner

git clone https://github.com/aws/aws-mwaa-local-runner.git
cd aws-mwaa-local-runner
echo "airflow-watcher==0.1.2" >> requirements/requirements.txt
./mwaa-local-env build-image
./mwaa-local-env start

Visit http://localhost:8080/api/watcher/health to verify.
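
The same check can be scripted. A minimal sketch using only the standard library: it assumes the endpoint answers an unauthenticated GET with HTTP 200 when healthy, which this README does not confirm, and the helper names are hypothetical.

```python
import urllib.error
import urllib.request


def health_url(base_url: str) -> str:
    """Join the webserver base URL with the Watcher health path from this README."""
    return base_url.rstrip("/") + "/api/watcher/health"


def is_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if the health endpoint responds with HTTP 200 (assumed success code)."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status all count as unhealthy.
        return False


if __name__ == "__main__":
    print(is_healthy("http://localhost:8080"))
```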

Note: If using Slack or PagerDuty notifications, ensure your MWAA VPC has a NAT gateway for outbound internet access.

Development

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run linting
ruff check src tests
black --check src tests

# Type checking
mypy src

License

Apache License 2.0 - See LICENSE for details.

Author

Ramanujam Solaimalai (@ram07eng)

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request
