Airflow UI Plugin for monitoring DAG failures and SLA misses

Airflow Watcher 👁️

An Airflow UI plugin for monitoring DAG failures and SLA misses/delays.

Demo

[Image: Airflow Watcher Demo]

Features

  • 🚨 DAG Failure Monitoring: Real-time tracking of DAG and task failures
  • ⏰ SLA Miss Detection: Alerts when DAGs miss their SLA deadlines
  • 📊 Dashboard View: Custom Airflow UI view for monitoring status
  • 🔔 Multi-channel Notifications: Slack, Email, and PagerDuty alerts
  • 📈 Trend Analysis: Historical failure and SLA miss trends
  • 📡 Metrics Export: StatsD/Datadog and Prometheus support
  • ⚙️ Flexible Alert Rules: Pre-defined templates or custom rules

Installation

📖 See INSTALL.md for detailed installation and configuration instructions.

Alerting & Monitoring

📖 See ALERTING.md for complete alerting configuration:

  • Slack - Rich notifications with blocks
  • Email - SMTP-based alerts
  • PagerDuty - Incident management with deduplication
  • StatsD/Datadog - Real-time metrics
  • Prometheus - /metrics endpoint for scraping
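For readers unfamiliar with how StatsD-style metrics travel over UDP, here is a minimal sketch of the wire format. It is not the plugin's emitter: the helper names, the metric name in the usage note, and the default address are assumptions made for illustration.

```python
import socket


def format_counter(name: str, value: int = 1) -> bytes:
    """Encode a counter in the plain StatsD line format: <name>:<value>|c."""
    return f"{name}:{value}|c".encode("ascii")


def send_counter(name: str, value: int = 1,
                 host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one counter datagram; UDP gives no delivery guarantee."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(format_counter(name, value), (host, port))
```

Calling send_counter("airflow_watcher.dag_failures", 3) would emit a single datagram to a local agent. The metric name here is hypothetical; ALERTING.md is the place to look for the names the plugin actually emits.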

Quick Setup

# Slack alerts
export AIRFLOW_WATCHER_SLACK_WEBHOOK_URL="https://hooks.slack.com/..."

# PagerDuty (optional)
export AIRFLOW_WATCHER_PAGERDUTY_ROUTING_KEY="your-key"

# Choose alert template
export AIRFLOW_WATCHER_ALERT_TEMPLATE="production_balanced"
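To sanity-check a webhook URL before relying on the plugin's alerts, a small script can post a test message. This is a sketch under assumptions: it reads the AIRFLOW_WATCHER_SLACK_WEBHOOK_URL variable shown above, the function names are hypothetical, and it sends Slack's standard incoming-webhook JSON body with a text field rather than anything specific to the plugin.

```python
import json
import os
import urllib.request


def build_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build a POST request carrying a Slack incoming-webhook JSON payload."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_test_message(text: str = "airflow-watcher webhook test") -> int:
    """Post a test message and return the HTTP status code."""
    url = os.environ["AIRFLOW_WATCHER_SLACK_WEBHOOK_URL"]  # KeyError if unset
    with urllib.request.urlopen(build_request(url, text), timeout=10) as resp:
        return resp.status
```

Building the request separately from sending it keeps the payload easy to inspect without touching the network.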

Usage

Once installed, the plugin will automatically:

  1. Register with Airflow's plugin system
  2. Add a "Watcher" menu item to the Airflow UI
  3. Start monitoring DAG failures and SLA misses

Watcher Menu

Navigate to Watcher in the Airflow UI navigation to access:

  • Airflow Dashboard - Overview metrics
  • Airflow Health - DAG health status (success/failed/delayed/stale)
  • DAG Scheduling - Queue and pool utilization
  • DAG Failures - Recent failures with details
  • SLA Tracker - SLA misses and delays
  • Task Health - Long-running and zombie tasks
  • Dependencies - Cross-DAG dependency tracking

Architecture

+--------------------------------------------------------------+
|                   Airflow Webserver                          |
|                                                              |
|  +--------------------------------------------------------+  |
|  |              Airflow Watcher Plugin                    |  |
|  |                                                        |  |
|  |  +--------------+    +------------------------------+  |  |
|  |  | Flask Views  |    |        Monitors (6)          |  |  |
|  |  | (Dashboard)  |<---|  - DAG Failure Monitor       |  |  |
|  |  |              |    |  - SLA Monitor               |  |  |
|  |  | REST API     |    |  - Task Health Monitor       |  |  |
|  |  | /api/watcher |    |  - Scheduling Monitor        |  |  |
|  |  +--------------+    |  - Dependency Monitor        |  |  |
|  |         |            |  - DAG Health Monitor        |  |  |
|  |         |            +----------+-------------------+  |  |
|  |         |                       |                      |  |
|  |         |            +----------v-------------------+  |  |
|  |         |            |    Metrics Collector         |  |  |
|  |         |            |    (WatcherMetrics)          |  |  |
|  |         |            +----------+-------------------+  |  |
|  |         |                       |                      |  |
|  |         v                       v                      |  |
|  |  +--------------+    +------------------------------+  |  |
|  |  |  Notifiers   |    |        Emitters              |  |  |
|  |  |  - Slack     |    |  - StatsD / Datadog (UDP)    |  |  |
|  |  |  - Email     |    |  - Prometheus (/metrics)     |  |  |
|  |  |  - PagerDuty |    |                              |  |  |
|  |  +--------------+    +------------------------------+  |  |
|  +--------------------------------------------------------+  |
|                          |                                   |
|                          v                                   |
|              +-----------------------+                       |
|              |  Airflow Metadata DB  |                       |
|              |  (PostgreSQL/MySQL)   |                       |
|              +-----------------------+                       |
+--------------------------------------------------------------+

Everything runs inside the Airflow webserver process. No separate workers, no message queues, no external databases. The plugin reads from the same metadata DB that Airflow already maintains.
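To illustrate what reading from the metadata DB can look like, the sketch below counts recent failed DAG runs per DAG. It is not the plugin's monitor code. It uses an in-memory SQLite table as a stand-in for Airflow's dag_run table, modeling only the dag_id, state, and end_date columns, and the function names and sample rows are made up for the example.

```python
import sqlite3
from datetime import datetime, timedelta


def failed_runs_since(conn: sqlite3.Connection, since: datetime) -> dict[str, int]:
    """Return {dag_id: failure_count} for runs that failed at or after `since`."""
    rows = conn.execute(
        "SELECT dag_id, COUNT(*) FROM dag_run "
        "WHERE state = 'failed' AND end_date >= ? "
        "GROUP BY dag_id ORDER BY dag_id",
        (since.isoformat(),),
    )
    return dict(rows.fetchall())


def demo_connection(now: datetime) -> sqlite3.Connection:
    """Build a throwaway stand-in for the metadata DB with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dag_run (dag_id TEXT, state TEXT, end_date TEXT)")
    sample = [
        ("etl_daily", "failed", now - timedelta(hours=1)),
        ("etl_daily", "failed", now - timedelta(hours=3)),
        ("etl_daily", "success", now - timedelta(hours=2)),
        ("reports", "failed", now - timedelta(days=2)),  # outside a 24h window
    ]
    conn.executemany(
        "INSERT INTO dag_run VALUES (?, ?, ?)",
        [(dag_id, state, ts.isoformat()) for dag_id, state, ts in sample],
    )
    return conn
```

Because the query only reads, a monitor built this way adds no tables of its own, which is consistent with the "no external databases" design described above.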

Project Structure

airflow-watcher/
├── src/
│   └── airflow_watcher/
│       ├── __init__.py
│       ├── plugins/           # Airflow plugin definitions
│       ├── views/             # Flask Blueprint views
│       ├── monitors/          # DAG & SLA monitoring logic
│       ├── notifiers/         # Slack, email notifications
│       └── templates/         # Jinja2 templates
├── demo/                      # Local demo Airflow environment
│   ├── dags/                  # Sample DAGs for testing
│   ├── plugins/               # Plugin copy for demo
│   └── docker-compose.yml     # Docker setup
├── tests/
└── pyproject.toml

Demo Environment

To test the plugin locally with sample DAGs:

cd demo
docker-compose up -d

Then visit http://localhost:8080, log in with username admin and password admin, and navigate to the Watcher menu.

See demo/README.md for more details.

MWAA Integration

Setup

  1. Add airflow-watcher to your MWAA requirements.txt:

airflow-watcher==0.1.2

For Prometheus metrics support:

airflow-watcher[all]==0.1.2

  2. Upload requirements.txt to your MWAA S3 bucket:

aws s3 cp requirements.txt s3://<your-mwaa-bucket>/requirements.txt

  3. Update your MWAA environment to pick up the new requirements (via AWS Console or CLI):

aws mwaa update-environment \
  --name <your-environment-name> \
  --requirements-s3-path requirements.txt \
  --requirements-s3-object-version <version-id>

Note: No plugins.zip is needed. Airflow auto-discovers airflow-watcher via the airflow.plugins entry point when installed via pip (Airflow 2.7+).

  4. Wait for the environment to finish updating (takes a few minutes).

  5. Verify at:

https://<your-mwaa-url>/api/watcher/health

Environment Variables (optional)

Configure via MWAA Airflow configuration overrides:

Variable                              Purpose
AIRFLOW_WATCHER__SLACK_WEBHOOK_URL    Slack notifications
AIRFLOW_WATCHER__PAGERDUTY_API_KEY    PagerDuty alerts
AIRFLOW_WATCHER__ENABLE_PROMETHEUS    Prometheus metrics

Testing Locally with MWAA Local Runner

git clone https://github.com/aws/aws-mwaa-local-runner.git
cd aws-mwaa-local-runner
echo "airflow-watcher==0.1.2" >> requirements/requirements.txt
./mwaa-local-env build-image
./mwaa-local-env start

Visit http://localhost:8080/api/watcher/health to verify.
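The manual check above can also be scripted, for example while waiting for the local runner to come up. The following is a rough sketch, assuming only that the health endpoint answers a plain GET with HTTP 200 when the plugin is loaded. The response body is not inspected because its format is not documented here, and the function names are hypothetical.

```python
import urllib.error
import urllib.request


def health_url(base_url: str) -> str:
    """Join a webserver base URL with the plugin's health endpoint path."""
    return base_url.rstrip("/") + "/api/watcher/health"


def is_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET on the health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection failures, timeouts, and HTTP error statuses all land here.
        return False
```

Note that urlopen follows redirects, so if the endpoint sits behind a login page, a redirect to that page could also produce a 200. Treat a True result as a coarse signal rather than proof that the plugin is working.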

Note: If using Slack or PagerDuty notifications, ensure your MWAA VPC has a NAT gateway for outbound internet access.

Development

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run linting
ruff check src tests
black --check src tests

# Type checking
mypy src

License

Apache License 2.0 - See LICENSE for details.

Author

Ramanujam Solaimalai (@ram07eng)

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request
