Airflow UI Plugin for monitoring DAG failures and SLA misses

Airflow Watcher ๐Ÿ‘๏ธ

An Airflow UI plugin for monitoring DAG failures and SLA misses/delays.

Features

  • 🚨 DAG Failure Monitoring: Real-time tracking of DAG and task failures
  • ⏰ SLA Miss Detection: Alerts when DAGs miss their SLA deadlines
  • 📊 Dashboard View: Custom Airflow UI view for monitoring status
  • 🔔 Multi-channel Notifications: Slack, Email, and PagerDuty alerts
  • 📈 Trend Analysis: Historical failure and SLA miss trends
  • 📡 Metrics Export: StatsD/Datadog and Prometheus support
  • ⚙️ Flexible Alert Rules: Pre-defined templates or custom rules

Installation

📖 See INSTALL.md for detailed installation and configuration instructions.

Alerting & Monitoring

📖 See ALERTING.md for complete alerting configuration:

  • Slack - Rich notifications with blocks
  • Email - SMTP-based alerts
  • PagerDuty - Incident management with deduplication
  • StatsD/Datadog - Real-time metrics
  • Prometheus - /metrics endpoint for scraping
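PagerDuty deduplication works through the `dedup_key` field of the Events API v2. Trigger events that share a key are grouped into the same open alert instead of opening a new incident. Below is a minimal sketch of building such an event for a task failure. It assumes a key derived from `dag_id` and `task_id`; the function name and the key scheme are hypothetical, not the plugin's own.

```python
def pagerduty_event(routing_key: str, dag_id: str, task_id: str,
                    run_id: str, summary: str) -> dict:
    """Build a PagerDuty Events API v2 'trigger' event for a task failure.

    Hypothetical key scheme: the dedup_key ignores run_id, so repeated
    failures of the same task collapse into one open alert instead of
    paging again for every run.
    """
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "dedup_key": f"airflow-watcher:{dag_id}:{task_id}",
        "payload": {
            "summary": summary,
            "source": "airflow-watcher",  # assumed source label
            "severity": "error",          # one of critical/error/warning/info
            "custom_details": {"dag_id": dag_id, "task_id": task_id, "run_id": run_id},
        },
    }


first = pagerduty_event("your-key", "daily_etl", "load", "run_1", "load failed")
second = pagerduty_event("your-key", "daily_etl", "load", "run_2", "load failed again")
print(first["dedup_key"] == second["dedup_key"])  # True
```

The resulting dict would be sent as a JSON POST to `https://events.pagerduty.com/v2/enqueue`. Sending an event with `"event_action": "resolve"` and the same `dedup_key` closes the alert.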

Quick Setup

# Slack alerts
export AIRFLOW_WATCHER_SLACK_WEBHOOK_URL="https://hooks.slack.com/..."

# PagerDuty (optional)
export AIRFLOW_WATCHER_PAGERDUTY_ROUTING_KEY="your-key"

# Choose alert template
export AIRFLOW_WATCHER_ALERT_TEMPLATE="production_balanced"
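A Slack incoming webhook accepts a JSON body with a plain-text `text` fallback and optional Block Kit `blocks`. Below is a minimal sketch. It assumes only the `AIRFLOW_WATCHER_SLACK_WEBHOOK_URL` variable from the setup above. The function names and the message layout are hypothetical, and the plugin's own notifier may format messages differently.

```python
import json
import os
import urllib.request


def build_slack_message(dag_id: str, task_id: str, state: str, log_url: str) -> dict:
    """Build a webhook payload with a header block and a mrkdwn section."""
    text = f"Task {dag_id}.{task_id} is {state}"
    return {
        "text": text,  # fallback used in notifications and non-Block clients
        "blocks": [
            {"type": "header",
             "text": {"type": "plain_text", "text": "Airflow Watcher alert"}},
            {"type": "section",
             "text": {"type": "mrkdwn", "text": f"*{text}*\n<{log_url}|View logs>"}},
        ],
    }


def send_slack_alert(message: dict) -> None:
    """POST the payload to the webhook named in the environment (sketch only)."""
    url = os.environ.get("AIRFLOW_WATCHER_SLACK_WEBHOOK_URL")
    if not url:
        return  # Slack is not configured, so skip silently
    request = urllib.request.Request(
        url,
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        response.read()  # Slack replies with a short "ok" body on success
```

Reading the URL at send time, rather than at import time, means a changed environment variable takes effect without reloading the module.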

Usage

Once installed, the plugin will automatically:

  1. Register with Airflow's plugin system
  2. Add a "Watcher" menu item to the Airflow UI
  3. Start monitoring DAG failures and SLA misses

Watcher Menu

Open the Watcher menu in the Airflow UI navigation bar to access:

  • Airflow Dashboard - Overview metrics
  • Airflow Health - DAG health status (success/failed/delayed/stale)
  • DAG Scheduling - Queue and pool utilization
  • DAG Failures - Recent failures with details
  • SLA Tracker - SLA misses and delays
  • Task Health - Long-running and zombie tasks
  • Dependencies - Cross-DAG dependency tracking
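The four states shown on the Airflow Health page (success/failed/delayed/stale) could be derived from each DAG's most recent run. The rules, thresholds, and function below are a hypothetical sketch for illustration only, not the plugin's actual logic.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def classify_dag_health(
    last_state: Optional[str],
    last_run_start: Optional[datetime],
    schedule_interval: timedelta,
    expected_duration: timedelta,
    now: datetime,
    stale_after_intervals: int = 2,
) -> str:
    """Return 'success', 'failed', 'delayed', or 'stale' (hypothetical rules).

    - stale:   no run has started within `stale_after_intervals` schedule intervals
    - failed:  the most recent run failed
    - delayed: the most recent run is still running past its expected duration
    - success: anything else, including a run still within its expected duration
    """
    if last_run_start is None or now - last_run_start > schedule_interval * stale_after_intervals:
        return "stale"
    if last_state == "failed":
        return "failed"
    if last_state == "running" and now - last_run_start > expected_duration:
        return "delayed"
    return "success"


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
print(classify_dag_health("running", now - timedelta(hours=3),
                          timedelta(days=1), timedelta(hours=1), now))  # delayed
```

Checking staleness first means a DAG whose last run failed days ago is reported as stale rather than failed. That is one possible design choice among several.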

Architecture

+--------------------------------------------------------------+
|                   Airflow Webserver                          |
|                                                              |
|  +--------------------------------------------------------+  |
|  |              Airflow Watcher Plugin                    |  |
|  |                                                        |  |
|  |  +--------------+    +------------------------------+  |  |
|  |  | Flask Views  |    |        Monitors (6)          |  |  |
|  |  | (Dashboard)  |<---|  - DAG Failure Monitor       |  |  |
|  |  |              |    |  - SLA Monitor               |  |  |
|  |  | REST API     |    |  - Task Health Monitor       |  |  |
|  |  | /api/watcher |    |  - Scheduling Monitor        |  |  |
|  |  +--------------+    |  - Dependency Monitor        |  |  |
|  |         |            |  - DAG Health Monitor        |  |  |
|  |         |            +----------+-------------------+  |  |
|  |         |                       |                      |  |
|  |         |            +----------v-------------------+  |  |
|  |         |            |    Metrics Collector         |  |  |
|  |         |            |    (WatcherMetrics)          |  |  |
|  |         |            +----------+-------------------+  |  |
|  |         |                       |                      |  |
|  |         v                       v                      |  |
|  |  +--------------+    +------------------------------+  |  |
|  |  |  Notifiers   |    |        Emitters              |  |  |
|  |  |  - Slack     |    |  - StatsD / Datadog (UDP)    |  |  |
|  |  |  - Email     |    |  - Prometheus (/metrics)     |  |  |
|  |  |  - PagerDuty |    |                              |  |  |
|  |  +--------------+    +------------------------------+  |  |
|  +--------------------------------------------------------+  |
|                          |                                   |
|                          v                                   |
|              +-----------------------+                       |
|              |  Airflow Metadata DB  |                       |
|              |  (PostgreSQL/MySQL)   |                       |
|              +-----------------------+                       |
+--------------------------------------------------------------+

Everything runs inside the Airflow webserver process. No separate workers, no message queues, no external databases. The plugin reads from the same metadata DB that Airflow already maintains.
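The two emitters use simple wire formats:

- StatsD sends one plain-text datagram per metric over UDP, and Datadog's DogStatsD adds tags after `|#`.
- Prometheus scrapes a text page where each sample is `name value`, optionally preceded by `# HELP` and `# TYPE` lines.

Below is a stdlib-only sketch of both. The metric names and helpers are hypothetical, and a real deployment might use an established client library instead.

```python
import socket
from typing import Dict, Optional


def statsd_line(name: str, value: float, metric_type: str = "g",
                tags: Optional[Dict[str, str]] = None) -> str:
    """Format one StatsD datagram; tags use the DogStatsD '|#k:v' extension."""
    line = f"{name}:{value}|{metric_type}"
    if tags:
        line += "|#" + ",".join(f"{k}:{v}" for k, v in sorted(tags.items()))
    return line


def send_statsd(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget UDP send, as StatsD clients do (no reply is expected)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


def prometheus_text(metrics: Dict[str, float], help_text: Dict[str, str]) -> str:
    """Render gauges in the Prometheus text exposition format for /metrics."""
    out = []
    for name in sorted(metrics):
        out.append(f"# HELP {name} {help_text.get(name, name)}")
        out.append(f"# TYPE {name} gauge")
        out.append(f"{name} {metrics[name]}")
    return "\n".join(out) + "\n"  # the format expects a trailing newline
```

For example, `statsd_line("airflow_watcher.dag_failures", 3, "g", {"dag_id": "daily_etl"})` yields `airflow_watcher.dag_failures:3|g|#dag_id:daily_etl`. UDP suits in-process emission because a missing or slow StatsD agent never blocks the webserver.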

Project Structure

airflow-watcher/
├── src/
│   └── airflow_watcher/
│       ├── __init__.py
│       ├── plugins/           # Airflow plugin definitions
│       ├── views/             # Flask Blueprint views
│       ├── monitors/          # DAG & SLA monitoring logic
│       ├── notifiers/         # Slack, email notifications
│       └── templates/         # Jinja2 templates
├── demo/                      # Local demo Airflow environment
│   ├── dags/                  # Sample DAGs for testing
│   ├── plugins/               # Plugin copy for demo
│   └── docker-compose.yml     # Docker setup
├── tests/
└── pyproject.toml

Demo Environment

To test the plugin locally with sample DAGs:

cd demo
docker-compose up -d

Then open http://localhost:8080, log in as admin/admin, and navigate to the Watcher menu.

See demo/README.md for more details.

Development

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run linting
ruff check src tests
black --check src tests

# Type checking
mypy src

License

Apache License 2.0 - See LICENSE for details.

Author

Ramanujam Solaimalai (@ram07eng)

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Download files

Source Distribution

airflow_watcher-0.1.1.tar.gz (50.2 kB)

Uploaded Source

Built Distribution

airflow_watcher-0.1.1-py3-none-any.whl (56.5 kB)

Uploaded Python 3

File details

Details for the file airflow_watcher-0.1.1.tar.gz.

File metadata

  • Download URL: airflow_watcher-0.1.1.tar.gz
  • Size: 50.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for airflow_watcher-0.1.1.tar.gz
Algorithm Hash digest
SHA256 642941b17c7fc9d1e78f3d90e2605887e5d5dc766ae1869e4bf461c3318e4aac
MD5 5f6fd81ee7ecc597a6875a6c567e6543
BLAKE2b-256 74c78234e4ee622f90f76da7eb1e0808e4759d5cb5cc93d162876949422232ae

Provenance

The following attestation bundles were made for airflow_watcher-0.1.1.tar.gz:

Publisher: publish.yml on ram07eng/airflow-watcher

File details

Details for the file airflow_watcher-0.1.1-py3-none-any.whl.

File hashes

Hashes for airflow_watcher-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 ff35fc389f664831649d73e77ed8f8a15adee51af17bdf5d89473aa96922a07e
MD5 225b83d3a818b6db75eb4abf99e9d841
BLAKE2b-256 e9c73b2c80eb3138f3151559e2fcc3ad0493c8cc16c33643cd5b79522950855d

Provenance

The following attestation bundles were made for airflow_watcher-0.1.1-py3-none-any.whl:

Publisher: publish.yml on ram07eng/airflow-watcher
