Airflow UI Plugin for monitoring DAG failures and SLA misses
Airflow Watcher
An Airflow UI plugin for monitoring DAG failures and SLA misses/delays.
Features
- DAG Failure Monitoring: Real-time tracking of DAG and task failures
- SLA Miss Detection: Alerts when DAGs miss their SLA deadlines
- Dashboard View: Custom Airflow UI view for monitoring status
- Multi-channel Notifications: Slack, Email, and PagerDuty alerts
- Trend Analysis: Historical failure and SLA miss trends
- Metrics Export: StatsD/Datadog and Prometheus support
- Flexible Alert Rules: Pre-defined templates or custom rules
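As a rough illustration of the metrics export feature, the sketch below renders a failure count in the Prometheus text exposition format and as a StatsD gauge line. The metric names and the `dag_id` label are assumptions made for this example; the metrics the plugin actually emits may be named and labeled differently.

```python
def render_prometheus_gauge(name, help_text, samples):
    """Render one gauge in the Prometheus text exposition format.

    `samples` maps a dag_id label value to a numeric value.
    The metric name and label are hypothetical, not the plugin's real ones.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for dag_id, value in sorted(samples.items()):
        # Label values must escape backslashes, double quotes, and newlines.
        label = (
            dag_id.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
        )
        lines.append(f'{name}{{dag_id="{label}"}} {value}')
    return "\n".join(lines) + "\n"


def render_statsd_gauge(name, value):
    """Render a single StatsD gauge datagram payload: <name>:<value>|g."""
    return f"{name}:{value}|g"


if __name__ == "__main__":
    print(render_prometheus_gauge(
        "airflow_watcher_dag_failures", "Failed DAG runs", {"etl_daily": 2}
    ))
    print(render_statsd_gauge("airflow_watcher.dag_failures", 2))
```

A Prometheus server scrapes text like the first output over HTTP, while a StatsD agent receives lines like the second one over UDP.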
Installation
See INSTALL.md for detailed installation and configuration instructions.
Alerting & Monitoring
See ALERTING.md for complete alerting configuration:
- Slack - Rich notifications with blocks
- Email - SMTP-based alerts
- PagerDuty - Incident management with deduplication
- StatsD/Datadog - Real-time metrics
- Prometheus - /metrics endpoint for scraping
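To make the PagerDuty deduplication idea concrete, here is a minimal sketch of a PagerDuty Events API v2 trigger event whose `dedup_key` is derived from the DAG and run identifiers. Deriving the key from `dag_id` and `run_id`, and the `source` and `severity` values, are assumptions for illustration; ALERTING.md describes how the plugin actually builds its events.

```python
import hashlib
import json


def build_pagerduty_event(routing_key, dag_id, run_id, summary):
    """Build an Events API v2 'trigger' event body.

    The same (dag_id, run_id) pair always yields the same dedup_key, so
    repeated alerts for one failed run fold into a single incident.
    This key scheme is a hypothetical choice, not the plugin's documented one.
    """
    dedup_key = hashlib.sha256(f"{dag_id}:{run_id}".encode("utf-8")).hexdigest()
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "dedup_key": dedup_key,
        "payload": {
            "summary": summary,
            "source": "airflow",   # assumed value
            "severity": "error",   # assumed value
        },
    }


if __name__ == "__main__":
    event = build_pagerduty_event(
        "your-key", "etl_daily", "scheduled__2024-01-01", "DAG etl_daily failed"
    )
    # The JSON body would be POSTed to https://events.pagerduty.com/v2/enqueue.
    print(json.dumps(event, indent=2))
```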
Quick Setup
# Slack alerts
export AIRFLOW_WATCHER_SLACK_WEBHOOK_URL="https://hooks.slack.com/..."
# PagerDuty (optional)
export AIRFLOW_WATCHER_PAGERDUTY_ROUTING_KEY="your-key"
# Choose alert template
export AIRFLOW_WATCHER_ALERT_TEMPLATE="production_balanced"
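For readers who want to check what these variables resolve to, the following sketch reads the three Quick Setup variables into a small settings object. The `WatcherSettings` class and `load_settings` function are hypothetical helpers written for this example; they are not part of the plugin's API, and the plugin's own configuration loading may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class WatcherSettings:
    """Hypothetical container for the Quick Setup environment variables."""
    slack_webhook_url: Optional[str]
    pagerduty_routing_key: Optional[str]
    alert_template: Optional[str]


def load_settings(env: Optional[Mapping[str, str]] = None) -> WatcherSettings:
    """Read AIRFLOW_WATCHER_* variables; unset or empty values become None."""
    env = os.environ if env is None else env

    def get(name: str) -> Optional[str]:
        return env.get("AIRFLOW_WATCHER_" + name) or None

    return WatcherSettings(
        slack_webhook_url=get("SLACK_WEBHOOK_URL"),
        pagerduty_routing_key=get("PAGERDUTY_ROUTING_KEY"),
        alert_template=get("ALERT_TEMPLATE"),
    )


if __name__ == "__main__":
    print(load_settings())
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to exercise in tests without touching the real environment.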
Usage
Once installed, the plugin will automatically:
- Register with Airflow's plugin system
- Add a "Watcher" menu item to the Airflow UI
- Start monitoring DAG failures and SLA misses
Watcher Menu
Open Watcher in the Airflow UI navigation bar to access:
- Airflow Dashboard - Overview metrics
- Airflow Health - DAG health status (success/failed/delayed/stale)
- DAG Scheduling - Queue and pool utilization
- DAG Failures - Recent failures with details
- SLA Tracker - SLA misses and delays
- Task Health - Long-running and zombie tasks
- Dependencies - Cross-DAG dependency tracking
Architecture
+--------------------------------------------------------------+
|                      Airflow Webserver                       |
|                                                              |
|  +--------------------------------------------------------+  |
|  |                 Airflow Watcher Plugin                 |  |
|  |                                                        |  |
|  |  +-------------+    +------------------------------+   |  |
|  |  | Flask Views |    | Monitors (6)                 |   |  |
|  |  | (Dashboard) |<---| - DAG Failure Monitor        |   |  |
|  |  |             |    | - SLA Monitor                |   |  |
|  |  | REST API    |    | - Task Health Monitor        |   |  |
|  |  | /api/watcher|    | - Scheduling Monitor         |   |  |
|  |  +-------------+    | - Dependency Monitor         |   |  |
|  |         |           | - DAG Health Monitor         |   |  |
|  |         |           +----------+-------------------+   |  |
|  |         |                      |                       |  |
|  |         |           +----------v-------------------+   |  |
|  |         |           | Metrics Collector            |   |  |
|  |         |           | (WatcherMetrics)             |   |  |
|  |         |           +----------+-------------------+   |  |
|  |         |                      |                       |  |
|  |         v                      v                       |  |
|  |  +-------------+    +------------------------------+   |  |
|  |  | Notifiers   |    | Emitters                     |   |  |
|  |  | - Slack     |    | - StatsD / Datadog (UDP)     |   |  |
|  |  | - Email     |    | - Prometheus (/metrics)      |   |  |
|  |  | - PagerDuty |    |                              |   |  |
|  |  +-------------+    +------------------------------+   |  |
|  +--------------------------------------------------------+  |
|                               |                              |
|                               v                              |
|                   +-----------------------+                  |
|                   | Airflow Metadata DB   |                  |
|                   | (PostgreSQL/MySQL)    |                  |
|                   +-----------------------+                  |
+--------------------------------------------------------------+
Everything runs inside the Airflow webserver process. No separate workers, no message queues, no external databases. The plugin reads from the same metadata DB that Airflow already maintains.
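The idea of a monitor reading directly from the metadata DB can be sketched with a toy example. The code below uses an in-memory SQLite database with a simplified `dag_run` table (only `dag_id`, `run_id`, `state`, and `end_date`) as a stand-in for Airflow's real schema; the rows are made-up demo data, and the query is an assumption about what a failure monitor might run, not the plugin's actual implementation.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory stand-in for the metadata DB with demo rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dag_run (dag_id TEXT, run_id TEXT, state TEXT, end_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO dag_run VALUES (?, ?, ?, ?)",
        [
            ("etl_daily", "run_1", "success", "2024-01-01T01:00:00"),
            ("etl_daily", "run_2", "failed", "2024-01-02T01:00:00"),
            ("reports", "run_1", "failed", "2024-01-03T01:00:00"),
        ],
    )
    return conn


def recent_failed_runs(conn, limit=10):
    """Return (dag_id, run_id) for failed runs, most recently ended first."""
    cur = conn.execute(
        "SELECT dag_id, run_id FROM dag_run "
        "WHERE state = ? ORDER BY end_date DESC LIMIT ?",
        ("failed", limit),
    )
    return cur.fetchall()


if __name__ == "__main__":
    for dag_id, run_id in recent_failed_runs(make_demo_db()):
        print(dag_id, run_id)
```

Because the monitors only read rows that the scheduler already writes, this design needs no extra storage of its own.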
Project Structure
airflow-watcher/
├── src/
│   └── airflow_watcher/
│       ├── __init__.py
│       ├── plugins/        # Airflow plugin definitions
│       ├── views/          # Flask Blueprint views
│       ├── monitors/       # DAG & SLA monitoring logic
│       ├── notifiers/      # Slack, email notifications
│       └── templates/      # Jinja2 templates
├── demo/                   # Local demo Airflow environment
│   ├── dags/               # Sample DAGs for testing
│   ├── plugins/            # Plugin copy for demo
│   └── docker-compose.yml  # Docker setup
├── tests/
└── pyproject.toml
Demo Environment
To test the plugin locally with sample DAGs:
cd demo
docker-compose up -d
Then visit http://localhost:8080, log in with admin/admin, and navigate to the Watcher menu.
See demo/README.md for more details.
MWAA Integration
Setup
- Add airflow-watcher to your MWAA requirements.txt:
airflow-watcher==0.1.2
For Prometheus metrics support:
airflow-watcher[all]==0.1.2
- Upload requirements.txt to your MWAA S3 bucket:
aws s3 cp requirements.txt s3://<your-mwaa-bucket>/requirements.txt
- Update your MWAA environment to pick up the new requirements (via AWS Console or CLI):
aws mwaa update-environment \
--name <your-environment-name> \
--requirements-s3-path requirements.txt \
--requirements-s3-object-version <version-id>
Note: No plugins.zip is needed. Airflow auto-discovers airflow-watcher via the airflow.plugins entry point when installed via pip (Airflow 2.7+).
- Wait for the environment to finish updating (takes a few minutes).
- Verify at:
https://<your-mwaa-url>/api/watcher/health
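The entry-point discovery mentioned in the note above can be inspected from Python. This sketch lists whatever is registered under the `airflow.plugins` entry point group in the current environment using the standard library; the helper name is made up for this example, and the exact entry point name that airflow-watcher registers is not stated here, so the output is simply printed rather than checked against a specific value.

```python
from importlib.metadata import entry_points


def list_airflow_plugin_entry_points():
    """Return the sorted names of entry points in the 'airflow.plugins' group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ returns an object with a select() method.
        selected = eps.select(group="airflow.plugins")
    else:
        # Python 3.8/3.9 return a dict keyed by group name.
        selected = eps.get("airflow.plugins", [])
    return sorted(ep.name for ep in selected)


if __name__ == "__main__":
    # An empty list means no pip-installed Airflow plugins were found here.
    print(list_airflow_plugin_entry_points())
```

Running this in the same Python environment as Airflow is a quick way to confirm that the package was installed and is visible to plugin discovery.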
Environment Variables (optional)
Configure via MWAA Airflow configuration overrides:
| Variable | Purpose |
|---|---|
| AIRFLOW_WATCHER__SLACK_WEBHOOK_URL | Slack notifications |
| AIRFLOW_WATCHER__PAGERDUTY_API_KEY | PagerDuty alerts |
| AIRFLOW_WATCHER__ENABLE_PROMETHEUS | Prometheus metrics |
Testing Locally with MWAA Local Runner
git clone https://github.com/aws/aws-mwaa-local-runner.git
cd aws-mwaa-local-runner
echo "airflow-watcher==0.1.2" >> requirements/requirements.txt
./mwaa-local-env build-image
./mwaa-local-env start
Visit http://localhost:8080/api/watcher/health to verify.
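The same check can be scripted. The sketch below issues a plain GET against the health endpoint and treats any 2xx status as healthy. The helper names are hypothetical, the response body format is not assumed, and no authentication is handled, so a real MWAA web server will likely need an authenticated session that this example omits.

```python
import urllib.error
import urllib.request


def health_url(base_url):
    """Build the health endpoint URL from a web server base URL."""
    return base_url.rstrip("/") + "/api/watcher/health"


def is_healthy(base_url, timeout=5.0):
    """Return True if the health endpoint answers with a 2xx status."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Covers HTTP errors, refused connections, and timeouts.
        return False


if __name__ == "__main__":
    print(is_healthy("http://localhost:8080"))
```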
Note: If using Slack or PagerDuty notifications, ensure your MWAA VPC has a NAT gateway for outbound internet access.
Development
# Install dev dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Run linting
ruff check src tests
black --check src tests
# Type checking
mypy src
License
Apache License 2.0 - See LICENSE for details.
Author
Ramanujam Solaimalai (@ram07eng)
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request