An airflow provider for anomaly detection.
Anomaly Detection with Apache Airflow
Painless anomaly detection (using PyOD) with Apache Airflow via this community Airflow Provider package.
How it works in a nutshell:
- Create and express your metrics via SQL queries.
- Some YAML configuration fun.
- Receive useful alerts when metrics look anomalous.
Example Alert
Example output of an alert. A horizontal bar chart shows the metric values over time. The smoothed anomaly score is shown as a %, and any flagged anomalies are marked with *.
Alert Text (ascii art yay!)
🔥 [some_metric_last1h] looks anomalous (2023-01-25 16:00:00) 🔥
some_metric_last1h (2023-01-24 15:30:00 to 2023-01-25 16:00:00)
t=0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2,742.00 72% 2023-01-25 16:00:00
t=-1 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 3,165.00 * 81% 2023-01-25 15:30:00
t=-2 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 3,448.00 * 95% 2023-01-25 15:15:00
t=-3 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 3,441.00 76% 2023-01-25 15:00:00
t=-4 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2,475.00 72% 2023-01-25 14:30:00
t=-5 ~~~~~~~~~~~~~~~~~~~~~~~~~~ 1,833.00 72% 2023-01-25 14:15:00
t=-6 ~~~~~~~~~~~~~~~~~~~~ 1,406.00 72% 2023-01-25 14:00:00
t=-7 ~~~~~~~~~~~~~~~~~~~ 1,327.00 * 89% 2023-01-25 13:30:00
t=-8 ~~~~~~~~~~~~~~~~~~~ 1,363.00 78% 2023-01-25 13:15:00
t=-9 ~~~~~~~~~~~~~~~~~~~~~~~~ 1,656.00 66% 2023-01-25 13:00:00
t=-10 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2,133.00 51% 2023-01-25 12:30:00
t=-11 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2,392.00 40% 2023-01-25 12:15:00
t=-12 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2,509.00 41% 2023-01-25 12:00:00
t=-13 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2,729.00 42% 2023-01-25 11:30:00
t=-14 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2,696.00 44% 2023-01-25 11:15:00
t=-15 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2,618.00 41% 2023-01-25 11:00:00
t=-16 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2,390.00 39% 2023-01-25 10:30:00
t=-17 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2,601.00 27% 2023-01-24 20:00:00
t=-18 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2,833.00 25% 2023-01-24 17:30:00
t=-19 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2,910.00 28% 2023-01-24 17:15:00
t=-20 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2,757.00 22% 2023-01-24 17:00:00
t=-21 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2,696.00 34% 2023-01-24 16:30:00
t=-22 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2,651.00 37% 2023-01-24 16:15:00
t=-23 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2,797.00 39% 2023-01-24 16:00:00
t=-24 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2,739.00 40% 2023-01-24 15:30:00
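For readers curious how a text chart like the one above could be produced, here is a minimal sketch. It is not the provider's actual rendering code: the function name `render_alert_rows`, the row layout, and the choice to scale bars against the largest value in the window are all assumptions made for illustration.

```python
def render_alert_rows(rows, max_width=50):
    """Render rows as text bars.

    `rows` is a list of (timestamp, value, score, flagged) tuples, newest
    first. This layout is hypothetical and only mirrors the example alert.
    """
    max_value = max(value for _, value, _, _ in rows)
    lines = []
    for offset, (timestamp, value, score, flagged) in enumerate(rows):
        # Scale each bar relative to the largest value in the window (assumption).
        bar_len = round(value / max_value * max_width) if max_value > 0 else 0
        bar = "~" * bar_len
        # Flagged anomalies get a "*" in front of the smoothed score.
        marker = "* " if flagged else ""
        label = "t=0" if offset == 0 else f"t=-{offset}"
        lines.append(f"{label} {bar} {value:,.2f} {marker}{score:.0%} {timestamp}")
    return lines


rows = [
    ("2023-01-25 16:00:00", 2742.0, 0.72, False),
    ("2023-01-25 15:30:00", 3165.0, 0.81, True),
    ("2023-01-25 15:15:00", 3448.0, 0.95, True),
]
lines = render_alert_rows(rows)
print("\n".join(lines))
```

Scores are passed as fractions between 0 and 1 and formatted as whole percentages, matching the look of the example alert.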
Below is the SQL to pull the metric in question for investigation (this is included in the alert for convenience).
select *
from `metrics.metrics` m
join `metrics.metrics_scored` s
on m.metric_name = s.metric_name and m.metric_timestamp = s.metric_timestamp
where m.metric_name = 'some_metric_last1h'
order by m.metric_timestamp desc
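If you want to generate the same investigation query for other metrics, a small helper along these lines may be handy. This is only a sketch: `build_investigation_sql` is a hypothetical name, not part of the provider, and the table names are simply copied from the query above.

```python
import re


def build_investigation_sql(metric_name):
    """Return the investigation query for one metric (hypothetical helper)."""
    # Only allow simple identifiers so the name is safe to embed in the SQL text.
    if not re.fullmatch(r"[A-Za-z0-9_]+", metric_name):
        raise ValueError(f"unexpected metric name: {metric_name!r}")
    return (
        "select *\n"
        "from `metrics.metrics` m\n"
        "join `metrics.metrics_scored` s\n"
        "on m.metric_name = s.metric_name "
        "and m.metric_timestamp = s.metric_timestamp\n"
        f"where m.metric_name = '{metric_name}'\n"
        "order by m.metric_timestamp desc"
    )


sql = build_investigation_sql("some_metric_last1h")
print(sql)
```

The name check is a deliberately strict choice for a sketch; a query parameter would be the more robust option if your client supports it.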
Alert Chart
A slightly more fancy chart is also attached to alert emails. The top line graph shows the metric values over time. The bottom line graph shows the smoothed anomaly score over time along with the alert status for any flagged anomalies where the smoothed anomaly score passes the threshold.
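To make the relationship between the raw score, the smoothed score, and the alert status concrete, here is a minimal sketch. It assumes a trailing moving average for smoothing and a fixed threshold of 0.8; both are illustrative choices, and the function names, window size, and threshold are hypothetical rather than the provider's actual settings.

```python
def smooth_scores(scores, window=3):
    """Trailing moving average over raw anomaly scores (oldest first)."""
    smoothed = []
    for i in range(len(scores)):
        # Average the current score with up to `window - 1` previous scores.
        recent = scores[max(0, i - window + 1): i + 1]
        smoothed.append(sum(recent) / len(recent))
    return smoothed


def flag_anomalies(scores, window=3, threshold=0.8):
    """Return (smoothed_score, is_alert) pairs, flagging scores at or above the threshold."""
    return [(s, s >= threshold) for s in smooth_scores(scores, window)]


raw_scores = [0.2, 0.4, 0.9, 1.0, 0.95]
results = flag_anomalies(raw_scores)
for smoothed, is_alert in results:
    print(f"{smoothed:.0%}", "*" if is_alert else "")
```

Smoothing means a single spike in the raw score does not immediately trigger an alert; the score has to stay elevated for a few steps before the smoothed value crosses the threshold.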
Getting Started
Check out the example dag to get started.
Prerequisites
- Currently only Google BigQuery is supported as a data source. The plan is to add Snowflake next and then probably Redshift. PRs to add other data sources are very welcome (some refactoring is probably needed).
- Requirements are listed in requirements.txt.
Installation
Install from PyPI as usual.
pip install airflow-provider-anomaly-detection
Configuration
See the example configuration files in the example dag folder. You can use a defaults.yaml or a specific <metric-batch>.yaml for each metric batch if needed.
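One way to picture how a defaults file and a per-batch file could combine is a simple override merge, sketched below. This is an assumption about the behaviour, not a description of the provider's code: the function `resolve_config` and the keys `alert_threshold` and `smooth_window` are hypothetical, and plain dicts stand in for the parsed YAML files. Check the example configuration files for the real keys.

```python
def resolve_config(defaults, batch_overrides=None):
    """Return the defaults updated with any batch-specific overrides (shallow merge)."""
    merged = dict(defaults)
    merged.update(batch_overrides or {})
    return merged


# Hypothetical contents of defaults.yaml after parsing.
defaults = {"alert_threshold": 0.8, "smooth_window": 3}
# Hypothetical contents of a <metric-batch>.yaml after parsing.
batch = {"alert_threshold": 0.9}

config = resolve_config(defaults, batch)
print(config)
```

A batch that has no file of its own would fall back to the defaults unchanged under this scheme.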
Docker
You can use the docker compose file to spin up an Airflow instance with the provider installed and the example dag available. This is useful for quickly trying it out locally.
docker-compose up
Hashes for airflow_provider_anomaly_detection-0.0.10.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 37592b1e03160c761eaebafcd0bef8ae0d15a22356ffa34edac0506b4d5f3769
MD5 | 885a4732152e89d1fb02df5a72fa7f79
BLAKE2b-256 | f3adc100178b2057ddd0ec50f01916cf988343c70719ec8a4b91a3b6c8848bc1
Hashes for airflow_provider_anomaly_detection-0.0.10-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | bc1548850451b11dd0672af9b7a477bed20131b00f5073793f3cbdbf0e746a8b
MD5 | a9b5a313c37decaffbe3f63a3cdb85e7
BLAKE2b-256 | 5ea14f3df438e5812068dd66f98bae6df4a01d6abec4f4df9609b1240853f85b