
An Airflow provider for anomaly detection.


Anomaly Detection with Apache Airflow

Painless anomaly detection (using PyOD) with Apache Airflow via this community Airflow Provider package.

How it works in a nutshell:

  1. Create and express your metrics via SQL queries.
  2. Have some fun with YAML configuration.
  3. Receive useful alerts when metrics look anomalous.
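The three steps can be sketched end to end in plain Python. This is a minimal illustration, not the provider's implementation: it uses an in-memory SQLite table instead of BigQuery, a hypothetical `alert_threshold` setting in place of the YAML config, and a simple z-score in place of the PyOD models the provider actually uses. The table, column, and function names are all made up for the example.

```python
import sqlite3
import statistics

# Step 1: a metric expressed as SQL (hypothetical table and columns).
METRIC_SQL = """
select event_hour as metric_timestamp, count(*) as metric_value
from events
group by event_hour
order by event_hour
"""

# Step 2: settings that would normally come from YAML (hypothetical key).
CONFIG = {"alert_threshold": 3.0}


def score_latest(values):
    """Absolute z-score of the newest value against the earlier ones.

    A simple stand-in for a PyOD model; needs at least two values.
    """
    history, latest = values[:-1], values[-1]
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        return 0.0
    return abs(latest - mean) / stdev


def check_metric(conn, config=CONFIG):
    """Run the metric SQL, score the latest point, and return an alert or None."""
    rows = conn.execute(METRIC_SQL).fetchall()
    score = score_latest([value for _, value in rows])
    # Step 3: only alert when the latest point looks anomalous.
    if score > config["alert_threshold"]:
        return f"[{rows[-1][0]}] looks anomalous (z={score:.1f})"
    return None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("create table events (event_hour text)")
    counts = {"h01": 10, "h02": 12, "h03": 9, "h04": 11, "h05": 10, "h06": 40}
    for hour, n in counts.items():
        conn.executemany("insert into events values (?)", [(hour,)] * n)
    print(check_metric(conn))
```

Swapping the z-score for a real PyOD detector would change only `score_latest`; the shape of the loop (query, score, alert) stays the same.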

Example Alert

Example output of an alert. A horizontal bar chart shows metric values over time, the smoothed anomaly score is shown as a percentage, and any flagged anomalies are marked with *.
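One row of the text alert can be approximated with ordinary string formatting. This sketch is not the provider's rendering code: it assumes a fixed bar width of 50 characters scaled to the largest value in the window, and the column widths are inferred from the example alert. `render_row` and `render_alert` are hypothetical names.

```python
def render_row(offset, value, score, flagged, timestamp, max_value, width=50):
    """Render one alert line: offset, bar, value, flag marker, smoothed score %, timestamp."""
    # Bar length is proportional to the value, relative to the window maximum.
    bar = "~" * round(width * value / max_value)
    flag = "*" if flagged else " "
    return f"t={offset:<4}{bar:<{width}}{value:>10,.2f}  {flag}{score:>4.0%} {timestamp}"


def render_alert(rows, width=50):
    """rows: (timestamp, value, smoothed_score, flagged) tuples, newest first."""
    max_value = max(value for _, value, _, _ in rows)
    return "\n".join(
        render_row(-i, value, score, flagged, ts, max_value, width)
        for i, (ts, value, score, flagged) in enumerate(rows)
    )


print(render_alert([
    ("2023-01-25 16:00:00", 2742.0, 0.72, False),
    ("2023-01-25 15:30:00", 3165.0, 0.81, True),
    ("2023-01-25 15:15:00", 3448.0, 0.95, True),
]))
```

Scaling to the window maximum means the largest value always fills the full bar width, which matches the t=-2 row in the example.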

Alert Text (ASCII art yay!)

🔥 [some_metric_last1h] looks anomalous (2023-01-25 16:00:00) 🔥
some_metric_last1h (2023-01-24 15:30:00 to 2023-01-25 16:00:00)
                                                                                       
t=0   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~             2,742.00    72% 2023-01-25 16:00:00
t=-1  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~       3,165.00  * 81% 2023-01-25 15:30:00
t=-2  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  3,448.00  * 95% 2023-01-25 15:15:00
t=-3  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~   3,441.00    76% 2023-01-25 15:00:00
t=-4  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                 2,475.00    72% 2023-01-25 14:30:00
t=-5  ~~~~~~~~~~~~~~~~~~~~~~~~~~                          1,833.00    72% 2023-01-25 14:15:00
t=-6  ~~~~~~~~~~~~~~~~~~~~                                1,406.00    72% 2023-01-25 14:00:00
t=-7  ~~~~~~~~~~~~~~~~~~~                                 1,327.00  * 89% 2023-01-25 13:30:00
t=-8  ~~~~~~~~~~~~~~~~~~~                                 1,363.00    78% 2023-01-25 13:15:00
t=-9  ~~~~~~~~~~~~~~~~~~~~~~~~                            1,656.00    66% 2023-01-25 13:00:00
t=-10 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                      2,133.00    51% 2023-01-25 12:30:00
t=-11 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                  2,392.00    40% 2023-01-25 12:15:00
t=-12 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                2,509.00    41% 2023-01-25 12:00:00
t=-13 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~             2,729.00    42% 2023-01-25 11:30:00
t=-14 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~             2,696.00    44% 2023-01-25 11:15:00
t=-15 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~               2,618.00    41% 2023-01-25 11:00:00
t=-16 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                  2,390.00    39% 2023-01-25 10:30:00
t=-17 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~               2,601.00    27% 2023-01-24 20:00:00
t=-18 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~           2,833.00    25% 2023-01-24 17:30:00
t=-19 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~          2,910.00    28% 2023-01-24 17:15:00
t=-20 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~             2,757.00    22% 2023-01-24 17:00:00
t=-21 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~             2,696.00    34% 2023-01-24 16:30:00
t=-22 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~              2,651.00    37% 2023-01-24 16:15:00
t=-23 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~            2,797.00    39% 2023-01-24 16:00:00
t=-24 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~             2,739.00    40% 2023-01-24 15:30:00

Below is the SQL to pull the metric in question for investigation (it is included in the alert for convenience).

select *
from `metrics.metrics` m
join `metrics.metrics_scored` s
on m.metric_name = s.metric_name and m.metric_timestamp = s.metric_timestamp
where m.metric_name = 'some_metric_last1h'
order by m.metric_timestamp desc
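To see what the investigation query returns, the same join can be run against a throwaway in-memory SQLite database. This is only an illustration: the BigQuery dataset prefix is dropped, the columns `metric_value` and `metric_score` are assumptions (only `metric_name` and `metric_timestamp` appear in the query above), the rows are toy values, and the metric name is passed as a bound parameter instead of a literal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal schemas; the provider's real BigQuery tables may differ.
conn.execute("create table metrics (metric_timestamp text, metric_name text, metric_value real)")
conn.execute("create table metrics_scored (metric_timestamp text, metric_name text, metric_score real)")

# Toy rows for a single metric.
conn.executemany("insert into metrics values (?, ?, ?)", [
    ("2023-01-25 15:30:00", "some_metric_last1h", 3165.0),
    ("2023-01-25 16:00:00", "some_metric_last1h", 2742.0),
])
conn.executemany("insert into metrics_scored values (?, ?, ?)", [
    ("2023-01-25 15:30:00", "some_metric_last1h", 0.4),
    ("2023-01-25 16:00:00", "some_metric_last1h", 0.9),
])

# Each metric value is matched to its score on (metric_name, metric_timestamp).
INVESTIGATE_SQL = """
select m.metric_timestamp, m.metric_value, s.metric_score
from metrics m
join metrics_scored s
  on m.metric_name = s.metric_name and m.metric_timestamp = s.metric_timestamp
where m.metric_name = ?
order by m.metric_timestamp desc
"""
rows = conn.execute(INVESTIGATE_SQL, ("some_metric_last1h",)).fetchall()
for row in rows:
    print(row)
```

Joining on both columns matters: joining on `metric_name` alone would pair every value with every score for that metric.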

Alert Chart

A slightly fancier chart is also attached to alert emails. The top line graph shows the metric values over time. The bottom line graph shows the smoothed anomaly score over time, along with the alert status: an anomaly is flagged wherever the smoothed anomaly score passes the threshold.
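The smoothing-and-threshold idea can be sketched as a trailing moving average over the raw anomaly scores, with a flag raised wherever the smoothed score exceeds a threshold. The provider's actual smoothing method and parameter names may differ; `smooth_n=3` and `threshold=0.8` here are hypothetical values chosen for illustration.

```python
from collections import deque


def smooth_and_flag(scores, smooth_n=3, threshold=0.8):
    """Return (smoothed_score, alert) pairs for raw scores ordered oldest to newest.

    The smoothed score is the mean of the last `smooth_n` raw scores
    (fewer at the start of the series, before the window fills up).
    """
    window = deque(maxlen=smooth_n)
    out = []
    for score in scores:
        window.append(score)
        smoothed = sum(window) / len(window)
        out.append((smoothed, smoothed > threshold))
    return out


for smoothed, alert in smooth_and_flag([0.2, 0.4, 0.9, 0.95, 1.0]):
    print(f"{smoothed:.0%}", "*" if alert else "")
```

Smoothing trades a little latency for fewer noisy alerts: a single high raw score (0.9 above) is not enough to trip the flag until the following scores stay high as well.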

[Image: alert-chart-example]

Getting Started

Check out the example DAG to get started.

Prerequisites

  • Currently only Google BigQuery is supported as a data source. The plan is to add Snowflake next, and then probably Redshift. PRs adding other data sources are very welcome (some refactoring will probably be needed).
  • Requirements are listed in requirements.txt.

Installation

Install from PyPI as usual.

pip install airflow-provider-anomaly-detection

Configuration

See the example configuration files in the example DAG folder. You can use a defaults.yaml, or a specific <metric-batch>.yaml for each metric batch if needed.
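If a batch-specific file is layered on top of the defaults, the lookup behaves like a recursive dictionary merge. This is a sketch under that assumption, not the provider's loader: the keys below are hypothetical, and in practice both dicts would come from parsing the YAML files (for example with PyYAML's `yaml.safe_load`).

```python
from copy import deepcopy


def merge_config(defaults, overrides):
    """Recursively overlay batch-specific settings onto defaults.

    Nested dicts are merged key by key; any other value in `overrides`
    replaces the default. Neither input is mutated.
    """
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical contents of defaults.yaml and one <metric-batch>.yaml.
defaults = {"schedule": "0 * * * *", "alert": {"smooth_n": 3, "threshold": 0.8}}
batch = {"alert": {"threshold": 0.9}}

config = merge_config(defaults, batch)
print(config)
```

A recursive merge lets a batch file override one nested setting without restating the rest of its section; a shallow `{**defaults, **batch}` would drop `smooth_n` here.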

Docker

You can use the Docker Compose file to spin up an Airflow instance with the provider installed and the example DAG available. This is useful for quickly trying it out locally.

docker-compose up


Source Distribution

airflow_provider_anomaly_detection-0.0.15.tar.gz (243.2 kB)
