Time Series Anomaly Detection

IMPORTANT: dtaianomaly is still a work in progress. Therefore, many changes are still expected. Feel free to contact us if there are any suggestions!

A simple-to-use Python package for the development and analysis of time series anomaly detection techniques.

Table of Contents

  1. Installation: How to install dtaianomaly.
  2. Usage: How to use dtaianomaly, both in your own code and through configuration files.
  3. More examples: A list of more in-depth examples.
  4. Contact: How to get in touch with us.

Installation

You can install dtaianomaly using pip:

pip install dtaianomaly

Usage

In code

This section shows how to use dtaianomaly in your own code. First, we load a dataset using the DataManager. If you already have a time series as a np.ndarray of shape (n_samples, n_features), you can skip this step. Second, we use the TimeSeriesAnomalyDetector class to detect anomalies in the data. Third, we quantitatively evaluate the results of the anomaly detection algorithm. Finally, because time series are inherently visual, we use dtaianomaly to visualize the results. This Jupyter notebook contains all the code cells shown below.

1. Loading data

Data can be read using the DataManager class, as the simple example below shows. More information on how to structure the datasets and how to select datasets with certain properties can be found in the data folder.

The design of DataManager is inspired by TimeEval.

from dtaianomaly.data_management import DataManager

# Initialize the data manager
data_manager = DataManager(data_dir='data', datasets_index_file='datasets.csv')

# Select the 'Demo1' dataset from the 'Demo' collection
data_manager.select({'collection_name': 'Demo', 'dataset_name': 'Demo1'})
# Get the index of the first selected dataset
dataset_index = data_manager.get(0)
# Load the trend data (as a numpy ndarray) and the anomaly labels
trend_data, labels = data_manager.load_raw_data(dataset_index, train=False)

2. Detecting anomalies

The TimeSeriesAnomalyDetector class is the main class of dtaianomaly as it is the base of all time series anomaly detection algorithms. The main methods of this class are:

  1. fit(trend_data: np.ndarray, labels: np.array = None) to fit the anomaly detector. The labels parameter is optional and should only be given to supervised time series anomaly detection algorithms.
  2. decision_function(trend_data: np.ndarray) to compute the raw anomaly score of every measurement in the time series. Each score is a value in the range $[0, +\infty[$, in which a higher score indicates a more anomalous observation.
  3. predict_proba(trend_data: np.ndarray, normalization: str = 'unify') converts the raw anomaly scores to a probability of an observation being anomalous (thus in range $[0, 1]$). The normalization parameter indicates how the raw anomaly scores should be normalized.
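To make the step from raw scores to values in $[0, 1]$ concrete, here is a minimal sketch in plain Python. It does not reproduce the 'unify' normalization, which is not described here. Instead it uses simple min-max scaling as a stand-in, and the function name min_max_normalize is hypothetical, not part of dtaianomaly.

```python
from typing import List, Sequence


def min_max_normalize(raw_scores: Sequence[float]) -> List[float]:
    """Rescale raw anomaly scores linearly to the range [0, 1].

    Hypothetical helper for illustration only; this is NOT the 'unify'
    normalization used by predict_proba.
    """
    lowest, highest = min(raw_scores), max(raw_scores)
    if highest == lowest:
        # All scores are identical, so no observation stands out.
        return [0.0 for _ in raw_scores]
    return [(score - lowest) / (highest - lowest) for score in raw_scores]


raw_scores = [0.5, 1.0, 4.5, 0.5]
normalized_scores = min_max_normalize(raw_scores)
print(normalized_scores)
```

Min-max scaling keeps the ranking of the observations but is sensitive to a single extreme score, which is one reason a library may offer other normalization options.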

Here we show a simple example of detecting anomalies in a time series. Specifically, we use an IForest (as implemented in PyOD), adapted for time series using a sliding window of size 16.

from dtaianomaly.anomaly_detection import PyODAnomalyDetector, Windowing

# Initialize the anomaly detector
# Here we use an IForest with a sliding window of size 16
anomaly_detector = PyODAnomalyDetector('IForest', Windowing(window_size=16))

# Fit the anomaly detector 
anomaly_detector.fit(trend_data)
# Compute the raw anomaly score of every observation (in range [0, infinity))
raw_anomaly_scores = anomaly_detector.decision_function(trend_data)
# Compute the probability of an observation being an anomaly (in range [0, 1])
anomaly_probabilities = anomaly_detector.predict_proba(trend_data)

In this example, anomaly_detector can be any of the implemented anomaly detection algorithms. This allows for abstraction using the TimeSeriesAnomalyDetector class, which can be used to implement pre- and post-processing steps for anomaly detection algorithms.
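The idea of adapting a point-based detector to time series with a sliding window can be sketched as follows. This is a dependency-free illustration under stated assumptions, not the Windowing implementation of dtaianomaly: the helper names sliding_windows and scores_per_observation are hypothetical, the per-window score (maximum minus minimum) is a toy stand-in for a real detector such as IForest, and averaging the scores of all windows that cover an observation is only one possible way to map window scores back to observations.

```python
from typing import List, Sequence


def sliding_windows(series: Sequence[float], window_size: int, stride: int = 1) -> List[List[float]]:
    """Cut a univariate series into overlapping windows (hypothetical helper)."""
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")
    last_start = len(series) - window_size
    return [list(series[start:start + window_size]) for start in range(0, last_start + 1, stride)]


def scores_per_observation(window_scores: Sequence[float], n_samples: int,
                           window_size: int, stride: int = 1) -> List[float]:
    """Average the scores of all windows that contain each observation."""
    totals = [0.0] * n_samples
    counts = [0] * n_samples
    for window_index, score in enumerate(window_scores):
        start = window_index * stride
        for position in range(start, start + window_size):
            totals[position] += score
            counts[position] += 1
    # Observations not covered by any window get a score of 0.
    return [total / count if count else 0.0 for total, count in zip(totals, counts)]


series = [1.0, 1.0, 1.0, 9.0, 1.0, 1.0]
windows = sliding_windows(series, window_size=2)
# Toy stand-in for a detector: score each window by its value range.
window_scores = [max(window) - min(window) for window in windows]
observation_scores = scores_per_observation(window_scores, len(series), window_size=2)
print(observation_scores)
```

The observation at index 3 (the spike) is covered only by high-scoring windows, so it receives the highest score, while its neighbours receive intermediate scores.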

3. Evaluating results

The evaluation module contains functions to evaluate the results of the anomaly detector, as shown below. Some metrics use continuous anomaly scores (such as the area under the precision-recall curve), while others require discrete anomaly labels (such as the F1 score). Therefore, we provide several thresholding methods, such as fixed_value_threshold.

from dtaianomaly.evaluation import f1, pr_auc, fixed_value_threshold

# Compute the F1 score, for which discrete anomaly labels are required
predicted_anomaly_labels = fixed_value_threshold(labels, raw_anomaly_scores)
f1_score = f1(labels, predicted_anomaly_labels)

# Compute the area under the precision-recall curve
pr_auc_score = pr_auc(labels, raw_anomaly_scores)
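To illustrate why a thresholding step is needed before computing a label-based metric, the following plain-Python sketch turns continuous scores into discrete labels and then computes the F1 score by hand. The functions threshold_scores and f1_from_labels are hypothetical and written only for this illustration. They are not the dtaianomaly functions used above, and the threshold of 0.5 is an arbitrary choice.

```python
from typing import List, Sequence


def threshold_scores(scores: Sequence[float], threshold: float) -> List[int]:
    """Label an observation as anomalous (1) if its score reaches the threshold."""
    return [1 if score >= threshold else 0 for score in scores]


def f1_from_labels(true_labels: Sequence[int], predicted_labels: Sequence[int]) -> float:
    """Compute the F1 score from binary ground-truth and predicted labels."""
    pairs = list(zip(true_labels, predicted_labels))
    true_positives = sum(1 for truth, prediction in pairs if truth == 1 and prediction == 1)
    false_positives = sum(1 for truth, prediction in pairs if truth == 0 and prediction == 1)
    false_negatives = sum(1 for truth, prediction in pairs if truth == 1 and prediction == 0)
    if true_positives == 0:
        # Precision or recall is zero (or undefined), so F1 is reported as 0.
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


example_labels = [0, 0, 1, 1, 0]
example_scores = [0.1, 0.6, 0.9, 0.4, 0.2]
example_predictions = threshold_scores(example_scores, threshold=0.5)
example_f1 = f1_from_labels(example_labels, example_predictions)
print(example_predictions, example_f1)
```

Changing the threshold changes the predicted labels and therefore the F1 score, whereas a score-based metric such as the area under the precision-recall curve does not depend on a single threshold.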

4. Visualizing the results

To easily visualize the results of the anomaly detection algorithm (beyond numerical results), we provide methods to visualize the data and the anomaly scores. A simple example is shown below.

from dtaianomaly.visualization import plot_anomaly_scores

# Load the trend data as a pandas DataFrame
trend_data_df = data_manager.load(dataset_index, train=False)
plot_anomaly_scores(trend_data_df, raw_anomaly_scores)

[Figure: Anomaly scores]

Using configuration files

One of the best ways to guarantee the reproducibility of your experiments is to use configuration files. Therefore, we implemented a simple way to provide configuration files for the evaluation of time series anomaly detection algorithms. The configurations are written in JSON, but can also be passed directly as a dictionary. Below we show how to use configuration files to execute an algorithm. Check out the experiments folder for more information on the format of the configuration files and for examples.

from dtaianomaly.workflows import execute_algorithm

results = execute_algorithm(
   data_manager=data_manager,
   # Give configurations as the location of the configuration file ...
   data_configuration='experiments/default_configurations/data/Demo.json',
   algorithm_configuration='experiments/default_configurations/algorithm/iforest.json',
   # ... or directly as a dictionary
   metric_configuration={
     "pr_auc": { },
     "precision": {
       "thresholding_strategy": "fixed_value_threshold",
       "thresholding_parameters": {
         "threshold": 0.05
       }
     }
   }
)
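Accepting a configuration either as a file location or as a dictionary can be handled with a small helper like the one below. This is a sketch of the general pattern, not the code inside execute_algorithm: the function load_configuration and the file name metrics.json are hypothetical, and the example writes its own temporary file so that it runs on its own.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Union


def load_configuration(configuration: Union[str, Path, Dict[str, Any]]) -> Dict[str, Any]:
    """Return the configuration as a dictionary (hypothetical helper).

    A dictionary is returned unchanged; anything else is treated as the
    location of a JSON file and parsed.
    """
    if isinstance(configuration, dict):
        return configuration
    with open(configuration, "r", encoding="utf-8") as configuration_file:
        return json.load(configuration_file)


metric_configuration = {
    "pr_auc": {},
    "precision": {
        "thresholding_strategy": "fixed_value_threshold",
        "thresholding_parameters": {"threshold": 0.05},
    },
}

# Write the same configuration to a temporary JSON file and read it back.
with tempfile.TemporaryDirectory() as directory:
    configuration_path = Path(directory) / "metrics.json"
    configuration_path.write_text(json.dumps(metric_configuration), encoding="utf-8")
    from_file = load_configuration(configuration_path)

from_dict = load_configuration(metric_configuration)
print(from_file == from_dict)
```

Keeping configurations in JSON files makes it easy to store them under version control next to the results they produced, while dictionaries are convenient for quick experiments.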

More examples

More examples will be added in the notebooks directory soon!

  • PyOD anomaly detectors: Compares different anomaly detection algorithms implemented in the PyOD library on a simple time series, showing how to easily initialize a PyODAnomalyDetector and compare multiple methods.
  • Compare normalization: Compares different normalization methods for anomaly scores, showing how to easily compare multiple methods.
  • Analyze decision scores: Visually illustrates the decision scores of various anomaly detectors.

Contact

Feel free to email louis.carpentier@kuleuven.be if you have any questions, remarks, or ideas.
