Skip to main content

The ultimate anomaly detection library.

Project description

Anomalytics

Your Ultimate Anomaly Detection & Analytics Tool

pre-commit.ci status Code style: black Imports: isort mypy checked CI - Build CI - Code Quality CI - Automated Testing License: MIT Documentation PyPi

Installation

# Install without openpyxl
$ pip3 install anomalytics

# Install with openpyxl
$ pip3 install "anomalytics[extra]"

Use Case

anomalytics can be used to analyze anomalies in your dataset (boht pandas.DataFrame or pandas.Series). To start, let's follow along with this minimum example where we want to detect extremely high anomalies in our time series dataset.

  1. Import anomalytics and initialise your time series:

    import anomalytics as atics
    
    ts = atics.read_ts(
        "my_dataset.csv",
        "csv"
    )
    ts.head()
    
    Date-Time
    2008-11-03 08:00:00   -0.282
    2008-11-03 09:00:00   -0.368
    2008-11-03 10:00:00   -0.400
    2008-11-03 11:00:00   -0.320
    2008-11-03 12:00:00   -0.155
    Name: Example Dataset, dtype: float64
    
  2. Set the time windows of t0, t1, and t2 to compute dynamic expanding period for calculating the threshold via quantile:

    t0, t1, t2 = atics.set_time_window(ts.shape[0], "POT", "historical", t0_pct=0.7, t1_pct=0.2, t2_pct=0.1)
    print(f"T0: {t0}")
    print(f"T1: {t1}")
    print(f"T2: {t2}")
    
    T0: 70000
    T1: 20000
    T2: 10000
    
  3. Extract exceedances and indicate that it is a "high" anomaly type and what's the quantile:

    exceedance_ts = atics.get_exceedance_peaks_over_threshold(ts, ts.shape[0], "high", 0.95)
    exceedance_ts.tail()
    
    Date-Time
    2020-03-31 19:00:00    0.867
    2020-03-31 20:00:00    0.867
    2020-03-31 21:00:00    0.867
    2020-03-31 22:00:00    0.867
    2020-03-31 23:00:00    0.867
    Name: Example Dataset, dtype: float64
    
  4. Compute the anomaly score for each exceedance and initialize a params for further analysis and evaluation:

    params = {}
    anomaly_score_ts = atics.get_anomaly_score(exceedance_ts, exceedance_ts.shape[0], params)
    anomaly_score_ts.head()
    
    Date-Time
    2016-10-29 00:00:00    0.0
    2016-10-29 01:00:00    0.0
    2016-10-29 02:00:00    0.0
    2016-10-29 03:00:00    0.0
    2016-10-29 04:00:00    0.0
    Name: Example Dataset, dtype: float64
    ...
    
  5. Inspec our parameters (the result of genpareto fitting):

    print(params)
    
    {0: {'datetime': Timestamp('2016-10-29 03:00:00'),
    'c': 0.0,
    'loc': 0.0,
    'scale': 0.0,
    'p_value': 0.0,
    'anomaly_score': 0.0},
    1: {'datetime': Timestamp('2016-10-29 04:00:00'),
    ...
    'loc': 0,
    'scale': 0.19125308567629334,
    'p_value': 0.19286132173263668,
    'anomaly_score': 5.1850728337654886},
    ...}
    
  6. Detect the extremely high anomalies:

    anomaly_ts = pot_detecto.detect(anomaly_score_ts, t1, 0.90)
    anomaly_ts.head()
    
    Date-Time
    2019-02-09 08:00:00    False
    2019-02-09 09:00:00    False
    2019-02-09 10:00:00    False
    2019-02-09 11:00:00    False
    2019-02-09 12:00:00    False
    Name: Example Dataset, dtype: bool
    
  7. Evaluate your analysis result with Kolmogorov Smirnov 1 sample test:

    ks_result = ks_1sample(ts=exceedance_ts, stats_method="POT", fit_params=params)
    print(ks_result)
    
    {'total_nonzero_exceedances': 5028, 'start_datetime': '2023-10-1000:00:00', 'end_datetime': '2023-10-1101:00:00', 'stats_distance': 0.0284, 'p_value': 0.8987, 'c': 0.003566, 'loc': 0, 'scale': 0.140657}
    

Reference

  • Nakamura, C. (2021, July 13). On Choice of Hyper-parameter in Extreme Value Theory Based on Machine Learning Techniques. arXiv:2107.06074 [cs.LG]. https://doi.org/10.48550/arXiv.2107.06074

  • Davis, N., Raina, G., & Jagannathan, K. (2019). LSTM-Based Anomaly Detection: Detection Rules from Extreme Value Theory. In Proceedings of the EPIA Conference on Artificial Intelligence 2019. https://doi.org/10.48550/arXiv.1909.06041

  • Arian, H., Poorvasei, H., Sharifi, A., & Zamani, S. (2020, November 13). The Uncertain Shape of Grey Swans: Extreme Value Theory with Uncertain Threshold. arXiv:2011.06693v1 [econ.GN]. https://doi.org/10.48550/arXiv.2011.06693

  • Yiannis Kalliantzis. (n.d.). Detect Outliers: Expert Outlier Detection and Insights. Retrieved [23-12-04T15:10:12.000Z], from https://detectoutliers.com/

Wall of Fame

I am deeply grateful to have met, guided, or even just read some inspirational works from people who motivate me to publish this open-source package as a part of my capstone project at CODE university of applied sciences in Berlin (2023):

  • My lovely mother Sarbina Lindenberg
  • Adam Roe
  • Alessandro Dolci
  • Christian Leschinski
  • Johanna Kokocinski
  • Peter Krauß

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

anomalytics-0.1.1.tar.gz (25.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

anomalytics-0.1.1-py3-none-any.whl (28.4 kB view details)

Uploaded Python 3

File details

Details for the file anomalytics-0.1.1.tar.gz.

File metadata

  • Download URL: anomalytics-0.1.1.tar.gz
  • Upload date:
  • Size: 25.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.6

File hashes

Hashes for anomalytics-0.1.1.tar.gz
Algorithm Hash digest
SHA256 b5aba2439dcebe8c0e86d27fc6d17a92e9a7ead7af0a05b6df53669db794e7e9
MD5 a4fc50238ec9f108bea3a193c687be1d
BLAKE2b-256 840be4d0d4adaad49621897b8770df02176b004f1fbb0f54ef6c395174adde24

See more details on using hashes here.

File details

Details for the file anomalytics-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: anomalytics-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 28.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.6

File hashes

Hashes for anomalytics-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 13436a5bf0b5a80b0c8e96cf286434473079460f3ccb30fc40513027fdc680e0
MD5 b3eb7e537c142f375e1ab8a5f754d19a
BLAKE2b-256 0804d572006707cffd554d001ea83df0536e5da97cbd48b1e69907c66fe42a30

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page