The ultimate anomaly detection library.
Anomalytics
Your Ultimate Anomaly Detection & Analytics Tool
Installation
# Install without openpyxl
$ pip3 install anomalytics
# Install with openpyxl
$ pip3 install "anomalytics[extra]"
Use Case
anomalytics can be used to analyze anomalies in your dataset (both pandas.DataFrame and pandas.Series). To start, let's follow along with this minimal example, where we want to detect extremely high anomalies in our time series dataset.
- Import anomalytics and initialise your time series:

    import anomalytics as atics

    ts = atics.read_ts("my_dataset.csv", "csv")
    ts.head()

    Date-Time
    2008-11-03 08:00:00   -0.282
    2008-11-03 09:00:00   -0.368
    2008-11-03 10:00:00   -0.400
    2008-11-03 11:00:00   -0.320
    2008-11-03 12:00:00   -0.155
    Name: Example Dataset, dtype: float64
- Set the time windows t0, t1, and t2, which define the dynamic expanding period used to calculate the quantile threshold:

    t0, t1, t2 = atics.set_time_window(ts.shape[0], "POT", "historical", t0_pct=0.7, t1_pct=0.2, t2_pct=0.1)
    print(f"T0: {t0}")
    print(f"T1: {t1}")
    print(f"T2: {t2}")

    T0: 70000
    T1: 20000
    T2: 10000
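The printed values are consistent with each window being a fixed share of a series of about 100,000 rows (70%, 20%, and 10%). A minimal sketch of that arithmetic, assuming plain proportional splitting; `split_time_window` is a hypothetical helper for illustration and not the library's `set_time_window`:

```python
def split_time_window(total_rows, t0_pct=0.7, t1_pct=0.2, t2_pct=0.1):
    """Split `total_rows` into three consecutive windows (hypothetical helper)."""
    if abs(t0_pct + t1_pct + t2_pct - 1.0) > 1e-9:
        raise ValueError("t0_pct, t1_pct and t2_pct must sum to 1")
    t0 = round(total_rows * t0_pct)
    t1 = round(total_rows * t1_pct)
    # Give t2 the remainder so the three windows always cover every row.
    t2 = total_rows - t0 - t1
    return t0, t1, t2


t0, t1, t2 = split_time_window(100_000)
print(t0, t1, t2)
```

Assigning the remainder to t2 avoids off-by-one gaps when the row count is not evenly divisible.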
- Extract the exceedances, indicating that the anomaly type is "high" and which quantile to use:

    exceedance_ts = atics.get_exceedance_peaks_over_threshold(ts, ts.shape[0], "high", 0.95)
    exceedance_ts.tail()

    Date-Time
    2020-03-31 19:00:00    0.867
    2020-03-31 20:00:00    0.867
    2020-03-31 21:00:00    0.867
    2020-03-31 22:00:00    0.867
    2020-03-31 23:00:00    0.867
    Name: Example Dataset, dtype: float64
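In a peaks-over-threshold approach, an exceedance is the amount by which a value lies above a threshold, and zero otherwise. The sketch below assumes the threshold is the quantile of all values seen so far (an expanding window) once at least `t0` observations are available. It uses plain Python lists instead of pandas, and both `quantile` and `exceedances_over_threshold` are hypothetical names, not the library's implementation:

```python
def quantile(values, q):
    """Linear-interpolation quantile of a non-empty list, with 0 <= q <= 1."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] * (1 - frac) + ordered[upper] * frac


def exceedances_over_threshold(values, t0, q):
    """Return, per observation, how far it lies above the expanding q-quantile.

    The first `t0 - 1` observations have too little history and get 0.0.
    """
    result = []
    for i, value in enumerate(values):
        if i + 1 < t0:
            result.append(0.0)
            continue
        # Threshold from everything observed up to and including this point.
        threshold = quantile(values[: i + 1], q)
        result.append(max(value - threshold, 0.0))
    return result


print(exceedances_over_threshold([1.0, 2.0, 3.0, 10.0, 2.0], t0=3, q=0.5))
```

Re-sorting the history at every step keeps the sketch short but is quadratic in the series length; it is meant to show the idea, not to scale.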
- Compute the anomaly score for each exceedance and initialise a params dictionary for further analysis and evaluation:

    params = {}
    anomaly_score_ts = atics.get_anomaly_score(exceedance_ts, exceedance_ts.shape[0], params)
    anomaly_score_ts.head()

    Date-Time
    2016-10-29 00:00:00    0.0
    2016-10-29 01:00:00    0.0
    2016-10-29 02:00:00    0.0
    2016-10-29 03:00:00    0.0
    2016-10-29 04:00:00    0.0
    Name: Example Dataset, dtype: float64
    ...
- Inspect our parameters (the result of the genpareto fitting):

    print(params)

    {0: {'datetime': Timestamp('2016-10-29 03:00:00'),
         'c': 0.0,
         'loc': 0.0,
         'scale': 0.0,
         'p_value': 0.0,
         'anomaly_score': 0.0},
     1: {'datetime': Timestamp('2016-10-29 04:00:00'),
         ...
         'loc': 0,
         'scale': 0.19125308567629334,
         'p_value': 0.19286132173263668,
         'anomaly_score': 5.1850728337654886},
     ...}
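In the printed parameters, `anomaly_score` equals the reciprocal of `p_value` (1 / 0.19286 ≈ 5.185). A minimal sketch of that relationship follows, assuming `p_value` is the survival probability of an exceedance under a generalized Pareto distribution with shape `c`, location `loc`, and `scale`. The function names are hypothetical, and the fitting step itself is not shown:

```python
import math


def gpd_survival(x, c, loc, scale):
    """P(X > x) for a generalized Pareto distribution with scale > 0."""
    z = (x - loc) / scale
    if z <= 0:
        return 1.0
    if c == 0:
        return math.exp(-z)  # exponential limit of the distribution
    base = 1.0 + c * z
    if base <= 0:
        return 0.0  # beyond the upper endpoint, which exists only when c < 0
    return base ** (-1.0 / c)


def anomaly_score(p_value):
    """Inverse-probability score: rarer exceedances get larger scores.

    A p_value of 0.0 maps to 0.0, mirroring the first printed entry above.
    """
    return 1.0 / p_value if p_value > 0 else 0.0


print(anomaly_score(0.19286132173263668))
```

Taking the inverse stretches small probabilities into large scores, which makes a quantile cut on the scores easier to interpret.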
- Detect the extremely high anomalies:

    # `pot_detecto` is not defined in the steps above; it is assumed to be a
    # detector object created beforehand.
    anomaly_ts = pot_detecto.detect(anomaly_score_ts, t1, 0.90)
    anomaly_ts.head()

    Date-Time
    2019-02-09 08:00:00    False
    2019-02-09 09:00:00    False
    2019-02-09 10:00:00    False
    2019-02-09 11:00:00    False
    2019-02-09 12:00:00    False
    Name: Example Dataset, dtype: bool
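One way to read the detection call is that a score is flagged when it exceeds the 0.90 quantile of the scores in a reference window of length `t1`. The sketch below assumes the reference window is the first `t1` scores and that everything after it is evaluated. This is an illustrative reading rather than the library's confirmed logic, and `detect_high_scores` is a hypothetical name:

```python
import statistics


def detect_high_scores(scores, t1, q=0.90):
    """Flag scores after the first `t1` that exceed the q-quantile of those `t1`.

    Assumes t1 >= 2 and that q is a whole percentage between 0.01 and 0.99.
    """
    reference = scores[:t1]
    # 99 cut points; index k - 1 holds the k-th percentile.
    percentiles = statistics.quantiles(reference, n=100, method="inclusive")
    threshold = percentiles[round(q * 100) - 1]
    return [score > threshold for score in scores[t1:]]


scores = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 9.5, 2.0, 50.0]
print(detect_high_scores(scores, t1=11))
```

Here the 0.90 quantile of the eleven reference scores is 9.0, so 9.5 and 50.0 are flagged while 2.0 is not.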
- Evaluate your analysis result with the one-sample Kolmogorov-Smirnov test:

    # `ks_1sample` is assumed to be imported beforehand; its import path is not shown here.
    ks_result = ks_1sample(ts=exceedance_ts, stats_method="POT", fit_params=params)
    print(ks_result)

    {'total_nonzero_exceedances': 5028,
     'start_datetime': '2023-10-1000:00:00',
     'end_datetime': '2023-10-1101:00:00',
     'stats_distance': 0.0284,
     'p_value': 0.8987,
     'c': 0.003566,
     'loc': 0,
     'scale': 0.140657}
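In a one-sample Kolmogorov-Smirnov test, the test statistic is the largest gap between the empirical distribution of the sample and the fitted distribution; `stats_distance` above presumably reports that gap. The sketch below computes only this distance in plain Python and leaves out the p-value. `ks_distance` is a hypothetical name, and `cdf` can be any callable that returns a cumulative probability:

```python
def ks_distance(sample, cdf):
    """One-sample Kolmogorov-Smirnov distance between `sample` and `cdf`."""
    ordered = sorted(sample)
    n = len(ordered)
    distance = 0.0
    for i, x in enumerate(ordered, start=1):
        fitted = cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x; check both sides.
        distance = max(distance, i / n - fitted, fitted - (i - 1) / n)
    return distance


# Compare three points against the uniform(0, 1) CDF, F(x) = x.
print(round(ks_distance([0.1, 0.4, 0.7], lambda x: x), 4))
```

A small distance together with a large p-value, as in the printed result, suggests the fitted distribution describes the exceedances reasonably well.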
Wall of Fame
I am deeply grateful to have met, been guided by, or even just read inspirational works from the people who motivated me to publish this open-source package as part of my capstone project at CODE University of Applied Sciences in Berlin (2023):
- My lovely mother Sarbina Lindenberg
- Adam Roe
- Alessandro Dolci
- Christian Leschinski
- Johanna Kokocinski
- Peter Krauß