A compact toolbox for semi-supervised anomaly detection

anomatools

anomatools is a small Python package containing recent anomaly detection algorithms. Anomaly detection aims to identify abnormal (anomalous) data points in a given, typically large, dataset. The package contains several state-of-the-art semi-supervised and unsupervised anomaly detection algorithms.

Installation

Install the package directly from PyPI with the following command:

pip install anomatools

OR install the package using the setup.py file:

python setup.py install

OR install it directly from GitHub itself:

pip install git+https://github.com/Vincent-Vercruyssen/anomatools.git@master
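
After any of the installation routes above, it can be useful to confirm that the package is visible to the current Python environment. The following is a minimal sketch using only the standard library; the helper name installed_version is hypothetical and not part of anomatools.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "anomatools") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("anomatools is not installed in this environment")
    else:
        print(f"anomatools version: {version}")
```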

Contents and usage

Semi-supervised anomaly detection

Given a dataset with attributes X and labels Y, indicating whether a data point is normal or anomalous, semi-supervised anomaly detection algorithms are trained using all the instances X and some of the labels Y. Semi-supervised approaches to anomaly detection generally outperform the unsupervised approaches, because they can use the label information to correct the assumptions on which the unsupervised detection process is based. The anomatools package implements two recent semi-supervised anomaly detection algorithms:

  1. The SSDO (semi-supervised detection of outliers) algorithm first computes an unsupervised prior anomaly score and then corrects this score with the known label information [1].
  2. The SSkNNO (semi-supervised k-nearest neighbor anomaly detection) algorithm is a combination of the well-known kNN classifier and the kNNO (k-nearest neighbor outlier detection) method [2].

Given a training dataset X_train with labels Y_train, and a test dataset X_test, the algorithms are applied as follows:

from anomatools.models import SSkNNO, SSDO

# train
detector = SSDO()
detector.fit(X_train, Y_train)

# predict
labels = detector.predict(X_test)
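
The snippet above assumes that Y_train holds only some of the labels. The following is a minimal sketch of how such a partially labeled vector could be prepared from fully known ground truth. The label convention used here (+1 = anomaly, -1 = normal, 0 = unlabeled) is an assumption that this README does not state, and mask_labels is a hypothetical helper, not part of the package.

```python
import random

# Assumed label convention (not confirmed by this README):
# +1 = anomaly, -1 = normal, 0 = unlabeled.
ANOMALY, NORMAL, UNLABELED = 1, -1, 0


def mask_labels(y_true, n_labeled, seed=0):
    """Keep `n_labeled` randomly chosen labels and mark all others as unlabeled."""
    rng = random.Random(seed)
    keep = set(rng.sample(range(len(y_true)), n_labeled))
    return [y if i in keep else UNLABELED for i, y in enumerate(y_true)]


# Toy ground truth: 18 normal instances followed by 2 anomalies.
y_true = [NORMAL] * 18 + [ANOMALY] * 2

# Only 5 of the 20 labels remain visible to the detector.
Y_train = mask_labels(y_true, n_labeled=5)
```

The resulting list would likely need to be converted to a NumPy array before being passed to fit.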

Similarly, the probability of each point in X_test being normal or anomalous can also be computed:

probabilities = detector.predict_proba(X_test, method='squash')

Sometimes we are interested in detecting anomalies in the training data (e.g., when we are doing a post-mortem analysis):

# train
detector = SSDO()
detector.fit(X_train, Y_train)

# retrieve the labels assigned to the training instances
labels = detector.labels_
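
For a post-mortem analysis, the next step is usually to pull out the training instances that were flagged. A minimal sketch, assuming that anomalies are labeled +1 and normal instances -1 (an assumption not stated in this README); flagged_indices is a hypothetical helper and the labels below are made-up toy values standing in for detector.labels_.

```python
def flagged_indices(labels, anomaly_label=1):
    """Return the positions of all instances carrying the anomaly label."""
    return [i for i, label in enumerate(labels) if label == anomaly_label]


# Toy stand-in for detector.labels_ (assumed convention: +1 = anomaly, -1 = normal).
labels = [-1, -1, 1, -1, 1]
flagged = flagged_indices(labels)  # [2, 4]
```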

Unsupervised anomaly detection

Unsupervised anomaly detectors do not make use of label information (user feedback) when detecting anomalies in a dataset. Given a dataset with attributes X and labels Y, the unsupervised detectors are trained using only X. The anomatools package implements two recent unsupervised anomaly detection algorithms:

  1. The kNNO (k-nearest neighbor outlier detection) algorithm computes for each data point the anomaly score as the distance to its k-th nearest neighbor in the dataset [3].
  2. The iNNE (isolation nearest neighbor ensembles) algorithm computes for each data point an anomaly score based roughly on how isolated the point is from the rest of the data [4].
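
The idea behind the kNNO score can be sketched in a few lines of plain Python. This is an illustrative brute-force version that uses Euclidean distance and a hypothetical function name. It is not the package's implementation, which may differ in distance metric, scaling, and efficiency.

```python
import math


def knn_distance_scores(points, k=2):
    """Score each point by the Euclidean distance to its k-th nearest neighbor."""
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, sorted in ascending order.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Four points on a unit square plus one point far away from them.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (8.0, 8.0)]
scores = knn_distance_scores(points, k=2)

# The isolated point receives the largest score.
most_anomalous = scores.index(max(scores))
```

Points in the dense cluster have a nearby k-th neighbor and therefore a small score, while the isolated point has to reach much further to find its k-th neighbor.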

Given a training dataset X_train with labels Y_train, and a test dataset X_test, the algorithms are applied as follows:

from anomatools.models import kNNO, iNNE

# train
detector = kNNO()
detector.fit(X_train, Y_train)

# predict
labels = detector.predict(X_test)

Package structure

The anomaly detection algorithms are located in: anomatools/models/

For further examples of how to use the algorithms see the notebooks: anomatools/notebooks/

Dependencies

The anomatools package requires the following Python packages to be installed:

Contact

Contact the author of the package: vincent.vercruyssen@kuleuven.be

References

[1] Vercruyssen, V., Meert, W., Verbruggen, G., Maes, K., Bäumer, R., Davis, J. (2018) Semi-Supervised Anomaly Detection with an Application to Water Analytics. IEEE International Conference on Data Mining (ICDM), Singapore. pp. 527-536.

[2] Vercruyssen, V., Meert, W., Davis, J. (2020) Transfer Learning for Anomaly Detection through Localized and Unsupervised Instance Selection. AAAI Conference on Artificial Intelligence, New York.
