A compact toolbox for semi-supervised anomaly detection
anomatools
anomatools is a small Python package containing recent anomaly detection algorithms.
Anomaly detection aims to identify abnormal data points in a given (large) dataset.
The package contains several state-of-the-art semi-supervised and unsupervised anomaly detection algorithms.
Installation
Install the package directly from PyPI with the following command:
pip install anomatools
OR install the package using the setup.py file:
python setup.py install
OR install it directly from GitHub itself:
pip install git+https://github.com/Vincent-Vercruyssen/anomatools.git@master
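After any of the three install routes, a quick way to confirm that the package is importable is to look it up with the standard library. This is only a minimal sketch; the helper name is_installed is hypothetical and not part of anomatools.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found on the import path."""
    return importlib.util.find_spec(package_name) is not None


# Prints True once one of the install commands above has succeeded.
print("anomatools installed:", is_installed("anomatools"))
```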
Contents and usage
Semi-supervised anomaly detection
Given a dataset with attributes X and labels Y, indicating whether a data point is normal or anomalous, semi-supervised anomaly detection algorithms are trained using all the instances X and some of the labels Y.
Semi-supervised approaches to anomaly detection generally outperform the unsupervised approaches, because they can use the label information to correct the assumptions on which the unsupervised detection process is based.
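To make the "some of the labels Y" setting concrete, the sketch below hides most labels of a fully labeled toy dataset. The label encoding used here (+1 = anomaly, -1 = normal, 0 = unlabeled) and the helper mask_labels are assumptions for illustration only; check the package documentation for the encoding that fit expects.

```python
import random


def mask_labels(y_full, keep_fraction, unlabeled=0, seed=0):
    """Return a copy of y_full in which only a random subset of labels is kept."""
    rng = random.Random(seed)
    n_keep = round(keep_fraction * len(y_full))
    keep = set(rng.sample(range(len(y_full)), n_keep))
    # Labels outside the kept subset are replaced by the "unlabeled" marker.
    return [y if i in keep else unlabeled for i, y in enumerate(y_full)]


# Hypothetical ground truth (assumed encoding: +1 = anomaly, -1 = normal).
Y_full = [-1, -1, -1, 1, -1, -1, 1, -1, -1, -1]

# Keep 30% of the labels; the rest become 0 (assumed "unlabeled" marker).
Y_partial = mask_labels(Y_full, keep_fraction=0.3)
print(Y_partial)
```

A semi-supervised detector would then be trained on all instances together with a label vector such as Y_partial.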
The anomatools package implements two recent semi-supervised anomaly detection algorithms:
- The SSDO (semi-supervised detection of outliers) algorithm first computes an unsupervised prior anomaly score and then corrects this score with the known label information [1].
- The SSkNNO (semi-supervised k-nearest neighbor anomaly detection) algorithm is a combination of the well-known kNN classifier and the kNNO (k-nearest neighbor outlier detection) method [2].
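The two-step idea behind SSDO (an unsupervised prior score that is then corrected with known labels) can be illustrated with a toy example. This is not the update rule from [1] and not the package's implementation: the Gaussian distance weighting, the sigma parameter, and the function correct_with_labels are all assumptions chosen to keep the sketch short.

```python
import math


def correct_with_labels(X, prior, labeled, sigma=1.0):
    """Blend prior anomaly scores with the nearest known label (toy illustration)."""
    corrected = []
    for x, p in zip(X, prior):
        if not labeled:
            # Without feedback the prior score is returned unchanged.
            corrected.append(p)
            continue
        # Distance to the nearest labeled point and its label (1.0 = anomaly, 0.0 = normal).
        d, target = min((math.dist(x, xl), yl) for xl, yl in labeled)
        # The influence of a label decays with distance (assumed Gaussian weighting).
        w = math.exp(-d * d / (2 * sigma * sigma))
        corrected.append((1 - w) * p + w * target)
    return corrected


X = [(0.0, 0.0), (0.1, 0.0), (4.0, 4.0)]
prior = [0.2, 0.3, 0.9]          # hypothetical unsupervised scores in [0, 1]
labeled = [((0.0, 0.0), 1.0)]    # feedback: the first point is a known anomaly
scores = correct_with_labels(X, prior, labeled)
print(scores)
```

Points close to a labeled instance move toward that label, while points far from any label keep (almost exactly) their prior score.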
Given a training dataset X_train with labels Y_train, and a test dataset X_test, the algorithms are applied as follows:
from anomatools.models import SSkNNO, SSDO
# train
detector = SSDO()
detector.fit(X_train, Y_train)
# predict
labels = detector.predict(X_test)
The probability of each point in X_test being normal or anomalous can be computed in the same way:
probabilities = detector.predict_proba(X_test, method='squash')
Sometimes we are interested in detecting anomalies in the training data (e.g., when we are doing a post-mortem analysis):
# train
detector = SSDO()
detector.fit(X_train, Y_train)
# labels assigned to the training data
labels = detector.labels_
Unsupervised anomaly detection
Unsupervised anomaly detectors do not make use of label information (user feedback) when detecting anomalies in a dataset. Given a dataset with attributes X and labels Y, the unsupervised detectors are trained using only X.
The anomatools package implements two recent unsupervised anomaly detection algorithms:
- The kNNO (k-nearest neighbor outlier detection) algorithm computes for each data point the anomaly score as the distance to its k-nearest neighbor in the dataset [3].
- The iNNE (isolation nearest neighbor ensembles) algorithm computes for each data point the anomaly score roughly based on how isolated the point is from the rest of the data [4].
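The kNNO score described above can be sketched in a few lines of plain Python. This is a minimal illustration of the definition (distance to the k-th nearest neighbor), not the package's implementation, which may differ in details such as the distance metric or score normalization; the function knn_distance_scores is hypothetical.

```python
import math


def knn_distance_scores(X, k=2):
    """Score each point by the Euclidean distance to its k-th nearest neighbor."""
    if not 1 <= k <= len(X) - 1:
        raise ValueError("k must be between 1 and len(X) - 1")
    scores = []
    for i, x in enumerate(X):
        # Sorted distances to all other points (the point itself is excluded).
        dists = sorted(math.dist(x, other) for j, other in enumerate(X) if j != i)
        scores.append(dists[k - 1])
    return scores


# Four points on a unit square plus one point far away from them.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
scores = knn_distance_scores(X, k=2)
print(scores)
```

The isolated point at (5, 5) receives a much larger score than the four points on the square, which is what marks it as the most anomalous.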
Given a training dataset X_train with labels Y_train, and a test dataset X_test, the algorithms are applied as follows. Y_train can be passed to fit, but the unsupervised detectors do not use it:
from anomatools.models import kNNO, iNNE
# train
detector = kNNO()
detector.fit(X_train, Y_train)
# predict
labels = detector.predict(X_test)
Package structure
The anomaly detection algorithms are located in: anomatools/models/
For further examples of how to use the algorithms see the notebooks: anomatools/notebooks/
Dependencies
The anomatools package requires the following Python packages to be installed:
Contact
Contact the author of the package: vincent.vercruyssen@kuleuven.be
References
[1] Vercruyssen, V., Meert, W., Verbruggen, G., Maes, K., Bäumer, R., Davis, J. (2018) Semi-Supervised Anomaly Detection with an Application to Water Analytics. IEEE International Conference on Data Mining (ICDM), Singapore. pp. 527-536.
[2] Vercruyssen, V., Meert, W., Davis, J. (2020) Transfer Learning for Anomaly Detection through Localized and Unsupervised Instance Selection. AAAI Conference on Artificial Intelligence, New York.