kenchi

This is a scikit-learn compatible library for anomaly detection.

Installation

You can install kenchi via pip:

pip install kenchi

or via conda:

conda install -c y_ohr_n kenchi

Algorithms

  • Outlier detection
    1. FastABOD [8]
    2. LOF [2] (scikit-learn wrapper)
    3. KNN [1], [12]
    4. OneTimeSampling [14]
    5. HBOS [5]
  • Novelty detection
    1. OCSVM [13] (scikit-learn wrapper)
    2. MiniBatchKMeans
    3. IForest [10] (scikit-learn wrapper)
    4. PCA
    5. GMM (scikit-learn wrapper)
    6. KDE [11] (scikit-learn wrapper)
    7. SparseStructureLearning [6]

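The split above follows scikit-learn's convention: outlier detectors flag anomalies within the sample they were fit on, while novelty detectors are fit on (presumably clean) data and then score previously unseen points. A minimal sketch of the distinction using scikit-learn's LocalOutlierFactor, which the LOF entry above wraps; the synthetic data and parameters here are illustrative assumptions, not part of kenchi's API:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))                      # inliers only
X_test = np.vstack([rng.normal(size=(10, 2)),
                    rng.normal(loc=6.0, size=(5, 2))])   # last 5 rows are far-away points

# Outlier detection: fit and flag anomalies within the same sample.
lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X_train)                        # +1 = inlier, -1 = outlier

# Novelty detection: fit on clean data, then score unseen points.
nov = LocalOutlierFactor(n_neighbors=20, novelty=True).fit(X_train)
pred = nov.predict(X_test)                               # -1 marks novel points
print(pred)
```

Note that in the novelty setting the fitted estimator exposes predict for new data, whereas in the outlier setting only fit_predict on the training sample is meaningful.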
Examples

import matplotlib.pyplot as plt
import numpy as np
from kenchi.datasets import load_pima
from kenchi.outlier_detection import *
from kenchi.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

np.random.seed(0)

scaler = StandardScaler()

detectors = [
    FastABOD(novelty=True, n_jobs=-1), OCSVM(),
    MiniBatchKMeans(), LOF(novelty=True, n_jobs=-1),
    KNN(novelty=True, n_jobs=-1), IForest(n_jobs=-1),
    PCA(), KDE()
]

# Load the Pima Indians diabetes dataset.
X, y = load_pima(return_X_y=True)
X_train, X_test, _, y_test = train_test_split(X, y)

# Get the current Axes instance
ax = plt.gca()

for det in detectors:
    # Fit the model according to the given training data
    pipeline = make_pipeline(scaler, det).fit(X_train)

    # Plot the Receiver Operating Characteristic (ROC) curve
    pipeline.plot_roc_curve(X_test, y_test, ax=ax)

# Display the figure
plt.show()
https://raw.githubusercontent.com/HazureChi/kenchi/master/docs/images/readme.png
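The plot_roc_curve helper above comes from kenchi's Pipeline; if you only need the ROC AUC number rather than the plot, it can also be computed directly from anomaly scores with scikit-learn. A sketch on synthetic data, using IsolationForest (which the IForest entry wraps); the dataset and score sign convention here are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(190, 2)),
               rng.normal(loc=5.0, size=(10, 2))])   # 10 planted anomalies
y = np.r_[np.zeros(190), np.ones(10)]                # 1 = anomaly

det = IsolationForest(random_state=0).fit(X)
scores = -det.score_samples(X)                       # negate so higher = more anomalous
auc = roc_auc_score(y, scores)
print(f"ROC AUC: {auc:.3f}")
```

Negating score_samples is needed because scikit-learn returns lower values for more abnormal points, while roc_auc_score expects higher scores for the positive (anomalous) class.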

References

[1] Angiulli, F., and Pizzuti, C., “Fast outlier detection in high dimensional spaces,” In Proceedings of PKDD, pp. 15-27, 2002.
[2] Breunig, M. M., Kriegel, H.-P., Ng, R. T., and Sander, J., “LOF: identifying density-based local outliers,” In Proceedings of SIGMOD, pp. 93-104, 2000.
[3] Dua, D., and Karra Taniskidou, E., “UCI Machine Learning Repository,” 2017.
[4] Goix, N., “How to evaluate the quality of unsupervised anomaly detection algorithms?” In ICML Anomaly Detection Workshop, 2016.
[5] Goldstein, M., and Dengel, A., “Histogram-based outlier score (HBOS): A fast unsupervised anomaly detection algorithm,” KI: Poster and Demo Track, pp. 59-63, 2012.
[6] Ide, T., Lozano, C., Abe, N., and Liu, Y., “Proximity-based anomaly detection using sparse structure learning,” In Proceedings of SDM, pp. 97-108, 2009.
[7] Kriegel, H.-P., Kröger, P., Schubert, E., and Zimek, A., “Interpreting and unifying outlier scores,” In Proceedings of SDM, pp. 13-24, 2011.
[8] Kriegel, H.-P., Schubert, M., and Zimek, A., “Angle-based outlier detection in high-dimensional data,” In Proceedings of SIGKDD, pp. 444-452, 2008.
[9] Lee, W. S., and Liu, B., “Learning with positive and unlabeled examples using weighted Logistic Regression,” In Proceedings of ICML, pp. 448-455, 2003.
[10] Liu, F. T., Ting, K. M., and Zhou, Z.-H., “Isolation forest,” In Proceedings of ICDM, pp. 413-422, 2008.
[11] Parzen, E., “On estimation of a probability density function and mode,” Ann. Math. Statist., 33(3), pp. 1065-1076, 1962.
[12] Ramaswamy, S., Rastogi, R., and Shim, K., “Efficient algorithms for mining outliers from large data sets,” In Proceedings of SIGMOD, pp. 427-438, 2000.
[13] Schölkopf, B., Platt, J. C., Shawe-Taylor, J. C., Smola, A. J., and Williamson, R. C., “Estimating the support of a high-dimensional distribution,” Neural Computation, 13(7), pp. 1443-1471, 2001.
[14] Sugiyama, M., and Borgwardt, K., “Rapid distance-based outlier detection via sampling,” Advances in NIPS, pp. 467-475, 2013.
