
Conformal anomaly detection for PyOD detectors.


unquad

Tired of alarm fatigue?

unquad enables conformal anomaly detection for PyOD detectors.

unquad is a wrapper applicable to most PyOD detectors (see Supported Estimators) that enables uncertainty-quantified anomaly detection based on one-class classification and the principles of conformal inference.


What is Conformal Anomaly Detection?

Conformal anomaly detection (CAD) builds on the model-agnostic and non-parametric framework of conformal prediction (CP). While CP aims to produce statistically valid prediction regions (prediction intervals or prediction sets) for any given point predictor or classifier, CAD aims to control the false discovery rate (FDR) of any anomaly detector suitable for one-class classification, without compromising its statistical power.

CAD translates anomaly scores into statistical p-values by comparing the anomaly scores observed on test data to a retained set of calibration scores obtained on normal data during model training (see One-Class Classification). The larger the discrepancy between the calibration scores and an observed test score, the lower the resulting (statistically valid) p-value. These p-values, instead of the usual anomaly estimates, allow for FDR control by statistical procedures like Benjamini-Hochberg.
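
For intuition, a minimal sketch of this score-to-p-value translation and of the Benjamini-Hochberg adjustment, written out by hand with NumPy (this only illustrates the principle and is not unquad's internal API):

import numpy as np

def conformal_p_values(calibration_scores, test_scores):
    # calibration_scores: scores of the fitted detector on held-out normal data
    # test_scores: scores of the same detector on new test data
    calibration_scores = np.asarray(calibration_scores)
    n = len(calibration_scores)
    # p-value: share of calibration scores at least as extreme as the test score
    return np.array(
        [(1 + np.sum(calibration_scores >= s)) / (n + 1) for s in test_scores]
    )

def benjamini_hochberg(p_values, alpha=0.2):
    # flag the tests whose sorted p-values stay below the BH line alpha * k / m
    p_values = np.asarray(p_values)
    m = len(p_values)
    order = np.argsort(p_values)
    below = p_values[order] <= alpha * np.arange(1, m + 1) / m
    k = (np.nonzero(below)[0].max() + 1) if below.any() else 0
    flagged = np.zeros(m, dtype=bool)
    flagged[order[:k]] = True
    return flagged  # True = reported as anomaly at nominal FDR level alpha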

Assumption

CAD assumes exchangeability of training and future test data. Exchangeability is closely related to the statistical notion of independent and identically distributed (IID) random variables: IID implies exchangeability, but not vice versa. Exchangeability requires the joint probability distribution to remain the same under permutations of the variables. This makes exchangeability a very practicable assumption, as it is weaker than IID.
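
In formal terms (notation added here for clarity), random variables X_1, ..., X_n are exchangeable if, for every permutation \pi of \{1, \dots, n\},

(X_1, \dots, X_n) \overset{d}{=} (X_{\pi(1)}, \dots, X_{\pi(n)})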

Limitations

Since CAD controls the FDR through adjustment procedures in the context of multiple testing, trained conformal detectors currently only work for batch-wise anomaly detection (on static data).
In general, CAD also offers a range of methods for the online setting when working with dynamic time-series data under potential covariate shift. This kind of online detector is not yet implemented; respective methods are planned for future releases.

Getting started

pip install unquad

Usage: Split-Conformal

from pyod.models.iforest import IForest
from pyod.utils import generate_data

from unquad.estimator.conformal import ConformalEstimator
from unquad.enums.adjustment import Adjustment
from unquad.enums.method import Method
from unquad.evaluation.metrics import false_discovery_rate, statistical_power

x_train, x_test, y_train, y_test = generate_data(
        n_train=1_000,
        n_test=1_000,
        n_features=10,
        contamination=0.1,
        random_state=1,
    )

x_train = x_train[y_train == 0]  # Normal Instances (One-Class Classification)

ce = ConformalEstimator(
            detector=IForest(behaviour="new"),
            method=Method.CV_PLUS,
            adjustment=Adjustment.BENJAMINI_HOCHBERG,
            alpha=0.2,  # nominal FDR level
            random_state=1,
            split=10,
        )

ce.fit(x_train)  # Model Fit/Calibration
estimates = ce.predict(x_test, raw=False)

print(false_discovery_rate(y=y_test, y_hat=estimates))  # Empirical FDR
print(statistical_power(y=y_test, y_hat=estimates))  # Empirical Power

Output:

Training: 100%|██████████| 10/10 [00:01<00:00,  8.16it/s]
Inference: 100%|██████████| 10/10 [00:00<00:00, 220.63it/s]

0.194 # Empirical FDR
0.806 # Empirical Power
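
For reference, a hand-written sketch of what these two metrics measure, assuming the label 1 marks an anomaly in both y and y_hat (the actual implementations in unquad.evaluation.metrics may differ in detail):

import numpy as np

def empirical_fdr(y, y_hat):
    # share of flagged instances that are actually normal (false alarms)
    y, y_hat = np.asarray(y), np.asarray(y_hat)
    flagged = y_hat == 1
    return float(np.sum(flagged & (y == 0))) / max(int(np.sum(flagged)), 1)

def empirical_power(y, y_hat):
    # share of true anomalies that were flagged
    y, y_hat = np.asarray(y), np.asarray(y_hat)
    anomalies = y == 1
    return float(np.sum(anomalies & (y_hat == 1))) / max(int(np.sum(anomalies)), 1)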

Usage: Jackknife+-after-Bootstrap

from pyod.models.iforest import IForest
from pyod.utils import generate_data

from unquad.estimator.conformal import ConformalEstimator
from unquad.enums.adjustment import Adjustment
from unquad.estimator.bootstrap.bootstrap_config import BootstrapConfiguration
from unquad.enums.method import Method 
from unquad.evaluation.metrics import false_discovery_rate, statistical_power

x_train, x_test, y_train, y_test = generate_data(
        n_train=1_000,
        n_test=1_000,
        n_features=10,
        contamination=0.1,
        random_state=1,
    )

x_train = x_train[y_train == 0]  # Normal Instances (One-Class Classification)

bc = BootstrapConfiguration(n=1_000, b=40, m=0.95)

ce = ConformalEstimator(
            detector=IForest(behaviour="new"),
            method=Method.JACKKNIFE_PLUS_AFTER_BOOTSTRAP,
            adjustment=Adjustment.BENJAMINI_HOCHBERG,
            alpha=0.1,  # nominal FDR level
            bootstrap_config=bc,
            random_state=1,
        )

ce.fit(x_train)  # Model Fit/Calibration
estimates = ce.predict(x_test, raw=False)

print(false_discovery_rate(y=y_test, y_hat=estimates))  # Empirical FDR
print(statistical_power(y=y_test, y_hat=estimates))  # Empirical Power

Output:

Training: 100%|██████████| 40/40 [00:04<00:00,  8.13it/s]
Inference: 100%|██████████| 40/40 [00:00<00:00, 231.63it/s]

0.099 # Empirical FDR
0.901 # Empirical Power

Supported Estimators

The package currently supports anomaly estimators that are suitable for unsupervised one-class classification. As the respective detectors are fitted exclusively on normal (or non-anomalous) data, parameters like the threshold are internally set to the smallest possible values. Any of the detectors listed below can be wrapped in the same way as IForest in the examples above (see the sketch after this list).

Models that are currently supported include:

  • Angle-Based Outlier Detection (ABOD)
  • Autoencoder (AE)
  • Cook's Distance (CD)
  • Copula-based Outlier Detector (COPOD)
  • Deep Isolation Forest (DIF)
  • Empirical-Cumulative-distribution-based Outlier Detection (ECOD)
  • Gaussian Mixture Model (GMM)
  • Histogram-based Outlier Detection (HBOS)
  • Isolation-based Anomaly Detection using Nearest-Neighbor Ensembles (INNE)
  • Isolation Forest (IForest)
  • Kernel Density Estimation (KDE)
  • k-Nearest Neighbor (kNN)
  • Kernel Principal Component Analysis (KPCA)
  • Linear Model Deviation-based Outlier Detection (LMDD)
  • Local Outlier Factor (LOF)
  • Local Correlation Integral (LOCI)
  • Lightweight Online Detector of Anomalies (LODA)
  • Locally Selective Combination of Parallel Outlier Ensembles (LSCP)
  • GNN-based Anomaly Detection Method (LUNAR)
  • Median Absolute Deviation (MAD)
  • Minimum Covariance Determinant (MCD)
  • One-Class SVM (OCSVM)
  • Principal Component Analysis (PCA)
  • Quasi-Monte Carlo Discrepancy Outlier Detection (QMCD)
  • Rotation-based Outlier Detection (ROD)
  • Subspace Outlier Detection (SOD)
  • Scalable Unsupervised Outlier Detection (SUOD)
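
A minimal sketch of swapping in another supported detector, here ECOD with default parameters (shown purely as an illustration; fit and predict work exactly as in the examples above):

from pyod.models.ecod import ECOD

from unquad.estimator.conformal import ConformalEstimator
from unquad.enums.adjustment import Adjustment
from unquad.enums.method import Method

ce = ConformalEstimator(
    detector=ECOD(),  # any estimator from the list above
    method=Method.CV_PLUS,
    adjustment=Adjustment.BENJAMINI_HOCHBERG,
    alpha=0.1,  # nominal FDR level
    split=10,
    random_state=1,
)

# fit on normal data and predict as before:
# ce.fit(x_train); estimates = ce.predict(x_test, raw=False)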

Contact

General questions: oliver.hennhoefer@h-ka.de
Bug reporting: https://github.com/OliverHennhoefer/unquad/issues

What now?

To dive deeper into the field of conformal inference, make sure to visit the awesome-conformal-prediction repository!
