Conformal anomaly detection for PyOD detectors.

unquad

A Python library for uncertainty-quantified anomaly detection.

unquad wraps most PyOD detectors (see Supported Estimators) to provide uncertainty-quantified anomaly detection based on one-class classification and the principles of conformal inference.

What is Conformal Anomaly Detection?

Conformal anomaly detection (CAD) is based on the model-agnostic and non-parametric framework of conformal prediction (CP). While CP aims to produce statistically valid prediction regions (prediction intervals or prediction sets) for any given point predictor or classifier, CAD aims to control the false discovery rate (FDR) for any anomaly detector that is suitable for one-class classification, without compromising its statistical power.

CAD translates anomaly scores into statistical p-values by comparing the anomaly scores observed on test data to a retained set of calibration scores previously obtained on normal data during model training (see One-Class Classification). The larger the discrepancy between the calibration scores and an observed test score, the lower the resulting (statistically valid) p-value. These p-values, rather than the usual anomaly estimates, allow for FDR control through statistical procedures such as Benjamini-Hochberg.
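The two steps described above can be illustrated with a minimal, standard-library-only sketch. It is not unquad's implementation: the function names conformal_p_values and benjamini_hochberg are made up for this illustration, the scores are invented, and it assumes that larger scores mean "more anomalous".

```python
def conformal_p_values(calibration_scores, test_scores):
    """p = (1 + #{calibration scores >= test score}) / (n + 1)."""
    n = len(calibration_scores)
    return [
        (1 + sum(c >= t for c in calibration_scores)) / (n + 1)
        for t in test_scores
    ]


def benjamini_hochberg(p_values, alpha):
    """Return one boolean per p-value: True where the point is flagged."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= alpha * k / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k_max = rank
    # Reject the k_max smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Invented calibration scores from "normal" data: 0.100, 0.105, ..., 0.190
calibration = [0.10 + 0.005 * i for i in range(19)]
# Invented test scores: two look normal, two are far outside the calibration range
test = [0.1225, 0.50, 0.1525, 0.90]

p_vals = conformal_p_values(calibration, test)
flags = benjamini_hochberg(p_vals, alpha=0.2)

print(p_vals)  # [0.75, 0.05, 0.45, 0.05]
print(flags)   # [False, True, False, True]
```

Note that the smallest attainable p-value is 1 / (n + 1), so the size of the calibration set limits how small a p-value (and therefore how strict an FDR level) can be resolved.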

Getting started

pip install unquad

Usage: CV+

from pyod.models.iforest import IForest

from unquad.estimator.conformal_estimator import ConformalEstimator
from unquad.estimator.split_configuration import SplitConfiguration
from unquad.datasets.loader import DataLoader
from unquad.enums.adjustment import Adjustment
from unquad.enums.dataset import Dataset
from unquad.enums.method import Method
from unquad.evaluation.metrics import false_discovery_rate, statistical_power

dl = DataLoader(dataset=Dataset.THYROID)
x_train, x_test, y_test = dl.get_example_setup()

ce = ConformalEstimator(
    detector=IForest(behaviour="new"),
    method=Method.CV_PLUS,
    split=SplitConfiguration(n_split=10),
    adjustment=Adjustment.BENJAMINI_HOCHBERG,
    alpha=0.2,  # nominal FDR level
    seed=1,
)

ce.fit(x_train)  # model fit and calibration
estimates = ce.predict(x_test, raw=False)

print(false_discovery_rate(y=y_test, y_hat=estimates))
print(statistical_power(y=y_test, y_hat=estimates))

Output:

0.174  # empirical FDR
0.826  # empirical Power
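As a rough intuition for what a CV+ style calibration involves, here is a self-contained sketch using only the standard library. It makes several assumptions that are not taken from unquad: MeanDistanceDetector is a toy stand-in for a real detector, cv_plus_calibrate and cv_plus_score are hypothetical helper names, and aggregating the fold models' test scores by their median is just one plausible choice.

```python
import random
import statistics


class MeanDistanceDetector:
    """Toy detector (hypothetical, not a PyOD class): distance to the training mean."""

    def fit(self, xs):
        self.center = statistics.fmean(xs)
        return self

    def score(self, xs):
        # Larger score = more anomalous.
        return [abs(x - self.center) for x in xs]


def cv_plus_calibrate(xs, n_folds, seed):
    """Fit one detector per fold and pool the out-of-fold calibration scores."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    rng.shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]

    detectors, calibration_scores = [], []
    for held_out in folds:
        held_out_set = set(held_out)
        train = [xs[i] for i in indices if i not in held_out_set]
        detector = MeanDistanceDetector().fit(train)
        detectors.append(detector)
        # Each training point is scored only by the model that did not see it.
        calibration_scores.extend(detector.score([xs[i] for i in held_out]))
    return detectors, calibration_scores


def cv_plus_score(detectors, xs):
    """Aggregate the fold detectors' scores per test point (median, by assumption)."""
    per_detector = [d.score(xs) for d in detectors]
    return [statistics.median(column) for column in zip(*per_detector)]


data_rng = random.Random(0)
normal_train = [data_rng.gauss(0.0, 1.0) for _ in range(200)]

detectors, calibration = cv_plus_calibrate(normal_train, n_folds=10, seed=1)
test_scores = cv_plus_score(detectors, [0.1, 6.0])

print(len(detectors), len(calibration))  # 10 200
```

Compared with a single train/calibration split, this scheme uses every training point for calibration, at the cost of fitting one model per fold.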

Usage: Jackknife+-after-Bootstrap

from pyod.models.iforest import IForest

from unquad.estimator.conformal_estimator import ConformalEstimator
from unquad.estimator.split_configuration import SplitConfiguration
from unquad.datasets.loader import DataLoader
from unquad.enums.adjustment import Adjustment
from unquad.enums.dataset import Dataset
from unquad.enums.method import Method
from unquad.evaluation.metrics import false_discovery_rate, statistical_power

dl = DataLoader(dataset=Dataset.THYROID)
x_train, x_test, y_test = dl.get_example_setup()

ce = ConformalEstimator(
    detector=IForest(behaviour="new"),
    method=Method.JACKKNIFE_PLUS_AFTER_BOOTSTRAP,
    split=SplitConfiguration(n_split=0.95, n_bootstraps=40),
    adjustment=Adjustment.BENJAMINI_HOCHBERG,
    alpha=0.1,  # nominal FDR level
    seed=1,
)

ce.fit(x_train)  # model fit and calibration
estimates = ce.predict(x_test, raw=False)

print(false_discovery_rate(y=y_test, y_hat=estimates))
print(statistical_power(y=y_test, y_hat=estimates))

Output:

0.041  # empirical FDR
0.959  # empirical Power
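For readers unfamiliar with the two reported quantities, the following sketch shows how an empirical FDR and an empirical power are commonly computed from binary labels. It is an illustration under assumptions, not the code behind unquad's false_discovery_rate and statistical_power: the names empirical_fdr and empirical_power are hypothetical, labels are assumed to be 1 for anomalies and 0 for normal points, and the example data is invented.

```python
def empirical_fdr(y_true, y_pred):
    """Share of flagged points that are actually normal (0.0 if nothing is flagged)."""
    flagged = sum(y_pred)
    false_positives = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 0)
    return false_positives / flagged if flagged else 0.0


def empirical_power(y_true, y_pred):
    """Share of true anomalies that were flagged (0.0 if there are no anomalies)."""
    anomalies = sum(y_true)
    true_positives = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 1)
    return true_positives / anomalies if anomalies else 0.0


# Invented labels: 1 = anomaly, 0 = normal
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

fdr = empirical_fdr(y_true, y_pred)
power = empirical_power(y_true, y_pred)

print(fdr)    # 0.25 (1 of 4 flagged points is normal)
print(power)  # 0.75 (3 of 4 anomalies were flagged)
```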

Supported Estimators

The package currently supports anomaly estimators that are suitable for unsupervised one-class classification. Because these detectors are fitted exclusively on normal (non-anomalous) data, parameters like threshold are internally set to the smallest possible values.

Models that are currently supported include:

  • Angle-Based Outlier Detection (ABOD)
  • Autoencoder (AE)
  • Cook's Distance (CD)
  • Copula-based Outlier Detector (COPOD)
  • Deep Isolation Forest (DIF)
  • Empirical-Cumulative-distribution-based Outlier Detection (ECOD)
  • Gaussian Mixture Model (GMM)
  • Histogram-based Outlier Detection (HBOS)
  • Isolation-based Anomaly Detection using Nearest-Neighbor Ensembles (INNE)
  • Isolation Forest (IForest)
  • Kernel Density Estimation (KDE)
  • k-Nearest Neighbor (kNN)
  • Kernel Principal Component Analysis (KPCA)
  • Linear Model Deviation-based Outlier Detection (LMDD)
  • Local Outlier Factor (LOF)
  • Local Correlation Integral (LOCI)
  • Lightweight Online Detector of Anomalies (LODA)
  • Locally Selective Combination of Parallel Outlier Ensembles (LSCP)
  • GNN-based Anomaly Detection Method (LUNAR)
  • Median Absolute Deviation (MAD)
  • Minimum Covariance Determinant (MCD)
  • One-Class SVM (OCSVM)
  • Principal Component Analysis (PCA)
  • Quasi-Monte Carlo Discrepancy Outlier Detection (QMCD)
  • Rotation-based Outlier Detection (ROD)
  • Subspace Outlier Detection (SOD)
  • Scalable Unsupervised Outlier Detection (SUOD)

Contact

Bug reporting: https://github.com/OliverHennhoefer/unquad/issues

Download files

Source distribution: unquad-0.0.9.tar.gz (64.6 MB)
Built distribution: unquad-0.0.9-py3-none-any.whl (35.5 MB)
