Conformal anomaly detection for 'PyOD'-detectors.

unquad

A Python library for uncertainty-quantified anomaly detection.

unquad wraps most PyOD detectors (see Supported Estimators) to provide uncertainty-quantified anomaly detection based on one-class classification and the principles of conformal inference.

What is Conformal Anomaly Detection?

Conformal anomaly detection (CAD) is based on the model-agnostic and non-parametric framework of conformal prediction (CP). While CP aims to produce statistically valid prediction regions (prediction intervals or prediction sets) for any given point predictor or classifier, CAD aims to control the false discovery rate (FDR) of any anomaly detector suited to one-class classification, without compromising its statistical power.

CAD translates anomaly scores into statistical p-values by comparing the scores observed on test data with a retained set of calibration scores that were previously obtained on normal data during model training (see One-Class Classification). The larger the discrepancy between the calibration scores and an observed test score, the lower the resulting (statistically valid) p-value. Using these p-values instead of the usual anomaly estimates allows for FDR control through statistical procedures such as Benjamini-Hochberg.
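The two steps described above can be sketched with NumPy alone. This is a minimal illustration, not unquad's implementation: the helper names conformal_p_values and benjamini_hochberg are hypothetical, the scores are synthetic, and the sketch assumes the PyOD convention that higher scores mean "more anomalous".

```python
import numpy as np


def conformal_p_values(calib_scores, test_scores):
    """p-value = (1 + #{calibration scores >= test score}) / (n + 1)."""
    calib = np.sort(np.asarray(calib_scores, dtype=float))
    test = np.asarray(test_scores, dtype=float)
    # Number of calibration scores strictly below each test score.
    n_below = np.searchsorted(calib, test, side="left")
    n_geq = calib.size - n_below
    return (1.0 + n_geq) / (calib.size + 1.0)


def benjamini_hochberg(p_values, alpha):
    """Return a boolean mask of discoveries at nominal FDR level alpha."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Step-up thresholds alpha * k / m for ranks k = 1..m.
    thresholds = alpha * np.arange(1, m + 1) / m
    passed = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()  # largest rank meeting its threshold
        reject[order[: k + 1]] = True    # reject everything up to that rank
    return reject


# Synthetic data (assumption): calibration scores come from normal data only,
# and the last 5 test scores are shifted to mimic anomalies.
rng = np.random.default_rng(1)
calib_scores = rng.normal(0.0, 1.0, size=1000)
test_scores = np.concatenate(
    [rng.normal(0.0, 1.0, size=95), rng.normal(6.0, 1.0, size=5)]
)

p_values = conformal_p_values(calib_scores, test_scores)
flags = benjamini_hochberg(p_values, alpha=0.2)
```

The "+1" in numerator and denominator keeps every p-value strictly positive, so the smallest attainable p-value is 1 / (n + 1); a larger calibration set therefore allows finer-grained p-values.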

Getting started

pip install unquad

Usage: CV+

from pyod.models.iforest import IForest

from unquad.estimator.conformal_estimator import ConformalEstimator
from unquad.estimator.split_configuration import SplitConfiguration
from unquad.datasets.loader import DataLoader
from unquad.enums.adjustment import Adjustment
from unquad.enums.dataset import Dataset
from unquad.enums.method import Method
from unquad.evaluation.metrics import false_discovery_rate, statistical_power

dl = DataLoader(dataset=Dataset.THYROID)
x_train, x_test, y_test = dl.get_example_setup()

ce = ConformalEstimator(
    detector=IForest(behaviour="new"),
    method=Method.CV_PLUS,
    split=SplitConfiguration(n_split=10),
    adjustment=Adjustment.BENJAMINI_HOCHBERG,
    alpha=0.2,  # nominal FDR level
    seed=1
)

ce.fit(x_train)  # model fit and calibration
estimates = ce.predict(x_test, raw=False)

print(false_discovery_rate(y=y_test, y_hat=estimates))
print(statistical_power(y=y_test, y_hat=estimates))

Output:

0.174  # empirical FDR
0.826  # empirical Power
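For reference, the two reported quantities follow the usual textbook definitions: the empirical FDR is the share of flagged observations that are actually normal, and the empirical power is the share of true anomalies that were flagged. The sketch below assumes binary labels and binary predictions (1 = anomaly); the helper names empirical_fdr and empirical_power are hypothetical, and the functions in unquad.evaluation.metrics may be implemented differently.

```python
import numpy as np


def empirical_fdr(y, y_hat):
    """False discoveries divided by all discoveries (0.0 if nothing is flagged)."""
    y = np.asarray(y, dtype=bool)
    y_hat = np.asarray(y_hat, dtype=bool)
    n_flagged = y_hat.sum()
    if n_flagged == 0:
        return 0.0
    return float(np.sum(y_hat & ~y) / n_flagged)


def empirical_power(y, y_hat):
    """True discoveries divided by all true anomalies (0.0 if there are none)."""
    y = np.asarray(y, dtype=bool)
    y_hat = np.asarray(y_hat, dtype=bool)
    n_anomalies = y.sum()
    if n_anomalies == 0:
        return 0.0
    return float(np.sum(y_hat & y) / n_anomalies)


# Toy labels (assumption): 4 true anomalies, 4 flagged points, 3 of them correct.
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 1, 1, 0, 0]

print(empirical_fdr(y_true, y_pred))    # 1 false discovery out of 4 flagged
print(empirical_power(y_true, y_pred))  # 3 of 4 anomalies detected
```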

Usage: Jackknife+-after-Bootstrap

from pyod.models.iforest import IForest

from unquad.estimator.conformal_estimator import ConformalEstimator
from unquad.estimator.split_configuration import SplitConfiguration
from unquad.datasets.loader import DataLoader
from unquad.enums.adjustment import Adjustment
from unquad.enums.dataset import Dataset
from unquad.enums.method import Method
from unquad.evaluation.metrics import false_discovery_rate, statistical_power

dl = DataLoader(dataset=Dataset.THYROID)
x_train, x_test, y_test = dl.get_example_setup()

ce = ConformalEstimator(
    detector=IForest(behaviour="new"),
    method=Method.JACKKNIFE_PLUS_AFTER_BOOTSTRAP,
    split=SplitConfiguration(n_split=0.95, n_bootstraps=40),
    adjustment=Adjustment.BENJAMINI_HOCHBERG,
    alpha=0.1,  # nominal FDR level
    seed=1,
)

ce.fit(x_train)  # model fit and calibration
estimates = ce.predict(x_test, raw=False)

print(false_discovery_rate(y=y_test, y_hat=estimates))
print(statistical_power(y=y_test, y_hat=estimates))

Output:

0.041 # empirical FDR
0.959 # empirical Power

Supported Estimators

The package currently supports anomaly estimators that are suitable for unsupervised one-class classification. Because such detectors are fitted exclusively on normal (non-anomalous) data, parameters like threshold are internally set to the smallest possible values.

Models that are currently supported include:

  • Angle-Based Outlier Detection (ABOD)
  • Autoencoder (AE)
  • Cook's Distance (CD)
  • Copula-based Outlier Detector (COPOD)
  • Deep Isolation Forest (DIF)
  • Empirical-Cumulative-distribution-based Outlier Detection (ECOD)
  • Gaussian Mixture Model (GMM)
  • Histogram-based Outlier Detection (HBOS)
  • Isolation-based Anomaly Detection using Nearest-Neighbor Ensembles (INNE)
  • Isolation Forest (IForest)
  • Kernel Density Estimation (KDE)
  • k-Nearest Neighbor (kNN)
  • Kernel Principal Component Analysis (KPCA)
  • Linear Model Deviation-based Outlier Detection (LMDD)
  • Local Outlier Factor (LOF)
  • Local Correlation Integral (LOCI)
  • Lightweight Online Detector of Anomalies (LODA)
  • Locally Selective Combination of Parallel Outlier Ensembles (LSCP)
  • GNN-based Anomaly Detection Method (LUNAR)
  • Median Absolute Deviation (MAD)
  • Minimum Covariance Determinant (MCD)
  • One-Class SVM (OCSVM)
  • Principal Component Analysis (PCA)
  • Quasi-Monte Carlo Discrepancy Outlier Detection (QMCD)
  • Rotation-based Outlier Detection (ROD)
  • Subspace Outlier Detection (SOD)
  • Scalable Unsupervised Outlier Detection (SUOD)

Contact

Bug reporting: https://github.com/OliverHennhoefer/unquad/issues
