Conformal anomaly detection for PyOD detectors.
unquad
A Python library for uncertainty-quantified anomaly detection.
unquad is a wrapper for most PyOD detectors (see Supported Estimators) that provides uncertainty-quantified anomaly detection based on one-class classification and the principles of conformal inference.
What is Conformal Anomaly Detection?
Conformal anomaly detection (CAD) is based on the model-agnostic and non-parametric framework of conformal prediction (CP). While CP aims to produce statistically valid prediction regions (prediction intervals or prediction sets) for any given point predictor or classifier, CAD aims to control the false discovery rate (FDR) of any anomaly detector that is suitable for one-class classification, without compromising its statistical power.
CAD translates anomaly scores into statistical p-values by comparing the anomaly scores observed on test data to a retained set of calibration scores that were previously obtained on normal data during model training (see One-Class Classification). The larger the discrepancy between the normal scores and an observed test score, the lower the resulting (statistically valid) p-value. Unlike the usual anomaly estimates, these p-values allow for FDR control by statistical procedures such as Benjamini-Hochberg.
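To make the score-to-p-value step concrete, the following is a minimal NumPy sketch of the standard conformal p-value and the Benjamini-Hochberg procedure. It is an illustration under stated assumptions, not the library's implementation: the function names `conformal_p_values` and `benjamini_hochberg` are hypothetical, and higher scores are assumed to mean "more anomalous".

```python
import numpy as np


def conformal_p_values(calib_scores, test_scores):
    """p-value = (1 + #{calibration scores >= test score}) / (n_calib + 1)."""
    calib = np.sort(np.asarray(calib_scores, dtype=float))
    test = np.asarray(test_scores, dtype=float)
    # Count the calibration scores that are at least as large as each test score.
    n_geq = calib.size - np.searchsorted(calib, test, side="left")
    return (1.0 + n_geq) / (calib.size + 1.0)


def benjamini_hochberg(p_values, alpha):
    """Return a boolean mask of rejected hypotheses (flagged anomalies)."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Compare the i-th smallest p-value with its threshold alpha * i / m.
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest rank that passes its threshold
        rejected[order[: k + 1]] = True
    return rejected


# Toy data (assumption): 99 calibration scores and three test scores.
calib_scores = np.arange(1, 100)
test_scores = [0, 50, 1000]
p_values = conformal_p_values(calib_scores, test_scores)
flags = benjamini_hochberg(p_values, alpha=0.1)
```

A test score far above all calibration scores receives the smallest attainable p-value, 1 / (n_calib + 1), so the size of the calibration set bounds how small a p-value can become.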
Getting started
pip install unquad
Usage: CV+
from pyod.models.iforest import IForest
from unquad.estimator.conformal_estimator import ConformalEstimator
from unquad.estimator.split_configuration import SplitConfiguration
from unquad.datasets.loader import DataLoader
from unquad.enums.adjustment import Adjustment
from unquad.enums.dataset import Dataset
from unquad.enums.method import Method
from unquad.evaluation.metrics import false_discovery_rate, statistical_power
dl = DataLoader(dataset=Dataset.THYROID)
x_train, x_test, y_test = dl.get_example_setup()
ce = ConformalEstimator(
    detector=IForest(behaviour="new"),
    method=Method.CV_PLUS,
    split=SplitConfiguration(n_split=10),
    adjustment=Adjustment.BENJAMINI_HOCHBERG,
    alpha=0.2,  # nominal FDR level
    seed=1,
)
ce.fit(x_train) # model fit and calibration
estimates = ce.predict(x_test, raw=False)
print(false_discovery_rate(y=y_test, y_hat=estimates))
print(statistical_power(y=y_test, y_hat=estimates))
Output:
0.174 # empirical FDR
0.826 # empirical Power
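For readers who want to see what the two reported quantities measure, here is a short sketch of the textbook definitions of empirical FDR and power. It is not the library's `false_discovery_rate` or `statistical_power` code and may differ from it in detail; the names `empirical_fdr` and `empirical_power` are hypothetical, and labels are assumed to be 1 for anomalies and 0 for normal points.

```python
import numpy as np


def empirical_fdr(y, y_hat):
    """Share of flagged points that are actually normal (false discoveries)."""
    y = np.asarray(y, dtype=bool)
    y_hat = np.asarray(y_hat, dtype=bool)
    n_discoveries = y_hat.sum()
    if n_discoveries == 0:
        return 0.0  # convention: no discoveries means no false discoveries
    return float(np.sum(y_hat & ~y) / n_discoveries)


def empirical_power(y, y_hat):
    """Share of true anomalies that were flagged."""
    y = np.asarray(y, dtype=bool)
    y_hat = np.asarray(y_hat, dtype=bool)
    n_anomalies = y.sum()
    if n_anomalies == 0:
        return 0.0  # convention: nothing to detect
    return float(np.sum(y_hat & y) / n_anomalies)


# Toy labels and decisions (assumption): three anomalies, three flagged points.
y_true = [1, 1, 0, 0, 1]
y_flag = [1, 0, 1, 0, 1]
fdr = empirical_fdr(y_true, y_flag)
power = empirical_power(y_true, y_flag)
```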
Usage: Jackknife+-after-Bootstrap
from pyod.models.iforest import IForest
from unquad.estimator.conformal_estimator import ConformalEstimator
from unquad.estimator.split_configuration import SplitConfiguration
from unquad.datasets.loader import DataLoader
from unquad.enums.adjustment import Adjustment
from unquad.enums.dataset import Dataset
from unquad.enums.method import Method
from unquad.evaluation.metrics import false_discovery_rate, statistical_power
dl = DataLoader(dataset=Dataset.THYROID)
x_train, x_test, y_test = dl.get_example_setup()
ce = ConformalEstimator(
    detector=IForest(behaviour="new"),
    method=Method.JACKKNIFE_PLUS_AFTER_BOOTSTRAP,
    split=SplitConfiguration(n_split=0.95, n_bootstraps=40),
    adjustment=Adjustment.BENJAMINI_HOCHBERG,
    alpha=0.1,  # nominal FDR level
    seed=1,
)
ce.fit(x_train) # model fit and calibration
estimates = ce.predict(x_test, raw=False)
print(false_discovery_rate(y=y_test, y_hat=estimates))
print(statistical_power(y=y_test, y_hat=estimates))
Output:
0.041 # empirical FDR
0.959 # empirical Power
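The general idea behind Jackknife+-after-Bootstrap calibration can be sketched with plain NumPy. This is a rough illustration, not how unquad implements the method. It rests on several assumptions: the detector is a toy one (distance to the mean of the training sample), each resample draws 95% of the training size with replacement, and both out-of-bag calibration scores and test scores are aggregated with the median. All function names are hypothetical.

```python
import numpy as np


def fit_toy_detector(x):
    """Toy detector (assumption): remember the mean of the training sample."""
    return x.mean(axis=0)


def score(center, x):
    """Anomaly score: Euclidean distance to the fitted center."""
    return np.linalg.norm(x - center, axis=1)


def calibrate_out_of_bag(x_train, n_bootstraps=40, frac=0.95, seed=1):
    """Fit one model per resample and score only the points it did not see."""
    rng = np.random.default_rng(seed)
    n = x_train.shape[0]
    indices = np.arange(n)
    centers = []
    oob_scores = [[] for _ in range(n)]
    for _ in range(n_bootstraps):
        drawn = rng.choice(n, size=int(frac * n), replace=True)
        center = fit_toy_detector(x_train[drawn])
        centers.append(center)
        oob = np.setdiff1d(indices, drawn)  # points left out of this resample
        for i, s in zip(oob, score(center, x_train[oob])):
            oob_scores[i].append(s)
    # One calibration score per point that was out-of-bag at least once.
    calib = np.array([np.median(s) for s in oob_scores if s])
    return np.stack(centers), calib


def score_test(centers, x_test):
    """Aggregate the test scores of all fitted models (median, an assumption)."""
    per_model = np.stack([score(c, x_test) for c in centers])
    return np.median(per_model, axis=0)


# Toy data (assumption): normal training points plus one obvious test outlier.
rng = np.random.default_rng(0)
x_train = rng.normal(size=(200, 3))
x_test = np.vstack([rng.normal(size=(5, 3)), np.full((1, 3), 8.0)])
centers, calib_scores = calibrate_out_of_bag(x_train)
test_scores = score_test(centers, x_test)
```

Because every calibration score comes from models that never saw the corresponding point, the calibration set can be as large as the training set without holding out a separate split.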
Supported Estimators
The package currently supports anomaly estimators that are suitable for unsupervised one-class classification. Because the respective detectors are fitted exclusively on normal (non-anomalous) data, parameters like threshold are internally set to the smallest possible values.
Models that are currently supported include:
- Angle-Based Outlier Detection (ABOD)
- Autoencoder (AE)
- Cook's Distance (CD)
- Copula-based Outlier Detector (COPOD)
- Deep Isolation Forest (DIF)
- Empirical-Cumulative-distribution-based Outlier Detection (ECOD)
- Gaussian Mixture Model (GMM)
- Histogram-based Outlier Detection (HBOS)
- Isolation-based Anomaly Detection using Nearest-Neighbor Ensembles (INNE)
- Isolation Forest (IForest)
- Kernel Density Estimation (KDE)
- k-Nearest Neighbor (kNN)
- Kernel Principal Component Analysis (KPCA)
- Linear Model Deviation-based Outlier Detection (LMDD)
- Local Outlier Factor (LOF)
- Local Correlation Integral (LOCI)
- Lightweight Online Detector of Anomalies (LODA)
- Locally Selective Combination of Parallel Outlier Ensembles (LSCP)
- GNN-based Anomaly Detection Method (LUNAR)
- Median Absolute Deviation (MAD)
- Minimum Covariance Determinant (MCD)
- One-Class SVM (OCSVM)
- Principal Component Analysis (PCA)
- Quasi-Monte Carlo Discrepancy Outlier Detection (QMCD)
- Rotation-based Outlier Detection (ROD)
- Subspace Outlier Detection (SOD)
- Scalable Unsupervised Outlier Detection (SUOD)
Contact
Bug reporting: https://github.com/OliverHennhoefer/unquad/issues