Conformal Anomaly Detection
Thresholds for anomaly detection are often arbitrary and lack theoretical guarantees about the anomalies they identify. nonconform wraps your favorite anomaly detection model from PyOD (see Supported Estimators) and transforms its raw anomaly scores into statistically valid $p$-values. It applies principles from conformal prediction to the setting of one-class classification, enabling anomaly detection with provable statistical guarantees and a controlled false discovery rate.
Note: The methods in nonconform assume that training and test data are exchangeable. Therefore, the package is not suited for data with spatial or temporal autocorrelation unless such dependencies are explicitly handled in preprocessing or model design.
Getting Started
Installation via PyPI:
pip install nonconform
Note: The following examples use the built-in datasets. Install with pip install nonconform[data] to run these examples (see Optional Dependencies).
Classical (Conformal) Approach
Example: Detecting anomalies with Isolation Forest on the Shuttle dataset. The approach splits data for calibration, trains the model, then converts anomaly scores to p-values by comparing test scores against the calibration distribution.
from pyod.models.iforest import IForest
from scipy.stats import false_discovery_control
from nonconform.strategy import Split
from nonconform.detection import ConformalDetector
from nonconform.utils.data import load, Dataset
from nonconform.utils.stat import false_discovery_rate, statistical_power
x_train, x_test, y_test = load(Dataset.SHUTTLE, setup=True, seed=42)
estimator = ConformalDetector(
    detector=IForest(behaviour="new"),
    strategy=Split(n_calib=1_000),
    seed=42,
)
estimator.fit(x_train)
estimates = estimator.predict(x_test)
decisions = false_discovery_control(estimates, method='bh') <= 0.2
print(f"Empirical False Discovery Rate: {false_discovery_rate(y=y_test, y_hat=decisions)}")
print(f"Empirical Statistical Power (Recall): {statistical_power(y=y_test, y_hat=decisions)}")
Output:
Empirical False Discovery Rate: 0.18
Empirical Statistical Power (Recall): 0.99
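The score-to-p-value step described above can be illustrated without any library. The following is a minimal, dependency-free sketch and not nonconform's implementation: the function names conformal_p_values and benjamini_hochberg and the toy scores are assumptions made for illustration. It assumes that higher scores mean "more anomalous", computes each p-value as the adjusted fraction of calibration scores at least as large as the test score, and then applies the Benjamini-Hochberg step-up rule. Thresholding the adjusted p-values returned by false_discovery_control(..., method='bh') at the same level yields the same decisions.

```python
def conformal_p_values(calib_scores, test_scores):
    """Empirical conformal p-values (assumes higher score = more anomalous)."""
    n = len(calib_scores)
    p_values = []
    for s in test_scores:
        # Count calibration scores at least as extreme as the test score.
        n_greater_equal = sum(1 for c in calib_scores if c >= s)
        # The +1 terms account for the test point itself and keep p > 0.
        p_values.append((n_greater_equal + 1) / (n + 1))
    return p_values


def benjamini_hochberg(p_values, alpha):
    """Return a list of booleans, True where BH rejects at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= alpha * k / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k_max = rank
    # Reject the k_max smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Toy data (hypothetical): 19 calibration scores on a grid, 3 test scores.
calib = [i / 20 for i in range(1, 20)]
test = [0.27, 0.99, 0.52]

p = conformal_p_values(calib, test)
decisions = benjamini_hochberg(p, alpha=0.2)
print(p)          # [0.75, 0.05, 0.5]
print(decisions)  # [False, True, False]
```

With 19 calibration scores the smallest attainable p-value is 1/20 = 0.05, which is why only the most extreme test score is flagged at level 0.2. Larger calibration sets give finer p-value resolution.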
Advanced Methods
Two advanced approaches are implemented that may increase the power of a conformal anomaly detector:
- A KDE-based (probabilistic) approach that models the calibration scores to achieve continuous $p$-values in contrast to the standard empirical distribution function.
- A weighted approach that prioritizes calibration scores by their similarity to the test batch at hand and is more robust to covariate shift between test and calibration data. It may be combined with the probabilistic approach.
Probabilistic Conformal Approach:
estimator = ConformalDetector(
    detector=HBOS(),
    strategy=Split(n_calib=1_000),
    estimation=Probabilistic(n_trials=10),  # KDE tuning trials
    seed=1,
)
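To make the idea of continuous p-values concrete, here is a rough sketch under stated assumptions. It is not the Probabilistic estimator's actual code: the function name kde_p_value is hypothetical, the kernel is Gaussian, and the bandwidth is fixed by hand rather than tuned over several trials. The p-value is taken as the upper-tail probability of the test score under a kernel density estimate fitted to the calibration scores.

```python
import math


def kde_p_value(calib_scores, score, bandwidth):
    """Upper-tail probability of `score` under a Gaussian KDE of the calibration scores."""

    def std_normal_cdf(z):
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    # The CDF of a Gaussian KDE is the average of the individual kernels' normal CDFs.
    cdf = sum(std_normal_cdf((score - c) / bandwidth) for c in calib_scores)
    cdf /= len(calib_scores)
    return 1.0 - cdf


# Toy calibration scores (hypothetical), symmetric around 0.3.
calib = [0.1, 0.2, 0.3, 0.4, 0.5]

p_center = kde_p_value(calib, 0.3, bandwidth=0.1)  # score in the middle of the calibration set
p_tail = kde_p_value(calib, 0.9, bandwidth=0.1)    # score far beyond all calibration scores
print(p_center, p_tail)
```

Unlike the empirical approach, whose smallest possible p-value is 1/(n+1), the smoothed estimate can take any value in (0, 1), so a test score beyond all calibration scores receives a p-value far below 1/6 here.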
Weighted Conformal Anomaly Detection:
# Weighted conformal (with covariate shift handling):
from nonconform.detection.weight import LogisticWeightEstimator
estimator = ConformalDetector(
    detector=IForest(behaviour="new"),
    strategy=Split(n_calib=1_000),
    weight_estimator=LogisticWeightEstimator(seed=42),
    seed=42,
)
Note: Weighted procedures require weighted FDR control for statistical validity (see weighted_bh() or weighted_false_discovery_control()).
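As an illustration of how weights enter the p-value, the sketch below uses a common form of the weighted conformal p-value, in which each calibration score contributes its weight instead of a count of one. This is an assumption-laden sketch, not the library's code: the function name weighted_conformal_p_value is hypothetical, and the weights are picked by hand, whereas in practice they would be estimated (for example by a weight estimator such as the one shown above).

```python
def weighted_conformal_p_value(calib_scores, calib_weights, score, test_weight):
    """Weighted conformal p-value (assumes higher score = more anomalous)."""
    total = sum(calib_weights) + test_weight
    # Sum the weights of calibration scores at least as extreme as the test score.
    tail = sum(w for c, w in zip(calib_scores, calib_weights) if c >= score)
    # The test point's own weight plays the role of the "+1" in the unweighted case.
    return (tail + test_weight) / total


calib = [0.1, 0.2, 0.3, 0.4]  # toy calibration scores (hypothetical)

# With uniform weights the result equals the unweighted p-value (1 + 1) / (4 + 1).
uniform = weighted_conformal_p_value(calib, [1.0, 1.0, 1.0, 1.0], 0.35, 1.0)

# Up-weighting the calibration points that resemble the test data changes the p-value.
shifted = weighted_conformal_p_value(calib, [0.5, 0.5, 2.0, 2.0], 0.35, 1.0)
print(uniform, shifted)  # 0.4 0.5
```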
Beyond Static Data
While primarily designed for static (single-batch) applications, the library supports streaming scenarios through BatchGenerator() and OnlineGenerator(). For statistically valid FDR control in streaming data, use the optional onlineFDR dependency, which implements appropriate statistical methods.
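To show why streaming needs a different decision rule than a single-batch procedure, the sketch below applies a simple alpha-spending rule to p-values that arrive one at a time. It is a deliberately naive illustration and is not claimed to be what the optional dependency implements: the function name alpha_spending_decisions and the example stream are hypothetical. Each test at time t gets its own level alpha * 6 / (pi^2 * t^2); these levels sum to alpha over an infinite stream, which bounds the family-wise error rate and therefore also the FDR.

```python
import math


def alpha_spending_decisions(p_values, alpha):
    """Decide on each p-value as it arrives, spending a shrinking share of alpha."""
    decisions = []
    for t, p in enumerate(p_values, start=1):
        # The levels alpha * 6 / (pi^2 * t^2) sum to alpha over t = 1, 2, 3, ...
        level_t = alpha * 6.0 / (math.pi ** 2 * t ** 2)
        decisions.append(p <= level_t)
    return decisions


stream = [0.001, 0.2, 0.004, 0.03]  # hypothetical p-values in arrival order
flags = alpha_spending_decisions(stream, alpha=0.1)
print(flags)  # [True, False, True, False]
```

The per-test level shrinks quickly, so this rule loses power on long streams. Dedicated online FDR procedures are designed to avoid that loss.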
Citation
If you find this repository useful for your research, please cite the following papers:
Leave-One-Out-, Bootstrap- and Cross-Conformal Anomaly Detectors
@inproceedings{Hennhofer2024,
  title = {{ Leave-One-Out-, Bootstrap- and Cross-Conformal Anomaly Detectors }},
  author = {Hennhofer, Oliver and Preisach, Christine},
  year = 2024,
  month = {Dec},
  booktitle = {2024 IEEE International Conference on Knowledge Graph (ICKG)},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
  pages = {110--119},
  doi = {10.1109/ICKG63256.2024.00022},
  url = {https://doi.ieeecomputersociety.org/10.1109/ICKG63256.2024.00022}
}
Testing for Outliers with Conformal p-Values
@article{Bates2023,
  title = {Testing for outliers with conformal p-values},
  author = {Bates, Stephen and Candès, Emmanuel and Lei, Lihua and Romano, Yaniv and Sesia, Matteo},
  year = 2023,
  month = feb,
  journal = {The Annals of Statistics},
  publisher = {Institute of Mathematical Statistics},
  volume = 51,
  number = 1,
  doi = {10.1214/22-aos2244},
  issn = {0090-5364},
  url = {http://dx.doi.org/10.1214/22-AOS2244}
}
Optional Dependencies
For additional features, you might need optional dependencies:
- pip install nonconform[data] - Includes pyarrow for loading example data (via remote download)
- pip install nonconform[deep] - Includes deep learning dependencies (PyTorch)
- pip install nonconform[fdr] - Includes advanced FDR control methods (online-fdr)
- pip install nonconform[dev] - Includes development and documentation tools
- pip install nonconform[all] - Includes all optional dependencies
Please refer to the pyproject.toml for details.
Supported Estimators
Only anomaly estimators suitable for unsupervised one-class classification are supported. Since detectors are trained exclusively on normal data, threshold parameters are automatically set to minimal values.
Models that are currently supported include:
- Angle-Based Outlier Detection (ABOD)
- Autoencoder (AE)
- Cook's Distance (CD)
- Copula-based Outlier Detector (COPOD)
- Deep Isolation Forest (DIF)
- Empirical-Cumulative-distribution-based Outlier Detection (ECOD)
- Gaussian Mixture Model (GMM)
- Histogram-based Outlier Detection (HBOS)
- Isolation-based Anomaly Detection using Nearest-Neighbor Ensembles (INNE)
- Isolation Forest (IForest)
- Kernel Density Estimation (KDE)
- k-Nearest Neighbor (kNN)
- Kernel Principal Component Analysis (KPCA)
- Linear Model Deviation-based Outlier Detection (LMDD)
- Local Outlier Factor (LOF)
- Local Correlation Integral (LOCI)
- Lightweight Online Detector of Anomalies (LODA)
- Locally Selective Combination of Parallel Outlier Ensembles (LSCP)
- GNN-based Anomaly Detection Method (LUNAR)
- Median Absolute Deviation (MAD)
- Minimum Covariance Determinant (MCD)
- One-Class SVM (OCSVM)
- Principal Component Analysis (PCA)
- Quasi-Monte Carlo Discrepancy Outlier Detection (QMCD)
- Rotation-based Outlier Detection (ROD)
- Subspace Outlier Detection (SOD)
- Scalable Unsupervised Outlier Detection (SUOD)
Contact
Bug reporting: https://github.com/OliverHennhoefer/nonconform/issues
File details
Details for the file nonconform-0.95.0.tar.gz.
File metadata
- Download URL: nonconform-0.95.0.tar.gz
- Upload date:
- Size: 428.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 754022b697639411d70629f5d7491bf10c41ea7109cf5132ec960376a6e97921 |
| MD5 | bfedd2826f7d3d74148703edac6b0e10 |
| BLAKE2b-256 | 4363cf7943792d02aafc262b1df0910daa5471246cc9aa540f4c57fdde9a46e2 |
File details
Details for the file nonconform-0.95.0-py3-none-any.whl.
File metadata
- Download URL: nonconform-0.95.0-py3-none-any.whl
- Upload date:
- Size: 84.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 59031daee7951f613c3209c6b77eab3b570b26ee3b8cb830260f08c38969070e |
| MD5 | 751aab52351f9c5f486b77143f6f1579 |
| BLAKE2b-256 | f8d38900f05469d066432adacecef006bd9857ea28383b793ef2388bc699f2d1 |