Conformal Anomaly Detection

nonconform enhances anomaly detection by providing uncertainty quantification. It acts as a wrapper around most detectors from PyOD (see Supported Estimators). By leveraging one-class classification and conformal inference, nonconform enables statistically rigorous anomaly detection.

  • Uncertainty Quantification: Turn anomaly scores into statistically valid p-values.
  • Control False Positives: Reliably control metrics like the False Discovery Rate (FDR).
  • PyOD Compatibility: Works with most PyOD anomaly detectors (see Supported Estimators).
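
As a rough illustration of the first point, a conformal p-value ranks a test score against scores obtained on held-out normal data. The sketch below uses plain NumPy and is not the library's implementation; the function name `conformal_p_values` and the toy scores are hypothetical, and it assumes that higher scores mean "more anomalous".

```python
import numpy as np


def conformal_p_values(calib_scores, test_scores):
    """Rank each test score against calibration scores from normal data.

    Hypothetical helper for illustration; assumes higher = more anomalous.
    """
    calib = np.sort(np.asarray(calib_scores, dtype=float))
    test = np.asarray(test_scores, dtype=float)
    # Number of calibration scores that are >= each test score.
    n_greater_equal = calib.size - np.searchsorted(calib, test, side="left")
    # The +1 terms count the test point itself, so p-values lie in (0, 1].
    return (1.0 + n_greater_equal) / (calib.size + 1.0)


calib = [0.1, 0.2, 0.3, 0.4]   # toy calibration scores (made up)
p = conformal_p_values(calib, [0.05, 0.35, 0.9])
print(p)
```

A test score larger than every calibration score receives the smallest possible p-value, 1 / (n + 1), so the calibration set size limits how small a p-value can get.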

Getting Started

Installation via PyPI:

pip install nonconform

Note: The following examples use the built-in datasets. To run them, install the package with pip install nonconform[data] (see Optional Dependencies).

Classical (Conformal) Approach

Example: Detecting anomalies with Isolation Forest on the Shuttle dataset. The approach splits data for calibration, trains the model, then converts anomaly scores to statistically valid p-values by comparing test scores against the calibration distribution.

from pyod.models.iforest import IForest
from scipy.stats import false_discovery_control

from nonconform.strategy import Split
from nonconform.estimation import ConformalDetector
from nonconform.utils.data import load, Dataset
from nonconform.utils.stat import false_discovery_rate, statistical_power

# Load the Shuttle dataset: training data plus a labeled test set
x_train, x_test, y_test = load(Dataset.SHUTTLE, setup=True, seed=42)

# Wrap the PyOD detector; Split holds out 2,000 samples for calibration
estimator = ConformalDetector(
    detector=IForest(behaviour="new"),
    strategy=Split(n_calib=2_000),
    seed=42
)

# Train the detector and calibrate on the held-out samples
estimator.fit(x_train)

# p-values for the test set
estimates = estimator.predict(x_test)
# Benjamini-Hochberg adjustment; flag anomalies at a nominal FDR level of 0.2
decisions = false_discovery_control(estimates, method='bh') <= 0.2

print(f"Empirical False Discovery Rate: {false_discovery_rate(y=y_test, y_hat=decisions)}")
print(f"Empirical Statistical Power (Recall): {statistical_power(y=y_test, y_hat=decisions)}")

Output:

Empirical False Discovery Rate: 0.198
Empirical Statistical Power (Recall): 0.97
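
The decision step in the example thresholds Benjamini-Hochberg adjusted p-values at 0.2. The sketch below spells out the equivalent step-up rule in plain NumPy to show what that line does. The function name `benjamini_hochberg` and the toy p-values are hypothetical; in practice, scipy.stats.false_discovery_control as used above is the better choice.

```python
import numpy as np


def benjamini_hochberg(p_values, alpha=0.2):
    """Return a boolean mask of discoveries at nominal FDR level alpha.

    Hypothetical helper for illustration only.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Compare the k-th smallest p-value against the line k * alpha / m.
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    decisions = np.zeros(m, dtype=bool)
    if below.any():
        # Reject every hypothesis up to the largest k that passes.
        k_max = np.nonzero(below)[0].max()
        decisions[order[: k_max + 1]] = True
    return decisions


p_values = [0.01, 0.04, 0.03, 0.5, 0.9]   # toy p-values (made up)
decisions = benjamini_hochberg(p_values, alpha=0.2)
print(decisions)
```

The rule is a step-up procedure: a p-value that misses its own threshold is still rejected if a larger p-value further down the sorted list passes.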

Advanced Methods

For advanced use cases, the unified ConformalDetector() supports weighted conformal prediction (robust to covariate shift) through an additional weight_estimator parameter, as well as more elaborate calibration strategies such as JackknifeBootstrap().
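
To give an idea of what weighting changes, the sketch below computes a weighted conformal p-value in plain NumPy. It does not show how the weight_estimator parameter works internally. The function name and the toy numbers are hypothetical, and the weights are assumed to be given (for example, estimated density ratios between test and calibration data).

```python
import numpy as np


def weighted_conformal_p_value(calib_scores, calib_weights, test_score, test_weight):
    """Weighted analogue of the conformal p-value (hypothetical helper).

    Assumes higher scores mean "more anomalous" and non-negative weights.
    """
    scores = np.asarray(calib_scores, dtype=float)
    weights = np.asarray(calib_weights, dtype=float)
    # Weighted mass of calibration points scoring at least as high as the test point.
    mass_greater_equal = weights[scores >= test_score].sum()
    # The test point contributes its own weight to numerator and denominator.
    return (mass_greater_equal + test_weight) / (weights.sum() + test_weight)


calib = [0.1, 0.2, 0.3, 0.4]   # toy calibration scores (made up)

# Equal weights recover the unweighted p-value (1 + 1) / (4 + 1).
p_equal = weighted_conformal_p_value(calib, [1, 1, 1, 1], 0.35, 1.0)

# Up-weighting the calibration point at 0.4 raises the p-value to 4 / 7.
p_shift = weighted_conformal_p_value(calib, [1, 1, 1, 3], 0.35, 1.0)
print(p_equal, p_shift)
```

Calibration points that resemble the test distribution more closely count more, which is what makes the p-value robust when calibration and test data differ in their covariates.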

Beyond Static Data

While primarily designed for static (single-batch) applications, the library supports streaming scenarios through BatchGenerator() and OnlineGenerator(). For statistically valid FDR control on streaming data, install the optional fdr extra (pip install nonconform[fdr], see Optional Dependencies), which provides online FDR control methods.
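
Streaming needs dedicated methods because a batch procedure such as Benjamini-Hochberg requires all p-values at once, whereas a stream demands a decision per point. The sketch below shows one of the simplest online rules, alpha-spending (online Bonferroni), in plain Python. It is a deliberately conservative illustration and not the API of the optional dependency; the function name and the toy stream are hypothetical.

```python
import math


def alpha_spending_decisions(p_values, alpha=0.1):
    """Decide on each p-value as it arrives, without seeing later ones.

    Hypothetical helper: spends a shrinking share of alpha at each step.
    """
    decisions = []
    for t, p in enumerate(p_values, start=1):
        # gamma_t = 6 / (pi^2 * t^2) sums to 1 over t = 1, 2, ...
        alpha_t = alpha * 6.0 / (math.pi ** 2 * t ** 2)
        decisions.append(p <= alpha_t)
    return decisions


stream = [0.001, 0.5, 0.002, 0.04]   # toy p-value stream (made up)
flags = alpha_spending_decisions(stream, alpha=0.1)
print(flags)
```

Because the per-step levels sum to at most alpha, a union bound keeps the probability of any false alarm below alpha, which also bounds the FDR. The price is rapidly shrinking thresholds, which is why purpose-built online FDR procedures are usually preferred.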

Citation

If you find this repository useful for your research, please cite the following papers:

Leave-One-Out-, Bootstrap- and Cross-Conformal Anomaly Detectors
@inproceedings{Hennhofer2024,
	title        = {{ Leave-One-Out-, Bootstrap- and Cross-Conformal Anomaly Detectors }},
	author       = {Hennhofer, Oliver and Preisach, Christine},
	year         = 2024,
	month        = {Dec},
	booktitle    = {2024 IEEE International Conference on Knowledge Graph (ICKG)},
	publisher    = {IEEE Computer Society},
	address      = {Los Alamitos, CA, USA},
	pages        = {110--119},
	doi          = {10.1109/ICKG63256.2024.00022},
	url          = {https://doi.ieeecomputersociety.org/10.1109/ICKG63256.2024.00022}
}
Testing for Outliers with Conformal p-Values
@article{Bates2023,
	title        = {Testing for outliers with conformal p-values},
	author       = {Bates, Stephen and Candès, Emmanuel and Lei, Lihua and Romano, Yaniv and Sesia, Matteo},
	year         = 2023,
	month        = feb,
	journal      = {The Annals of Statistics},
	publisher    = {Institute of Mathematical Statistics},
	volume       = 51,
	number       = 1,
	doi          = {10.1214/22-aos2244},
	issn         = {0090-5364},
	url          = {http://dx.doi.org/10.1214/22-AOS2244}
}

Optional Dependencies

For additional features, you might need optional dependencies:

  • pip install nonconform[data] - Includes pyarrow for loading example data (via remote download)
  • pip install nonconform[deep] - Includes deep learning dependencies (PyTorch)
  • pip install nonconform[fdr] - Includes advanced FDR control methods (online-fdr)
  • pip install nonconform[dev] - Includes development documentation tools
  • pip install nonconform[all] - Includes all optional dependencies

Please refer to the pyproject.toml for details.

Supported Estimators

Only anomaly estimators suitable for unsupervised one-class classification are supported. Since detectors are trained exclusively on normal data, threshold parameters (such as PyOD's contamination) are automatically set to minimal values.

Models that are currently supported include:

  • Angle-Based Outlier Detection (ABOD)
  • Autoencoder (AE)
  • Cook's Distance (CD)
  • Copula-based Outlier Detector (COPOD)
  • Deep Isolation Forest (DIF)
  • Empirical-Cumulative-distribution-based Outlier Detection (ECOD)
  • Gaussian Mixture Model (GMM)
  • Histogram-based Outlier Detection (HBOS)
  • Isolation-based Anomaly Detection using Nearest-Neighbor Ensembles (INNE)
  • Isolation Forest (IForest)
  • Kernel Density Estimation (KDE)
  • k-Nearest Neighbor (kNN)
  • Kernel Principal Component Analysis (KPCA)
  • Linear Model Deviation-based Outlier Detection (LMDD)
  • Local Outlier Factor (LOF)
  • Local Correlation Integral (LOCI)
  • Lightweight Online Detector of Anomalies (LODA)
  • Locally Selective Combination of Parallel Outlier Ensembles (LSCP)
  • GNN-based Anomaly Detection Method (LUNAR)
  • Median Absolute Deviation (MAD)
  • Minimum Covariance Determinant (MCD)
  • One-Class SVM (OCSVM)
  • Principal Component Analysis (PCA)
  • Quasi-Monte Carlo Discrepancy Outlier Detection (QMCD)
  • Rotation-based Outlier Detection (ROD)
  • Subspace Outlier Detection (SOD)
  • Scalable Unsupervised Outlier Detection (SUOD)

Contact

Bug reporting: https://github.com/OliverHennhoefer/nonconform/issues
