
Conformal Anomaly Detection


Thresholds for anomaly detection are often chosen arbitrarily and lack theoretical guarantees. nonconform wraps anomaly detectors (from PyOD, scikit-learn, or custom implementations) and transforms their raw anomaly scores into statistically valid p-values. It applies principles from conformal prediction to one-class classification, enabling anomaly detection with provable statistical guarantees and a controlled false discovery rate (FDR).

Note: The methods in nonconform assume that training and test data are exchangeable [Vovk et al., 2005]. Therefore, the package is not suited for data with spatial or temporal autocorrelation unless such dependencies are explicitly handled in preprocessing or model design.

Getting Started

Installation via PyPI:

pip install nonconform

Note: The following examples use an external dataset API. Install it with pip install oddball, or include it via pip install "nonconform[data]" (see Optional Dependencies).

Classical (Conformal) Approach

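Example: detecting anomalies with Isolation Forest on the Shuttle dataset. The approach holds out part of the training data for calibration (here n_calib=1_000 samples), trains the model on the rest, then converts anomaly scores to p-values by comparing test scores against the calibration score distribution. Finally, the Benjamini-Hochberg procedure turns the p-values into decisions at a target FDR of 0.2. See ConformalDetector, Split, and FDR Control.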

from pyod.models.iforest import IForest
from scipy.stats import false_discovery_control

from nonconform import ConformalDetector, Split
from nonconform.metrics import false_discovery_rate, statistical_power
from oddball import Dataset, load

x_train, x_test, y_test = load(Dataset.SHUTTLE, setup=True, seed=42)

detector = ConformalDetector(
    detector=IForest(behaviour="new"),
    strategy=Split(n_calib=1_000),
    seed=42,
)
p_values = detector.fit(x_train).compute_p_values(x_test)
decisions = false_discovery_control(p_values, method="bh") <= 0.2

print(f"Empirical FDR: {false_discovery_rate(y_test, decisions)}")
print(f"Statistical Power: {statistical_power(y_test, decisions)}")

Output:

Empirical FDR: 0.18
Statistical Power: 0.99

Advanced Methods

Two advanced approaches are implemented that may increase the power of a conformal anomaly detector:

  • A KDE-based (probabilistic) approach that models the distribution of calibration scores with a kernel density estimate, yielding continuous p-values instead of the discrete ones produced by the standard empirical distribution function.
  • A weighted approach that weights calibration scores by their similarity to the test batch at hand, making it more robust to covariate shift between calibration and test data (it can be combined with the probabilistic approach).

Probabilistic Conformal Approach:

from pyod.models.iforest import IForest

from nonconform import ConformalDetector, Split, Probabilistic

detector = ConformalDetector(
    detector=IForest(behaviour="new"),
    strategy=Split(n_calib=1_000),
    estimation=Probabilistic(n_trials=10),
    seed=42,
)
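The KDE idea from the first bullet can be sketched as follows, assuming a Gaussian kernel and Silverman's rule-of-thumb bandwidth: the p-value of a test score s is the probability mass the smoothed calibration distribution places at or above s. This is not nonconform's Probabilistic estimator (which, per the optional dependencies list, relies on KDEpy and Optuna and is configured via n_trials); kde_p_values is a hypothetical helper.

```python
import numpy as np
from scipy.stats import norm


def kde_p_values(calib_scores, test_scores, bandwidth=None):
    """Hypothetical helper: p(s) = P(S >= s) under a Gaussian KDE of calibration scores."""
    calib = np.asarray(calib_scores, dtype=float)
    test = np.asarray(test_scores, dtype=float)
    if bandwidth is None:
        # Silverman's rule of thumb (an assumed choice; no tuning is done here)
        bandwidth = 1.06 * calib.std(ddof=1) * calib.size ** (-1 / 5)
    # Tail mass of a Gaussian KDE = average of the per-kernel survival functions
    z = (test[:, None] - calib[None, :]) / bandwidth
    return norm.sf(z).mean(axis=1)


rng = np.random.default_rng(0)
calib_scores = rng.normal(size=500)   # synthetic calibration scores
grid = np.linspace(-3, 3, 7)          # test scores from low to high
p = kde_p_values(calib_scores, grid)  # strictly decreasing in the test score
```

Unlike the empirical version, every distinct test score receives a distinct p-value; how close these p-values are to valid then depends on how well the KDE fits the calibration scores.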

Weighted Conformal Anomaly Detection:

from pyod.models.iforest import IForest

from nonconform import ConformalDetector, Split, logistic_weight_estimator

detector = ConformalDetector(
    detector=IForest(behaviour="new"),
    strategy=Split(n_calib=1_000),
    weight_estimator=logistic_weight_estimator(),
    seed=42,
)

Note: Weighted procedures require weighted FDR control for statistical validity (see nonconform.fdr.weighted_false_discovery_control()).
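A minimal sketch of weighted conformal p-values, assuming the common classifier-based density-ratio trick: a logistic regression separates calibration from test covariates, and its odds (rescaled by the class sizes) serve as weights w(x) approximating p_test(x) / p_calib(x). logistic_density_ratio and weighted_p_values are hypothetical helpers and may differ from nonconform's logistic_weight_estimator. As the note says, these p-values would still need weighted FDR control, which this sketch does not implement.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def logistic_density_ratio(x_calib, x_test):
    """Hypothetical helper: estimate w(x) ~ p_test(x) / p_calib(x) with a classifier."""
    X = np.vstack([x_calib, x_test])
    y = np.concatenate([np.zeros(len(x_calib)), np.ones(len(x_test))])
    clf = LogisticRegression().fit(X, y)
    prior = len(x_calib) / len(x_test)  # corrects for unequal class sizes

    def weight(x):
        proba = clf.predict_proba(x)  # columns follow clf.classes_ = [0 (calib), 1 (test)]
        return prior * proba[:, 1] / proba[:, 0]

    return weight


def weighted_p_values(calib_scores, calib_weights, test_scores, test_weights):
    """Hypothetical helper: p_j = (sum_i w_i 1{s_i >= t_j} + w_j) / (sum_i w_i + w_j)."""
    calib_scores = np.asarray(calib_scores, dtype=float)
    calib_weights = np.asarray(calib_weights, dtype=float)
    test_scores = np.asarray(test_scores, dtype=float)
    test_weights = np.asarray(test_weights, dtype=float)
    at_least = calib_scores[None, :] >= test_scores[:, None]
    numer = (calib_weights[None, :] * at_least).sum(axis=1) + test_weights
    denom = calib_weights.sum() + test_weights
    return numer / denom


rng = np.random.default_rng(1)
x_calib = rng.normal(0.0, 1.0, size=(500, 2))
x_test = rng.normal(0.5, 1.0, size=(200, 2))  # synthetic covariate-shifted test batch
w = logistic_density_ratio(x_calib, x_test)
calib_scores = np.linalg.norm(x_calib, axis=1)  # toy anomaly score: distance from origin
test_scores = np.linalg.norm(x_test, axis=1)
p = weighted_p_values(calib_scores, w(x_calib), test_scores, w(x_test))
```

With all weights equal to 1, weighted_p_values reduces to the ordinary split-conformal p-value.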

Beyond Static Data

While nonconform is primarily designed for static (single-batch) applications, the optional online-fdr dependency (installed via nonconform[fdr]) provides FDR control methods suited to streaming scenarios.
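As a rough illustration of what streaming FDR control involves, the sketch below implements the LOND rule of Javanmard and Montanari from scratch; it does not use online-fdr or its API. Each incoming p-value is tested at a small slice of the total budget alpha, scaled up by the number of discoveries made so far. The decay sequence gamma_t = 6 / (pi^2 t^2) is an assumed choice, and LOND's guarantee is stated for independent p-values, so conformal p-values that share one calibration set need extra care.

```python
import numpy as np


def lond(p_value_stream, alpha=0.1):
    """Sketch of the LOND online FDR rule.

    Test t is rejected if p_t <= alpha * gamma_t * (D_{t-1} + 1), where D_{t-1}
    counts earlier discoveries and gamma_t = 6 / (pi^2 t^2) sums to 1 over t >= 1.
    """
    discoveries = 0
    for t, p in enumerate(p_value_stream, start=1):
        gamma_t = 6.0 / (np.pi ** 2 * t ** 2)
        level = alpha * gamma_t * (discoveries + 1)
        reject = bool(p <= level)
        discoveries += int(reject)
        yield reject  # decision is made before the next p-value arrives


decisions = list(lond([0.001, 0.5, 0.0001], alpha=0.1))
```

Because it is a generator, each decision is available as soon as its p-value arrives, which is the defining constraint of the streaming setting.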

Custom Detectors

Any detector implementing the AnomalyDetector protocol works with nonconform:

from typing import Self

import numpy as np

class MyDetector:
    def fit(self, X, y=None) -> Self: ...
    def decision_function(self, X) -> np.ndarray: ...  # higher = more anomalous
    def get_params(self, deep=True) -> dict: ...
    def set_params(self, **params) -> Self: ...

For custom detectors, either set score_polarity explicitly ("higher_is_anomalous" in most cases), or omit it to use the pre-release default behavior. Use score_polarity="auto" only when you want strict detector-family validation.
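For illustration, a hypothetical detector that satisfies the four methods of the protocol stub: it scores a point by its mean Euclidean distance to the k nearest training points, so higher means more anomalous and, per the paragraph above, it would pair with score_polarity="higher_is_anomalous". KNNDistanceDetector is not part of nonconform or PyOD, and its use inside ConformalDetector is not exercised here.

```python
import numpy as np


class KNNDistanceDetector:
    """Hypothetical example: score = mean distance to the k nearest training points."""

    def __init__(self, n_neighbors: int = 5):
        self.n_neighbors = n_neighbors

    def fit(self, X, y=None) -> "KNNDistanceDetector":
        self.X_train_ = np.asarray(X, dtype=float)
        return self

    def decision_function(self, X) -> np.ndarray:
        X = np.asarray(X, dtype=float)
        # Brute-force pairwise Euclidean distances, shape (n_test, n_train)
        d = np.linalg.norm(X[:, None, :] - self.X_train_[None, :, :], axis=-1)
        k = min(self.n_neighbors, d.shape[1])
        nearest = np.partition(d, k - 1, axis=1)[:, :k]  # k smallest distances per row
        return nearest.mean(axis=1)  # higher = more anomalous

    def get_params(self, deep=True) -> dict:
        return {"n_neighbors": self.n_neighbors}

    def set_params(self, **params) -> "KNNDistanceDetector":
        for name, value in params.items():
            setattr(self, name, value)
        return self


rng = np.random.default_rng(0)
detector = KNNDistanceDetector(n_neighbors=3).fit(rng.normal(size=(200, 2)))
scores = detector.decision_function(np.array([[0.0, 0.0], [8.0, 8.0]]))
```

The return annotations use the class name instead of typing.Self so the sketch also runs on Python versions before 3.11; the brute-force distance matrix needs O(n_test x n_train) memory, so it only suits small datasets.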

See Detector Compatibility for details and examples.

Citation

If you find this repository useful for your research, please cite the following works:

Leave-One-Out-, Bootstrap- and Cross-Conformal Anomaly Detectors
@inproceedings{Hennhofer2024,
    title     = {Leave-One-Out-, Bootstrap- and Cross-Conformal Anomaly Detectors},
    author    = {Hennhofer, Oliver and Preisach, Christine},
    year      = {2024},
    month     = {Dec},
    booktitle = {2024 IEEE International Conference on Knowledge Graph (ICKG)},
    publisher = {IEEE Computer Society},
    address   = {Los Alamitos, CA, USA},
    pages     = {110--119},
    doi       = {10.1109/ICKG63256.2024.00022},
    url       = {https://doi.ieeecomputersociety.org/10.1109/ICKG63256.2024.00022}
}
Testing for Outliers with Conformal p-Values
@article{Bates2023,
    title     = {Testing for outliers with conformal p-values},
    author    = {Bates, Stephen and Candès, Emmanuel and Lei, Lihua and Romano, Yaniv and Sesia, Matteo},
    year      = {2023},
    month     = {Feb},
    journal   = {The Annals of Statistics},
    publisher = {Institute of Mathematical Statistics},
    volume    = {51},
    number    = {1},
    doi       = {10.1214/22-aos2244},
    issn      = {0090-5364},
    url       = {http://dx.doi.org/10.1214/22-AOS2244}
}
Algorithmic Learning in a Random World
@book{Vovk2005,
    title     = {Algorithmic Learning in a Random World},
    author    = {Vladimir Vovk and Alex Gammerman and Glenn Shafer},
    year      = {2005},
    publisher = {Springer},
    note      = {Springer, New York},
    language  = {English}
}

Optional Dependencies

For additional features, you might need optional dependencies:

  • pip install "nonconform[pyod]" - Includes PyOD anomaly detection library
  • pip install "nonconform[data]" - Includes oddball for loading benchmark datasets
  • pip install "nonconform[fdr]" - Includes advanced FDR control methods (online-fdr)
  • pip install "nonconform[probabilistic]" - Includes KDEpy and Optuna for probabilistic estimation/tuning
  • pip install "nonconform[all]" - Includes all optional dependencies

Please refer to the pyproject.toml for details.

Contact

Bug reporting: https://github.com/OliverHennhoefer/nonconform/issues

