Conformal Anomaly Detection
Thresholds for anomaly detection are often arbitrary and lack theoretical guarantees. nonconform wraps anomaly detectors (from PyOD, scikit-learn, or custom implementations) and transforms their raw anomaly scores into statistically valid p-values. It applies principles from conformal prediction to one-class classification, enabling anomaly detection with provable statistical guarantees and a controlled false discovery rate.
Note: The methods in nonconform assume that training and test data are exchangeable [Vovk et al., 2005]. Therefore, the package is not suited for data with spatial or temporal autocorrelation unless such dependencies are explicitly handled in preprocessing or model design.
:hatching_chick: Getting Started
Installation via PyPI:
pip install nonconform
Note: The following examples use an external dataset API. Install it with
pip install oddball
or
pip install "nonconform[data]"
(see Optional Dependencies).
Classical (Conformal) Approach
Example: Detecting anomalies with Isolation Forest on the Shuttle dataset. The approach splits data for calibration, trains the model, then converts anomaly scores to p-values by comparing test scores against the calibration distribution. See ConformalDetector, Split, and FDR Control.
from pyod.models.iforest import IForest
from scipy.stats import false_discovery_control

from nonconform import ConformalDetector, Split
from nonconform.metrics import false_discovery_rate, statistical_power
from oddball import Dataset, load

# Load the Shuttle dataset as training data, test data, and test labels
x_train, x_test, y_test = load(Dataset.SHUTTLE, setup=True, seed=42)

# Wrap an Isolation Forest and reserve 1,000 points for calibration
detector = ConformalDetector(
    detector=IForest(behaviour="new"),
    strategy=Split(n_calib=1_000),
    seed=42,
)

# Fit the detector, then convert test scores to conformal p-values
p_values = detector.fit(x_train).compute_p_values(x_test)

# Benjamini-Hochberg adjusted p-values, thresholded at a nominal FDR level of 0.2
decisions = false_discovery_control(p_values, method="bh") <= 0.2

print(f"Empirical FDR: {false_discovery_rate(y_test, decisions)}")
print(f"Statistical Power: {statistical_power(y_test, decisions)}")
Output:
Empirical FDR: 0.18
Statistical Power: 0.99
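The conversion from scores to p-values and the Benjamini-Hochberg step can be sketched in plain NumPy. This is a minimal illustration of the standard definitions, not nonconform's implementation; the function names empirical_p_values and benjamini_hochberg are hypothetical, and the sketch assumes that higher scores mean more anomalous.

```python
import numpy as np


def empirical_p_values(calib_scores, test_scores):
    """Share of calibration scores at least as extreme as each test score."""
    calib = np.asarray(calib_scores, dtype=float)
    test = np.asarray(test_scores, dtype=float)
    # Count calibration scores >= each test score (higher = more anomalous).
    n_at_or_above = (calib[None, :] >= test[:, None]).sum(axis=1)
    # The +1 terms count the test point itself and keep p-values above zero.
    return (1.0 + n_at_or_above) / (calib.size + 1.0)


def benjamini_hochberg(p_values, alpha):
    """Boolean mask of discoveries under the Benjamini-Hochberg step-up rule."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Compare the k-th smallest p-value against alpha * k / m.
    passes = p[order] <= alpha * np.arange(1, m + 1) / m
    decisions = np.zeros(m, dtype=bool)
    if passes.any():
        k = np.max(np.nonzero(passes)[0])  # largest rank that passes
        decisions[order[: k + 1]] = True
    return decisions


calib_scores = np.linspace(0.01, 0.99, 99)     # 99 calibration scores
test_scores = np.array([0.005, 0.505, 0.995])  # low, middle, extreme
p_values = empirical_p_values(calib_scores, test_scores)
decisions = benjamini_hochberg(p_values, alpha=0.2)
print(p_values)   # [1.   0.5  0.01]
print(decisions)  # [False False  True]
```

Thresholding the adjusted p-values returned by scipy's false_discovery_control at 0.2, as in the example above, yields the same decisions as this step-up rule at alpha=0.2.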
:hatched_chick: Advanced Methods
Three advanced methods are implemented. The first two may increase the power of a conformal anomaly detector; the third adds sequential monitoring:
- A KDE-based (probabilistic) approach that models the calibration scores to obtain continuous p-values, in contrast to the standard empirical distribution function.
- A weighted approach that prioritizes calibration scores by their similarity to the test batch at hand and is more robust to covariate shift between test and calibration data (can be combined with the probabilistic approach).
- Exchangeability martingales for sequential evidence monitoring on conformal p-value streams (PowerMartingale, SimpleMixtureMartingale, SimpleJumperMartingale).
Probabilistic Conformal Approach:
from pyod.models.iforest import IForest
from nonconform import ConformalDetector, Split, Probabilistic
detector = ConformalDetector(
    detector=IForest(behaviour="new"),
    strategy=Split(n_calib=1_000),
    estimation=Probabilistic(n_trials=10),
    seed=42,
)
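The idea behind continuous p-values can be sketched with SciPy's gaussian_kde: fit a density to the calibration scores and report its upper-tail mass at each test score. This is a rough illustration only; it uses SciPy's default bandwidth and makes no claim about how Probabilistic estimates or tunes its density. The helper name kde_p_values is hypothetical.

```python
import numpy as np
from scipy.stats import gaussian_kde


def kde_p_values(calib_scores, test_scores):
    """Smooth p-value: estimated probability mass at or above each test score."""
    kde = gaussian_kde(np.asarray(calib_scores, dtype=float))
    # Upper-tail integral of the fitted density (higher score = more anomalous).
    tail = [
        kde.integrate_box_1d(score, np.inf)
        for score in np.asarray(test_scores, dtype=float)
    ]
    return np.clip(tail, 0.0, 1.0)


rng = np.random.default_rng(42)
calib_scores = rng.normal(size=1_000)  # stand-in for calibration scores
p_values = kde_p_values(calib_scores, [-3.0, 0.0, 3.0])
```

Unlike the empirical p-value, which can only take values on a grid of size 1 / (n_calib + 1), the tail integral varies continuously with the test score.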
Weighted Conformal Anomaly Detection:
from pyod.models.iforest import IForest
from nonconform import ConformalDetector, Split, logistic_weight_estimator
detector = ConformalDetector(
    detector=IForest(behaviour="new"),
    strategy=Split(n_calib=1_000),
    weight_estimator=logistic_weight_estimator(),
    seed=42,
)
Note: Weighted procedures require weighted FDR control for statistical validity (see nonconform.fdr.weighted_false_discovery_control()).
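For intuition, one common formulation of a weighted conformal p-value from the covariate-shift literature replaces counts with weight sums. The sketch below assumes the weights are already given (for example, estimated density ratios between test and calibration data); whether nonconform computes exactly this quantity is an assumption, and weighted_p_value is a hypothetical name.

```python
import numpy as np


def weighted_p_value(calib_scores, calib_weights, test_score, test_weight):
    """Weighted share of calibration mass at or above the test score."""
    calib = np.asarray(calib_scores, dtype=float)
    weights = np.asarray(calib_weights, dtype=float)
    # Total weight of calibration scores at least as extreme as the test score.
    mass_at_or_above = weights[calib >= test_score].sum()
    # The test point adds its own weight to numerator and denominator.
    return (mass_at_or_above + test_weight) / (weights.sum() + test_weight)


calib_scores = np.array([0.1, 0.2, 0.3, 0.4])

# Uniform weights recover the unweighted p-value (1 + 1) / (4 + 1).
uniform = weighted_p_value(calib_scores, np.ones(4), 0.35, 1.0)

# Up-weighting the calibration point that resembles the test data raises the p-value.
shifted = weighted_p_value(calib_scores, np.array([1.0, 1.0, 1.0, 3.0]), 0.35, 1.0)
```

With uniform weights the result is 0.4; with the last calibration point weighted 3.0 it becomes 4/7.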
Exchangeability Martingales (sequential monitoring):
This snippet shows martingale setup only. In normal use:
- a fitted ConformalDetector produces streaming conformal p-values from model scores
- each incoming p-value is fed to the martingale via martingale.update(p_t)
from nonconform.martingales import AlarmConfig, PowerMartingale
martingale = PowerMartingale(
    epsilon=0.5,
    alarm_config=AlarmConfig(ville_threshold=100.0),
)
# update one p-value at a time
state = martingale.update(float(p_t))
# or update a sequence of p-values
states = martingale.update_many(p_values_chunk)
Note: Martingale alarms are evidence-monitoring signals on sequential p-values. They are not a replacement for cross-hypothesis FDR control. See the user guide for a compact end-to-end flow: Exchangeability Martingales.
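To make the mechanics concrete, the textbook power martingale multiplies its current value by epsilon * p ** (epsilon - 1) for each new p-value and raises an alarm once the product reaches a threshold. The class below is a hypothetical stand-in written for illustration; it does not reproduce the state object or configuration options of nonconform's PowerMartingale.

```python
import math


class PowerMartingaleSketch:
    """Illustrative power martingale tracked in the log domain."""

    def __init__(self, epsilon=0.5, ville_threshold=100.0):
        if not 0.0 < epsilon < 1.0:
            raise ValueError("epsilon must lie in (0, 1)")
        self.epsilon = epsilon
        self.log_threshold = math.log(ville_threshold)
        self.log_value = 0.0  # log of the martingale, which starts at 1

    def update(self, p_value):
        """Fold in one p-value and return True if the alarm threshold is reached."""
        p = max(float(p_value), 1e-12)  # guard against log(0)
        # The betting function epsilon * p**(epsilon - 1) integrates to 1 on [0, 1],
        # so the running product has constant expectation for uniform p-values.
        self.log_value += math.log(self.epsilon) + (self.epsilon - 1.0) * math.log(p)
        return self.log_value >= self.log_threshold


martingale = PowerMartingaleSketch(epsilon=0.5, ville_threshold=100.0)
# Each p-value of 0.01 multiplies the martingale by 0.5 * 0.01**-0.5 = 5.
alarms = [martingale.update(p) for p in [0.01, 0.01, 0.01]]
print(alarms)  # [False, False, True]
```

By Ville's inequality, a nonnegative martingale that starts at 1 ever reaches a threshold c with probability at most 1/c, so a threshold of 100 corresponds to a false-alarm probability of at most 0.01 over the whole stream when the data are exchangeable.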
Beyond Static Data
While primarily designed for static (single-batch) applications, the optional onlinefdr dependency provides FDR control methods appropriate for streaming scenarios.
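To show why streaming needs a different treatment, here is a deliberately simple baseline: alpha-spending, which tests the t-th p-value at a level that shrinks over time so that all levels together never exceed alpha. This is a swapped-in technique chosen for brevity, not the API of the onlinefdr dependency, and the function name is hypothetical.

```python
import math


def alpha_spending_decisions(p_values, alpha=0.1):
    """Test the t-th p-value at level alpha * 6 / (pi^2 * t^2)."""
    decisions = []
    for t, p in enumerate(p_values, start=1):
        # The levels sum to alpha over an infinite stream (sum of 1/t^2 = pi^2/6).
        level = alpha * 6.0 / (math.pi**2 * t**2)
        decisions.append(p <= level)
    return decisions


decisions = alpha_spending_decisions([0.001, 0.5, 0.0001, 0.02], alpha=0.1)
print(decisions)  # [True, False, True, False]
```

Because the levels sum to alpha, a union bound limits the family-wise error rate, and therefore the FDR, to alpha without knowing the stream length in advance. The price is conservativeness: the fourth p-value of 0.02 is already not rejected. Dedicated online FDR procedures are generally designed to regain testing budget after discoveries.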
Custom Detectors
Any detector implementing the AnomalyDetector protocol works with nonconform:
from typing import Self
import numpy as np
class MyDetector:
    def fit(self, X, y=None) -> Self: ...
    def decision_function(self, X) -> np.ndarray: ...  # higher = more anomalous
    def get_params(self, deep=True) -> dict: ...
    def set_params(self, **params) -> Self: ...
For custom detectors, either set score_polarity explicitly
("higher_is_anomalous" in most cases), or omit it to use the pre-release
default behavior. Use score_polarity="auto" only when you want strict
detector-family validation.
See Detector Compatibility for details and examples.
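As a concrete illustration, the following toy detector fills in the four methods listed above. MeanDistanceDetector is a hypothetical example, not part of nonconform; it scores each point by its standardized distance to the training mean. Whether the protocol requires anything beyond these four methods is an assumption to check against Detector Compatibility.

```python
import numpy as np


class MeanDistanceDetector:
    """Toy detector: score = standardized Euclidean distance to the training mean."""

    def __init__(self, scale=True):
        self.scale = scale

    def fit(self, X, y=None) -> "MeanDistanceDetector":
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        std = X.std(axis=0)
        # Use per-feature spread when scaling; fall back to 1.0 for constant features.
        self.std_ = np.where(std > 0, std, 1.0) if self.scale else np.ones(X.shape[1])
        return self

    def decision_function(self, X) -> np.ndarray:
        Z = (np.asarray(X, dtype=float) - self.mean_) / self.std_
        return np.linalg.norm(Z, axis=1)  # higher = more anomalous

    def get_params(self, deep=True) -> dict:
        return {"scale": self.scale}

    def set_params(self, **params) -> "MeanDistanceDetector":
        for name, value in params.items():
            setattr(self, name, value)
        return self


rng = np.random.default_rng(0)
detector = MeanDistanceDetector().fit(rng.normal(size=(200, 2)))
scores = detector.decision_function(np.array([[0.0, 0.0], [8.0, 8.0]]))
```

The point far from the training data receives the larger score, which matches the "higher = more anomalous" convention, so score_polarity="higher_is_anomalous" would be the fitting choice for a detector like this.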
Citation
If you find this repository useful for your research, please cite the following papers:
Leave-One-Out-, Bootstrap- and Cross-Conformal Anomaly Detectors
@inproceedings{Hennhofer2024,
title = {Leave-One-Out-, Bootstrap- and Cross-Conformal Anomaly Detectors},
author = {Hennhofer, Oliver and Preisach, Christine},
year = {2024},
month = {Dec},
booktitle = {2024 IEEE International Conference on Knowledge Graph (ICKG)},
publisher = {IEEE Computer Society},
address = {Los Alamitos, CA, USA},
pages = {110--119},
doi = {10.1109/ICKG63256.2024.00022},
url = {https://doi.ieeecomputersociety.org/10.1109/ICKG63256.2024.00022}
}
Testing for Outliers with Conformal p-Values
@article{Bates2023,
title = {Testing for outliers with conformal p-values},
author = {Bates, Stephen and Candès, Emmanuel and Lei, Lihua and Romano, Yaniv and Sesia, Matteo},
year = {2023},
month = {Feb},
journal = {The Annals of Statistics},
publisher = {Institute of Mathematical Statistics},
volume = {51},
number = {1},
doi = {10.1214/22-aos2244},
issn = {0090-5364},
url = {http://dx.doi.org/10.1214/22-AOS2244}
}
Algorithmic Learning in a Random World
@book{Vovk2005,
title = {Algorithmic Learning in a Random World},
author = {Vladimir Vovk and Alex Gammerman and Glenn Shafer},
year = {2005},
publisher = {Springer},
note = {Springer, New York},
language = {English}
}
Testing Exchangeability On-line
@inproceedings{Vovk2003,
title = {Testing Exchangeability On-line},
author = {Vovk, Vladimir and Nouretdinov, Ilia and Gammerman, Alex},
booktitle = {Proceedings of the 20th International Conference on Machine Learning (ICML)},
year = {2003}
}
Retrain or Not Retrain: Conformal Test Martingales for Change-Point Detection
@inproceedings{Vovk2021,
title = {Retrain or Not Retrain: Conformal Test Martingales for Change-Point Detection},
author = {Vovk, Vladimir and Volkhonskiy, Daniil and Nouretdinov, Ilia and Gammerman, Alex},
booktitle = {Proceedings of The 10th Symposium on Conformal and Probabilistic Prediction and Applications},
series = {PMLR},
volume = {152},
pages = {210--231},
year = {2021},
url = {https://proceedings.mlr.press/v152/vovk21b.html}
}
Optional Dependencies
For additional features, you might need optional dependencies:
- pip install nonconform[pyod] - Includes PyOD anomaly detection library
- pip install nonconform[data] - Includes oddball for loading benchmark datasets
- pip install nonconform[fdr] - Includes advanced FDR control methods (online-fdr)
- pip install nonconform[probabilistic] - Includes KDEpy and Optuna for probabilistic estimation/tuning
- pip install nonconform[all] - Includes all optional dependencies
Please refer to the pyproject.toml for details.
Contact
Bug reporting: https://github.com/OliverHennhoefer/nonconform/issues