Bayesian Histogram-based Anomaly Detection
Bayesian Histogram Anomaly Detection (BHAD)
A Python implementation of the Bayesian Histogram-based Anomaly Detection (BHAD) algorithm for unsupervised anomaly detection with explainability features.
Overview
BHAD is an explainable anomaly detection method that combines Bayesian inference with histogram-based modeling to identify outliers in high-dimensional datasets. Because of its linear structure, the algorithm supports both global and local explanations, which makes it well suited to applications that require interpretable results.
Key Features
- Explainable AI: Provides both global and local explanations for anomaly predictions
- Bayesian Approach: Uses Bayesian inference for robust uncertainty quantification
- High-Dimensional Data: Handles high-dimensional datasets effectively
- Unsupervised Learning: No labeled data required for training
- Linear Structure: Interpretable model architecture
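To make the histogram idea and the linear structure concrete, here is a minimal NumPy sketch. It is not the package's implementation: the function names `fit_histograms` and `feature_scores`, the fixed bin count, and the prior strength `alpha` are illustrative assumptions. Each feature is binned, bin probabilities are smoothed with a symmetric Dirichlet prior, and a row's score is the sum of per-feature negative log-probabilities, so each feature's contribution can be read off directly.

```python
import numpy as np

def fit_histograms(X, n_bins=10, alpha=1.0):
    """Per-feature bin edges and Dirichlet-smoothed log bin probabilities (illustrative)."""
    n = X.shape[0]
    edges, log_probs = [], []
    for j in range(X.shape[1]):
        e = np.histogram_bin_edges(X[:, j], bins=n_bins)
        counts, _ = np.histogram(X[:, j], bins=e)
        # Posterior mean of the bin probabilities under a symmetric Dirichlet(alpha) prior
        p = (counts + alpha) / (n + alpha * n_bins)
        edges.append(e)
        log_probs.append(np.log(p))
    return edges, log_probs

def feature_scores(X, edges, log_probs):
    """Negative log bin probability for every cell; rarer bins get higher scores."""
    out = np.empty(X.shape, dtype=float)
    for j, (e, lp) in enumerate(zip(edges, log_probs)):
        # Interior edges map each value to a bin index in 0..n_bins-1
        idx = np.digitize(X[:, j], e[1:-1])
        out[:, j] = -lp[idx]
    return out

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
X[0] = [8.0, 0.0, 0.0]  # plant an extreme value in feature 0 of the first row

edges, log_probs = fit_histograms(X)
contrib = feature_scores(X, edges, log_probs)  # local explanation: one column per feature
total = contrib.sum(axis=1)                    # additive (linear) anomaly score per row

top_feature = int(np.argmax(contrib[0]))
print("feature contributing most to row 0:", top_feature)
```

Because the total score is a plain sum, the columns of `contrib` serve as a per-row explanation, and averaging them over rows gives a rough global view of which features drive the scores.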
Installation
Using uv
Install the package with uv:
uv venv --python 3.12
uv add bhad
Using pip
python3 -m venv .venv
source .venv/bin/activate
pip install bhad
Quick Start
import numpy as np
import pandas as pd
from bhad.model import BHAD
# Example data (replace with your own DataFrame)
X = pd.DataFrame(np.random.randn(1000, 10),
                 columns=[f'feature_{i}' for i in range(10)])
# Create BHAD model with integrated discretization
model = BHAD(contamination=0.01, nbins=None, verbose=False)
# Fit the model and predict anomalies
anomaly_labels = model.fit_predict(X) # Returns -1 for outliers, 1 for inliers
anomaly_scores = model.decision_function(X)
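The `contamination` argument is the expected share of outliers. As a rough illustration of how such a share can be turned into a cut-off on a score vector, the sketch below uses made-up scores and a hypothetical helper `labels_from_scores`. It assumes that higher scores mean more anomalous; this is not taken from the package, and the sign convention of BHAD's own `decision_function` may differ, so check the package documentation before relying on it.

```python
import numpy as np

def labels_from_scores(scores, contamination=0.01):
    """Flag the top `contamination` share of scores as outliers (-1), the rest as inliers (1).

    Assumption: higher score = more anomalous (illustrative only).
    """
    threshold = np.quantile(scores, 1.0 - contamination)
    labels = np.where(scores > threshold, -1, 1)
    return labels, threshold

rng = np.random.default_rng(42)
scores = rng.gamma(shape=2.0, scale=1.0, size=1000)  # made-up scores, not BHAD output

labels, threshold = labels_from_scores(scores, contamination=0.01)
n_outliers = int((labels == -1).sum())
print(f"threshold={threshold:.3f}, flagged={n_outliers} of {scores.size}")
```

With 1000 distinct scores and `contamination=0.01`, the ten largest scores end up flagged.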
Documentation
For detailed usage examples, API reference, and tutorials, visit our documentation.
Examples
The package includes Jupyter notebooks with practical examples:
- Toy_Example.ipynb: Simulated data demonstration
- Titanic_Example.ipynb: Real-world dataset application
Research & Publications
This implementation is based on the following research papers:
- Vosseler, A. (2022): Unsupervised Insurance Fraud Prediction Based on Anomaly Detector Ensembles
- Vosseler, A. (2023): BHAD: Explainable anomaly detection using Bayesian histograms
Conference Presentations
- PyCon DE & PyData Berlin 2023: Watch the presentation
- MaxEnt 2023: 42nd International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering, Max-Planck-Institute for Plasma Physics, Garching, Germany
Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Author
Alexander Vosseler
Citation
If you use BHAD in your research, please cite:
@article{vosseler2022unsupervised,
title={Unsupervised Insurance Fraud Prediction Based on Anomaly Detector Ensembles},
author={Vosseler, Alexander},
journal={Risks},
volume={10},
number={7},
year={2022},
month={June}
}