Bayesian Histogram-based Anomaly Detection (BHAD)

Python code for the BHAD algorithm as presented in Vosseler, A. (2022): Unsupervised Insurance Fraud Prediction Based on Anomaly Detector Ensembles, Risks, 10(7), 132 and Vosseler, A. (2021): BHAD: Fast unsupervised anomaly detection using Bayesian histograms.

The code follows the standard scikit-learn API. Code to run the BHAD model is contained in bhad.py. Utility functions are provided in utils.py, e.g. a discretization function for continuous features and the Bayesian model selection approach outlined in the references. The explainer.py module contains code to create model explanations.

Package installation

pip install bhad

Usage

1.) Preprocess the input data: discretize continuous features and conduct Bayesian model selection (optional).

2.) Train the model using discrete data.
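
To make the two steps concrete, here is a minimal, standard-library-only sketch of the general idea: bin a continuous feature, then score each observation by how rare its bin is. It is an illustration only and not the package's implementation. The function names, the equal-width binning rule, and the smoothing constant `alpha` are assumptions made for this example; the package's Discretize and BHAD classes may work differently.

```python
import math
from collections import Counter

def equal_width_bins(values, nbins):
    """Map each continuous value to an integer bin index in [0, nbins - 1]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / nbins or 1.0  # fall back to 1.0 for constant features
    return [min(int((v - lo) / width), nbins - 1) for v in values]

def histogram_log_scores(bins, nbins, alpha=1.0):
    """Log of the smoothed relative frequency of each observation's bin.

    alpha is a hypothetical pseudo-count (an assumption of this sketch)
    that keeps empty bins from producing log(0).
    """
    counts = Counter(bins)
    total = len(bins) + alpha * nbins
    return [math.log((counts[b] + alpha) / total) for b in bins]

values = [1.0, 1.2, 0.9, 1.1, 1.3, 0.8, 1.0, 9.5]  # toy data, last value is unusual
bins = equal_width_bins(values, nbins=4)
scores = histogram_log_scores(bins, nbins=4)

# The observation in the sparsest bin receives the lowest score.
most_anomalous = scores.index(min(scores))
```

With several features, one plausible extension is to sum the per-feature log scores for each observation, so that rows with rare values in several columns receive a low total.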

For convenience, these two steps can optionally be wrapped in a scikit-learn pipeline:

from sklearn.pipeline import Pipeline
from bhad.utils import Discretize
from bhad.model import BHAD
from bhad.explainer import Explainer

num_cols = [....]   # names of numeric features
cat_cols = [....]   # categorical features

pipe = Pipeline(steps=[
    # step 1: discretize continuous features
    ('discrete', Discretize(nbins=None)),
    # step 2: train BHAD on the discretized data
    ('model', BHAD(contamination=0.01, num_features=num_cols, cat_features=cat_cols))
])

For a given dataset, get the binary model decisions:

y_pred = pipe.fit_predict(X = dataset)        
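
The sketch below illustrates how a contamination rate can turn continuous anomaly scores into binary decisions. It is not taken from the package: the helper `flag_anomalies` is hypothetical, and the label convention (-1 for anomalies, 1 for regular observations) is assumed from scikit-learn's outlier detectors, so check the values actually returned by `fit_predict` before relying on it.

```python
def flag_anomalies(scores, contamination):
    """Return -1 for the lowest-scoring fraction of observations, 1 otherwise.

    Hypothetical helper for illustration; ties at the cutoff are all flagged,
    so slightly more than the requested fraction may be marked.
    """
    n_flag = max(1, round(contamination * len(scores)))
    cutoff = sorted(scores)[n_flag - 1]
    return [-1 if s <= cutoff else 1 for s in scores]

# Toy scores: lower means more anomalous (an assumption of this sketch).
scores = [-0.4, -0.5, -0.3, -4.2, -0.6, -0.4, -0.5, -0.3, -0.7, -0.4]
labels = flag_anomalies(scores, contamination=0.1)

n_anomalies = labels.count(-1)                               # observations flagged
flagged_rows = [i for i, y in enumerate(labels) if y == -1]  # their positions
```

Under the same assumption, the flagged rows of a real dataset could be selected by filtering on `y_pred == -1`.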

Get the global model explanation as well as explanations for individual observations:

local_expl = Explainer(pipe.named_steps['model'], pipe.named_steps['discrete']).fit()

local_expl.get_explanation(nof_feat_expl = 3, append = False)   # individual explanations

local_expl.global_feat_imp                                      # global explanation
