Histogram-based visualization approach for visualizing anomaly detection algorithm performance and prediction confidence

Project description

Decision Score Histogram

A new histogram-based approach for visualizing anomaly detection algorithm performance and prediction confidence.

Background

Performance visualization of anomaly detection algorithms is an essential aspect of anomaly and intrusion detection systems. It allows analysts to highlight trends and outliers in anomaly detection models results to gain intuitive understanding of detection models. This work presents a new way of visualizing anomaly detection algorithm results using a histogram.

Importance of the Visualization Approach

provides a better understanding of detection algorithms performance by revealing the exact proportions of true positives, true negatives, false positives, and false negatives of detection algorithms.
provides insights into the strengths and weaknesses of detection algorithms performance on different aspects of a datasets unlike previous approaches that rely on only positive and negative decision scores.
can be applied to performance visualization and analysis of supervised machine learning techniques involving binary classification of imbalanced datasets.

Usage

Below is an example of how to use this histogram.

Input:

decision_scores: Decision/Anomaly scores of detection or binary
classification algorithm --> input to class object
ground_truth: binary class labels of normal (+1) and anomalous (-1) data instances --> input to class object
fig_name (Optional: string): name for saving output plot; default is 'hist_plot' --> input to hist_plot method
Input data type: list, numpy array, pandas DataFrame, pandas Series

Output:

Histogram visualization of true positives, true negatives, false positives and false negatives with their prediction confidences.

General use case

from hist_score import AnomalyScoreHist
fig = AnomalyScoreHist(decision_scores, ground_truth)
fig.plot_hist(fig_name)

Note: The data class convention used in this work is:

1: represents normal class
-1: represents the anomalous class

The code can also handle supervised binary predictions whereby 1: class A and -1: class B.

Example

from hist_score import AnomalyScoreHist

fig = AnomalyScoreHist(test_dec, test['class'])
fig.plot_hist('IF_histogram')

where test_dec is the decision score output of isolation forest detection algorithm on test set 2 of TLIGHT dataset and test['class'] is the class label (1, -1).

hist visualization

References

Users can refer to our paper below for further insight and examples:

Aboah Boateng E, Bruce JW. Unsupervised Machine Learning Techniques for Detecting PLC Process Control Anomalies. Journal of Cybersecurity and Privacy. 2022; 2(2):220-244. https://doi.org/10.3390/jcp2020012

Project details

Release history Release notifications | RSS feed

0.5

Apr 20, 2022

This version

0.3

Apr 20, 2022

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

hist_plot-0.3.tar.gz (4.6 kB view hashes)

Uploaded Apr 20, 2022 Source

Hashes for hist_plot-0.3.tar.gz

Hashes for hist_plot-0.3.tar.gz
Algorithm	Hash digest
SHA256	`19b8112eb04b6f4b2e48019784b33ee656f87d7274f25740c5db26c9f33ac6f4`
MD5	`ff293945697774093497ff3512e1ff28`
BLAKE2b-256	`8d1e1d10bc3ea794d35f41d640972e044871d066cb991d2ac87d3246f49e8b12`