Mass ratio variance-based outlier factor (MOF)
pymof
Updated by Mr. Supakit Sroynam (6534467323@student.chula.ac.th) and Krung Sinapiromsaran (krung.s@chula.ac.th)
Department of Mathematics and Computer Science, Faculty of Science, Chulalongkorn University
Version 0.2: 23 September 2024
Mass-ratio-variance based outlier factor
Latest news
- The documentation has been edited, and 2D and 3D examples have been added.
- The documentation now suggests adding the package library path before importing.
- The package operates on fixed datasets containing fewer than 1000 data points.
Introduction
In statistics, an outlier of a finite dataset is a data point that differs significantly from the others. An outlier typically has few data points around it, whereas a normal point is surrounded by many. This behavior motivates the proposed outlier factor, the Mass-ratio-variance based Outlier Factor (MOF). Each data point is scored by the variance of its mass-ratio distribution, computed against the rest of the data points. A sphere centered at an outlier contains fewer data points than a sphere of the same radius centered at a normal point, so the mass ratios of an outlier differ from those of a normal data point. The MOF algorithm requires no parameters and builds on the density concept.
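The idea can be sketched in a few lines of NumPy. This is only an illustration under one reading of the description, not the package's implementation: for a point p and every other point q, take r as the distance between them, count the points within r of q and within r of p (each count includes the center), and use the ratio of the two counts as the mass ratio. The score of p is then the population variance of its ratios. The function name `mass_ratio_variance` is hypothetical.

```python
import numpy as np

def mass_ratio_variance(points):
    """Illustrative mass-ratio-variance score (assumed reading, not pymof's code)."""
    X = np.asarray(points, dtype=float)
    n = X.shape[0]
    # Squared pairwise distances; comparing squared values avoids square roots.
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diff, diff)
    scores = np.empty(n)
    for p in range(n):
        ratios = []
        for q in range(n):
            if q == p:
                continue
            r2 = d2[p, q]
            mass_p = np.count_nonzero(d2[p] <= r2)  # points within r of p
            mass_q = np.count_nonzero(d2[q] <= r2)  # points within r of q
            ratios.append(mass_q / mass_p)
        scores[p] = np.var(ratios)  # population variance (ddof=0)
    return scores

# 3 x 4 grid plus one isolated point, the same layout as the 2D sample data.
grid = [[x, y] for y in (1.0, 0.0, -1.0) for x in (0.0, 1.0, 2.0, 3.0)]
data = np.array(grid + [[8.0, 4.0]])
scores = mass_ratio_variance(data)
print(int(np.argmax(scores)))  # index of the highest-scoring point
print(scores[-1])              # score of the isolated point (8, 4)
```

Under this reading the isolated point (8, 4) receives the highest score, about 2.4619, in line with the last value of the 2D sample output. The sketch takes O(n³) time, so it is meant for small arrays only.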
Citation
If you use this package in your research, please consider citing the papers below.
BibTeX for the package:
@inproceedings{changsakul2021mass,
title={Mass-ratio-variance based Outlier Factor},
author={Changsakul, Phichapop and Boonsiri, Somjai and Sinapiromsaran, Krung},
booktitle={2021 18th International Joint Conference on Computer Science and Software Engineering (JCSSE)},
pages={1--5},
year={2021},
organization={IEEE}
}
@INPROCEEDINGS{10613697,
author={Fan, Zehong and Luangsodsai, Arthorn and Sinapiromsaran, Krung},
booktitle={2024 21st International Joint Conference on Computer Science and Software Engineering (JCSSE)},
title={Mass-Ratio-Average-Absolute-Deviation Based Outlier Factor for Anomaly Scoring},
year={2024},
volume={},
number={},
pages={488-493},
keywords={Industries;Software algorithms;Process control;Quality control;Nearest neighbor methods;Fraud;Computer security;Anomaly scoring;Statistical dispersion;Mass-ratio distribution;Local outlier factor;Mass-ratio variance outlier factor},
doi={10.1109/JCSSE61278.2024.10613697}}
Installation
To install pymof, type the following command in the terminal:
pip install pymof # normal install
pip install --upgrade pymof # or update if needed
Use in Jupyter Notebook
To make sure that the installed package can be imported, add the package path before the import:
import sys
sys.path.append('/path/to/lib/python3.xx/site-packages')
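Before hard-coding a path, it may help to check whether the interpreter behind the notebook can already find the package. The sketch below uses only the standard library; the helper name `is_importable` is hypothetical.

```python
import importlib.util
import sys

def is_importable(module_name):
    """Return True if the current interpreter can locate the module."""
    return importlib.util.find_spec(module_name) is not None

print(sys.executable)          # interpreter used by this kernel
print(sys.path)                # directories searched on import
print(is_importable("pymof"))  # False suggests a path or environment mismatch
```

If the last line prints False, the directory that holds the installed package is probably missing from the search path, and appending it as shown above is one way to fix that.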
Required Dependencies :
- Python 3.9 or higher
- numpy>=1.23
- numba>=0.56.0
- scipy>=1.8.0
- scikit-learn>=1.2.0
- matplotlib>=3.5
Documentation
Mass-ratio-variance based Outlier Factor (MOF)
The outlier score of each data point is calculated using the Mass-ratio-variance based Outlier Factor (MOF). MOF quantifies the global deviation of a data point's density relative to the rest of the dataset. This global perspective is crucial because an outlier's score depends on its overall isolation from all other data points. By analyzing the variance of the mass ratio, MOF can effectively identify data points with significantly lower density compared to their neighbors, indicating their outlier status.
MOF()
Initialize a model object MOF
Parameters :
Return :
self : object
object of MOF model
MOF.fit(Data, Window = 10000)
Fit data to MOF model
Parameters :
Data : numpy array of shape (n_points, d_dimensions)
The input samples.
Window : integer (int)
window size for calculation.
default window size is 10000.
Return :
self : object
fitted estimator
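Since fit expects a two-dimensional array of shape (n_points, d_dimensions), it can be worth normalizing the input first. A minimal sketch, assuming the data starts as a list or a one-dimensional array; `as_point_matrix` is a hypothetical helper, not part of pymof.

```python
import numpy as np

def as_point_matrix(raw):
    """Convert raw input to a float array of shape (n_points, d_dimensions)."""
    X = np.asarray(raw, dtype=float)
    if X.ndim == 1:
        # Treat a flat sequence as n one-dimensional points.
        X = X.reshape(-1, 1)
    if X.ndim != 2:
        raise ValueError(f"expected a 2D array, got {X.ndim} dimensions")
    if not np.isfinite(X).all():
        raise ValueError("data contains NaN or infinite values")
    return X

X = as_point_matrix([1.0, 2.0, 2.5, 9.0])
print(X.shape)
```

The result has the shape that the Data parameter is documented to take.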
MOF.visualize()
Visualize data points with MOF's scores.
Note: data points with more than 3 dimensions cannot be visualized.
Parameters :
Return :
decision_scores_ : numpy array of shape (n_samples)
decision score for each point
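For data with more than three dimensions, one workaround is to project the points to two dimensions for plotting only, while still scoring the original data. The sketch below is an assumption about how one might do this, not a pymof feature; it uses a plain NumPy principal component projection, and `project_2d` is a hypothetical name.

```python
import numpy as np

def project_2d(X):
    """Project points onto their first two principal components."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))  # 50 synthetic points in 5 dimensions
coords = project_2d(X)
print(coords.shape)
```

The two columns of `coords` can then be drawn with matplotlib's scatter, passing the values of decision_scores_ as the color argument.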
MOF attributes
Attributes | Type | Details |
---|---|---|
MOF.Data | numpy array of shape (n_points, d_dimensions) | input data for scoring |
MOF.decision_scores_ | numpy array of shape (n_samples) | decision score for each point |
Sample usage
# This example is from the MOF paper.
from pymof import MOF
import numpy as np
import matplotlib.pyplot as plt
data = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0],
                 [0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0],
                 [0.0,-1.0], [1.0,-1.0], [2.0,-1.0], [3.0,-1.0], [8.0, 4.0]
                 ])
model = MOF()
model.fit(data)
scores = model.decision_scores_
print(scores)
model.visualize()
# Create a figure and axes
fig, ax = plt.subplots()
mass_ratio = model.MassRatio
# Draw one horizontal boxplot per data point, skipping NaN entries
for i in range(mass_ratio.shape[0]):
    row = mass_ratio[i, :]
    mask = np.isnan(row)
    ax.boxplot(row[~mask], positions=[i + 1], vert=False, widths=0.5)
# Set labels and title
ax.set_xlabel("MOF")
ax.set_ylabel("Data points")
ax.set_title("Boxplot of MassRatio distribution")
# Show the plot
plt.grid(True)
plt.show()
Output
[0.12844997, 0.06254347, 0.08142683, 0.20940997, 0.03981233, 0.0212412 , 0.025438 , 0.08894882, 0.11300615, 0.0500218, 0.05805704, 0.17226989, 2.46193377]
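The scores act as a ranking signal; the description here does not define a cut-off. One simple way to read them is to list the highest-scoring points. The sketch below reuses the scores printed above; the choice of k = 3 is arbitrary.

```python
import numpy as np

# Scores copied from the output above.
scores = np.array([0.12844997, 0.06254347, 0.08142683, 0.20940997,
                   0.03981233, 0.0212412, 0.025438, 0.08894882,
                   0.11300615, 0.0500218, 0.05805704, 0.17226989,
                   2.46193377])
k = 3
# Indices of the k largest scores, highest first.
top_k = np.argsort(scores)[::-1][:k]
for idx in top_k:
    print(idx, scores[idx])
```

The first index listed is 12, the isolated point [8.0, 4.0], whose score is more than ten times that of any other point.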
3D sample
# This example demonstrates the usage of MOF
from pymof import MOF
import numpy as np
data = np.array([[-2.30258509, 7.01040212, 5.80242044],
[ 0.09531018, 7.13894636, 5.91106761],
[ 0.09531018, 7.61928251, 5.80242044],
[ 0.09531018, 7.29580291, 6.01640103],
[-2.30258509, 12.43197678, 5.79331844],
[ 1.13140211, 9.53156118, 7.22336862],
[-2.30258509, 7.09431783, 5.79939564],
[ 0.09531018, 7.50444662, 5.82037962],
[ 0.09531018, 7.8184705, 5.82334171],
[ 0.09531018, 7.25212482, 5.91106761]])
model = MOF()
model.fit(data)
scores = model.decision_scores_
print(scores)
model.visualize()
Output
[0.34541068 0.11101711 0.07193073 0.07520904 1.51480377 0.94558894 0.27585581 0.06242823 0.2204504 0.02247725]
Mass-Ratio-Average-Absolute-Deviation Based Outlier Factor (MAOF)
MAOF extends the mass-ratio-variance outlier factor algorithm (MOF) by replacing the variance with other measures of statistical dispersion, such as the range, interquartile range, and average absolute deviation.
MAOF()
Initialize a model object MAOF
Parameters :
Return :
self : object
object of MAOF model
MAOF.fit(Data, Window = 10000, Function_name = "AAD")
Fit data to MAOF model
Parameters :
Data : numpy array of shape (n_points, d_dimensions)
The input samples.
Window : integer (int)
number of points for each calculation.
default window size is 10000.
Function_name : string (str)
The type of statistical dispersion used for scoring.
Function_name can be 'AAD', 'IQR', or 'Range'.
The default function is 'AAD'.
Return :
self : object
fitted estimator
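The three Function_name options correspond to standard dispersion measures. The sketch below shows how each could be computed on a one-dimensional array of values with NumPy. It assumes that AAD is taken about the mean and that IQR uses NumPy's default percentile interpolation, either of which may differ from what pymof does internally; `dispersion` is a hypothetical helper.

```python
import numpy as np

def dispersion(values, function_name="AAD"):
    """Compute a dispersion measure; names mirror the documented options."""
    values = np.asarray(values, dtype=float)
    if function_name == "AAD":
        # Average absolute deviation about the mean (assumed definition).
        return float(np.mean(np.abs(values - values.mean())))
    if function_name == "IQR":
        q75, q25 = np.percentile(values, [75, 25])
        return float(q75 - q25)
    if function_name == "Range":
        return float(values.max() - values.min())
    raise ValueError(f"unknown function name: {function_name!r}")

sample = [1.0, 2.0, 3.0, 4.0, 10.0]
for name in ("AAD", "IQR", "Range"):
    print(name, dispersion(sample, name))
```

On this sample the single large value dominates the range, while AAD and IQR react less strongly.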
MAOF attributes
Attributes | Type | Details |
---|---|---|
MAOF.Data | numpy array of shape (n_points, d_dimensions) | input data for scoring |
MAOF.decision_scores_ | numpy array of shape (n_samples) | decision score for each point |
Sample usage
# This example demonstrates the usage of MAOF
from pymof import MAOF
import numpy as np
data = np.array([[-2.30258509, 7.01040212, 5.80242044],
[ 0.09531018, 7.13894636, 5.91106761],
[ 0.09531018, 7.61928251, 5.80242044],
[ 0.09531018, 7.29580291, 6.01640103],
[-2.30258509, 12.43197678, 5.79331844],
[ 1.13140211, 9.53156118, 7.22336862],
[-2.30258509, 7.09431783, 5.79939564],
[ 0.09531018, 7.50444662, 5.82037962],
[ 0.09531018, 7.8184705, 5.82334171],
[ 0.09531018, 7.25212482, 5.91106761]])
model = MAOF()
model.fit(data)
scores = model.decision_scores_
print(scores)
Output
[0.46904762 0.26202234 0.2191358 0.22355477 0.97854203 0.79770723 0.40823045 0.20513423 0.38110915 0.12616108]
File details
Details for the file pymof-0.3.0.tar.gz.
File metadata
- Download URL: pymof-0.3.0.tar.gz
- Upload date:
- Size: 7.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.1.1 CPython/3.12.7
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 9c879f0f9b2c06627dd064031468bc33f0fa2c83e2ad17b574c578e5e2920471 |
MD5 | a75d9844ce1fc99566d1072c4cb41bc2 |
BLAKE2b-256 | 8d7db166ddf25e41ee25ec1288d18a43509bf16ab982df6610c3cc86506ad259 |
File details
Details for the file pymof-0.3.0-py3-none-any.whl.
File metadata
- Download URL: pymof-0.3.0-py3-none-any.whl
- Upload date:
- Size: 7.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.1.1 CPython/3.12.7
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 9a7df5dff9de6c973892f5d45acc999ea9085876cdbf0080813aa26acec12208 |
MD5 | 4f482acb6793684e6464a9692acc35f0 |
BLAKE2b-256 | 6066c679f6f28a372faa882fb75e23b36cc137e21f7162c841c20085de3c1d85 |