Mass ratio variance-based outlier factor (MOF)

pymof

Updated by Mr. Supakit Sroynam (6534467323@student.chula.ac.th) and Krung Sinapiromsaran (krung.s@chula.ac.th)
Department of Mathematics and Computer Science, Faculty of Science, Chulalongkorn University
Version 0.2: 23 September 2024

Mass-ratio-variance based outlier factor

Latest news

  1. The documentation has been edited, and 2D and 3D examples have been added.
  2. Adding the package library path before importing is suggested.
  3. The package operates on fixed datasets containing fewer than 1000 data points.

Introduction

In statistics, an outlier of a finite dataset is a data point that differs significantly from the others. An outlier typically has few data points nearby, whereas a normal point is surrounded by many. This observation motivates the Mass-ratio-variance based Outlier Factor (MOF). Each data point is scored by the variance of its mass-ratio distribution, taken over the rest of the data points. A sphere around an outlier contains few data points compared with a sphere around a normal point, so the mass ratios of an outlier differ from those of a normal data point. The algorithm that computes MOF requires no parameters and builds on the concept of density.
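
The description above can be sketched in a few lines of NumPy. This is only one reading of it, not code taken from pymof: it assumes that the mass of a point is the number of data points (the point itself included) inside a closed ball centered on it, that the mass ratio of a point p with respect to another point x compares the ball around x with the ball around p at radius d(p, x), and that the score is the population variance of those ratios. The function name mof_scores_sketch is hypothetical.

```python
import numpy as np

def mof_scores_sketch(points):
    """Illustrative MOF-style scores under the assumptions stated above."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    # Dense matrix of all pairwise Euclidean distances, shape (n, n).
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    scores = np.empty(n)
    for i in range(n):
        ratios = []
        for j in range(n):
            if i == j:
                continue
            # Mass = number of points within the closed ball of radius d(i, j).
            mass_i = np.count_nonzero(dist[i] <= dist[i, j])
            mass_j = np.count_nonzero(dist[j] <= dist[j, i])
            ratios.append(mass_j / mass_i)
        # Score = variance of the mass ratios taken over all other points.
        scores[i] = np.var(ratios)
    return scores

# 3 x 4 grid plus one isolated point, the example from the MOF paper.
grid = [[x, y] for y in (1.0, 0.0, -1.0) for x in (0.0, 1.0, 2.0, 3.0)]
data = np.array(grid + [[8.0, 4.0]])
scores = mof_scores_sketch(data)
print(int(np.argmax(scores)))  # index of the highest-scoring point
```

Under this reading, the isolated point (8, 4) receives the highest score, about 2.46. The double loop is written for readability rather than speed.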

Citation

If you use this package in your research, please consider citing the paper below.

BibTeX for the package:

@inproceedings{changsakul2021mass,
  title={Mass-ratio-variance based Outlier Factor},
  author={Changsakul, Phichapop and Boonsiri, Somjai and Sinapiromsaran, Krung},
  booktitle={2021 18th International Joint Conference on Computer Science and Software Engineering (JCSSE)},
  pages={1--5},
  year={2021},
  organization={IEEE}
}

Installation

To install pymof, type the following command in the terminal:

pip install pymof            # normal install
pip install --upgrade pymof  # or update if needed

Use in Jupyter Notebook

To make sure the installed package can be imported, add the package path before the import:

import sys
sys.path.append('/path/to/lib/python3.xx/site-packages')
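
Before editing sys.path, it may help to check whether the notebook's interpreter can already locate the package. The sketch below uses only the standard library; the helper name is_importable is hypothetical and not part of pymof.

```python
import importlib.util
import sys

def is_importable(package_name):
    """Return True if the current interpreter can locate the package."""
    return importlib.util.find_spec(package_name) is not None

if not is_importable("pymof"):
    # Show which interpreter the notebook runs and where it searches,
    # so the correct site-packages directory can be appended.
    print("pymof not found by", sys.executable)
    for entry in sys.path:
        print("  ", entry)
```

If the package is reported as missing, the interpreter printed here is often a different one from the interpreter that pip installed into.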

Required Dependencies :

  • Python 3.9 or higher
  • numpy>=1.23
  • numba>=0.56.0
  • scipy>=1.8.0
  • scikit-learn>=1.2.0
  • matplotlib>=3.5

Documentation


The outlier score of each data point is calculated using the Mass-ratio-variance based Outlier Factor (MOF). MOF quantifies the global deviation of a data point's density relative to the rest of the dataset. This global perspective is crucial because an outlier's score depends on its overall isolation from all other data points. By analyzing the variance of the mass ratio, MOF can effectively identify data points with significantly lower density compared to their neighbors, indicating their outlier status.

MOF()

Initialize a model object MOF

Parameters :
Return :
        self : object
                object of MOF model

MOF.fit(Data)

Fit data to MOF model
Note: The number of data points should not exceed 10000, because distances between all pairs of points are computed.

Parameters :
        Data  : numpy array of shape (n_points, d_dimensions)
                The input samples.
Return :
        self  : object
                fitted estimator
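
To illustrate why the all-pairs computation limits dataset size, the sketch below estimates the memory needed for one dense distance matrix. It assumes an n x n array of 8-byte floats; how pymof stores its intermediate results is not confirmed here, and the function name pairwise_distance_bytes is hypothetical.

```python
def pairwise_distance_bytes(n_points, bytes_per_value=8):
    """Bytes needed for a dense n x n matrix (float64 by default)."""
    return n_points * n_points * bytes_per_value

for n in (1_000, 10_000, 100_000):
    gigabytes = pairwise_distance_bytes(n) / 1e9
    print(f"{n:>7} points -> {gigabytes:.3f} GB")
```

The cost grows quadratically: ten times more points require a hundred times more memory.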

MOF.visualize()

Visualize data points with MOF's scores
Note: Data points with more than 3 dimensions cannot be visualized.

Parameters :
Return :
    decision_scores_ : numpy array of shape (n_samples)
                                decision score for each point

MOF attributes

Attributes            | Type                                           | Details
MOF.Data              | numpy array of shape (n_points, d_dimensions)  | input data for scoring
MOF.MassRatio         | numpy array of shape (n_samples, n_points)     | MassRatio for each pair of data points
MOF.decision_scores_  | numpy array of shape (n_samples)               | decision score for each point

Sample usage

# This example is from the MOF paper.
import matplotlib.pyplot as plt
import numpy as np
from pymof import MOF
data = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0],
                 [0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0],
                 [0.0,-1.0], [1.0,-1.0], [2.0,-1.0], [3.0,-1.0], [8.0, 4.0]
                ])
model = MOF()
model.fit(data)
scores = model.decision_scores_
print(scores)
model.visualize()

# Create a figure and axes
fig, ax = plt.subplots()
data = model.MassRatio
# Iterate over each row and create a boxplot
for i in range(data.shape[0]):
    row = data[i, :]
    mask = np.isnan(row)
    ax.boxplot(row[~mask], positions=[i + 1], vert=False, widths=0.5)
# Set labels and title
ax.set_xlabel("MOF")
ax.set_ylabel("Data points")
ax.set_title("Boxplot of MassRatio distribution")
# Show the plot
plt.grid(True)
plt.show()

Output

[0.12844997, 0.06254347, 0.08142683, 0.20940997, 0.03981233, 0.0212412 , 0.025438  , 0.08894882, 0.11300615, 0.0500218, 0.05805704, 0.17226989, 2.46193377]

[Figure: MOF score] [Figure: Box plot of MassRatio distribution]
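
The scores can be turned into a ranking of outlier candidates. The sketch below copies the 13 scores printed above into an array and sorts them in descending order. Reporting the top three is an arbitrary choice for illustration, not a threshold recommended by the package.

```python
import numpy as np

# Scores copied from the output of the 2D example.
scores = np.array([0.12844997, 0.06254347, 0.08142683, 0.20940997,
                   0.03981233, 0.0212412, 0.025438, 0.08894882,
                   0.11300615, 0.0500218, 0.05805704, 0.17226989,
                   2.46193377])

# Indices of the data points, highest score first.
ranking = np.argsort(scores)[::-1]
top3 = ranking[:3]
for idx in top3:
    print(f"point {idx}: score {scores[idx]:.4f}")
```

The isolated point [8.0, 4.0] (index 12) ranks first by a wide margin, followed by two corner points of the grid.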

3D sample

# This example demonstrates the usage of MOF on 3D data.
from pymof import MOF
import numpy as np
data = np.array([[-2.30258509,  7.01040212,  5.80242044],
                 [ 0.09531018,  7.13894636,  5.91106761],
                 [ 0.09531018,  7.61928251,  5.80242044],
                 [ 0.09531018,  7.29580291,  6.01640103],
                 [-2.30258509, 12.43197678,  5.79331844],
                 [ 1.13140211,  9.53156118,  7.22336862],
                 [-2.30258509,  7.09431783,  5.79939564],
                 [ 0.09531018,  7.50444662,  5.82037962],
                 [ 0.09531018,  7.8184705,   5.82334171],
                 [ 0.09531018,  7.25212482,  5.91106761]])
model = MOF()
model.fit(data)
scores = model.decision_scores_
print(scores)
model.visualize()

Output

[0.34541068 0.11101711 0.07193073 0.07520904 1.51480377 0.94558894 0.27585581 0.06242823 0.2204504  0.02247725]

