Outlier identifiers functions package.

OutlierIdentifiers

In brief

This is a Python package of 1D outlier identifier functions. It closely follows the Wolfram Language (WL) paclet [AAp1], the R package [AAp2], and the Raku package [AAp3].

Usage examples are collected in the Jupyter notebook "OutlierIdentifiers-guide.ipynb" (a Markdown version is also available).


Installation

From PyPI.org:

python3 -m pip install OutlierIdentifiers

From GitHub:

python3 -m pip install git+https://github.com/antononcube/Python-packages.git#egg=OutlierIdentifiers\&subdirectory=OutlierIdentifiers

Usage examples

Load packages:

import numpy as np
import plotly.graph_objects as go

from OutlierIdentifiers import *

Generate a vector with random numbers:

np.random.seed(14)
vec = np.random.normal(loc=10, scale=20, size=30)
print(vec)
[ 41.02678223  11.58372049  13.47953057   8.55326868 -30.086588
  12.89355626 -20.02337245  14.22218902  -1.16410111  31.6905813
   6.27421752  10.2932275  -11.51138939  22.84504148   6.39326577
  22.40600507  26.21948669  25.55871733   5.25020644 -27.83824691
 -13.44243588  26.72413943  30.18546801  35.86198722  -0.98662331
  -9.6342573   28.29345516  27.46140757  10.44222283   9.91712833]

Plot the vector:

# Create a scatter plot with markers
fig = go.Figure(data=go.Scatter(y=vec, mode='markers'))

# Add labels and title
fig.update_layout(title='Vector of Numbers', xaxis_title='Index', yaxis_title='Value', template = "plotly_dark")

# Display the plot
fig.show()

Find outlier positions:

outlier_identifier(vec, identifier=hampel_identifier_parameters)
array([ True, False, False, False,  True, False,  True, False, False,
        True, False, False,  True, False, False, False, False, False,
       False,  True,  True, False, False,  True, False,  True, False,
       False, False, False])

Find outlier values:

outlier_identifier(vec, identifier=hampel_identifier_parameters, value = True)
array([ 41.02678223, -30.086588  , -20.02337245,  31.6905813 ,
       -11.51138939, -27.83824691, -13.44243588,  35.86198722,
        -9.6342573 ])
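The two calls above can be pictured as a two-step process: obtain a pair of thresholds, then flag every element that falls outside them. The following is a minimal sketch of that idea in plain Python, not the package's actual implementation. `flag_outside` is a hypothetical helper, the strict inequalities are an assumption, and the thresholds are the Hampel values listed later in this document, rounded.

```python
# Hypothetical helper, for illustration only; not part of OutlierIdentifiers.
def flag_outside(values, lower, upper):
    """Return a list of booleans: True where a value lies outside [lower, upper]."""
    return [v < lower or v > upper for v in values]

# First ten elements of `vec` from the example above.
sample = [41.02678223, 11.58372049, 13.47953057, 8.55326868, -30.086588,
          12.89355626, -20.02337245, 14.22218902, -1.16410111, 31.6905813]

# Hampel thresholds for the full `vec`, as listed in this document (rounded).
lower, upper = -8.796654, 30.822597

# Step 1: boolean mask of outlier positions.
mask = flag_outside(sample, lower, upper)

# Step 2: keep only the flagged values.
outlier_values = [v for v, is_out in zip(sample, mask) if is_out]

print(mask)
print(outlier_values)
```

For these ten elements the mask agrees with the first ten entries of the position array shown above.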

Find top outlier positions and values:

outlier_identifier(vec, identifier = lambda v: top_outliers(hampel_identifier_parameters(v)))
array([ True, False, False, False, False, False, False, False, False,
        True, False, False, False, False, False, False, False, False,
       False, False, False, False, False,  True, False, False, False,
       False, False, False])
outlier_identifier(vec, identifier = lambda v: top_outliers(hampel_identifier_parameters(v)), value=True)
array([41.02678223, 31.6905813 , 35.86198722])

Find bottom outlier positions and values (using quartiles-based identifier):

outlier_identifier(vec, identifier = lambda v: bottom_outliers(quartile_identifier_parameters(v)))
array([False, False, False, False,  True, False,  True, False, False,
       False, False, False, False, False, False, False, False, False,
       False,  True, False, False, False, False, False, False, False,
       False, False, False])
outlier_identifier(vec, identifier = lambda v: bottom_outliers(quartile_identifier_parameters(v)), value=True)
array([-30.086588  , -20.02337245, -27.83824691])

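Here is another way to get the outlier values, by indexing the vector with the boolean mask of positions:

pred = outlier_identifier(vec, identifier = lambda v: bottom_outliers(quartile_identifier_parameters(v)))
vec[pred]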
array([-30.086588  , -20.02337245, -27.83824691])
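The top and bottom variants can be read as one-sided versions of the same thresholds. The sketch below assumes that `top_outliers` keeps only the upper threshold and `bottom_outliers` keeps only the lower one. The outputs above are consistent with that reading, but the package source remains the authority. The `*_sketch` functions and `values_outside` are hypothetical stand-ins, and the thresholds are the quartile-based values listed later in this document, rounded.

```python
import math

# Hypothetical stand-ins, for illustration only; not the package's functions.
def top_outliers_sketch(params):
    """Drop the lower threshold so only large values can be flagged."""
    _, upper = params
    return (-math.inf, upper)

def bottom_outliers_sketch(params):
    """Drop the upper threshold so only small values can be flagged."""
    lower, _ = params
    return (lower, math.inf)

def values_outside(values, params):
    """Return the values lying strictly outside the (lower, upper) pair."""
    lower, upper = params
    return [v for v in values if v < lower or v > upper]

# First ten elements of `vec` from the example above.
sample = [41.02678223, 11.58372049, 13.47953057, 8.55326868, -30.086588,
          12.89355626, -20.02337245, 14.22218902, -1.16410111, 31.6905813]

# Quartile-based thresholds for the full `vec`, as listed in this document (rounded).
quartile_params = (-14.468739, 36.494682)

top = values_outside(sample, top_outliers_sketch(quartile_params))
bottom = values_outside(sample, bottom_outliers_sketch(quartile_params))

print(top)
print(bottom)
```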

The available outlier parameter functions are:

  • hampel_identifier_parameters
  • splus_quartile_identifier_parameters
  • quartile_identifier_parameters

Each takes a vector and returns a pair of lower and upper thresholds:

[ f(vec) for f in (hampel_identifier_parameters, splus_quartile_identifier_parameters, quartile_identifier_parameters)]
[(-8.796653643076334, 30.822596969354976),
 (-37.649981209714, 64.27685968784428),
 (-14.46873856125025, 36.49468188752889)]
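To make the three threshold pairs less opaque, here is a sketch of formulas that reproduce them for this `vec` to within rounding, using only the standard library. The formulas are inferred from the output above and from common textbook definitions, not taken from the package source: the 1.4826 scale factor, the 1.5 multiplier, and the linear-interpolation quartiles are all assumptions, and the `*_sketch` names are hypothetical.

```python
import statistics

# `vec` as printed in the example above (rounded to 8 decimals).
vec = [41.02678223, 11.58372049, 13.47953057, 8.55326868, -30.086588,
       12.89355626, -20.02337245, 14.22218902, -1.16410111, 31.6905813,
       6.27421752, 10.2932275, -11.51138939, 22.84504148, 6.39326577,
       22.40600507, 26.21948669, 25.55871733, 5.25020644, -27.83824691,
       -13.44243588, 26.72413943, 30.18546801, 35.86198722, -0.98662331,
       -9.6342573, 28.29345516, 27.46140757, 10.44222283, 9.91712833]

def hampel_params_sketch(values):
    """Assumed form: median +/- 1.4826 * median absolute deviation."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return (med - 1.4826 * mad, med + 1.4826 * mad)

def quartile_params_sketch(values):
    """Assumed form: median +/- interquartile range."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return (q2 - iqr, q2 + iqr)

def splus_quartile_params_sketch(values):
    """Assumed form: (Q1 - 1.5 * IQR, Q3 + 1.5 * IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return (q1 - 1.5 * iqr, q3 + 1.5 * iqr)

hampel = hampel_params_sketch(vec)
splus = splus_quartile_params_sketch(vec)
quartile = quartile_params_sketch(vec)

for name, params in (("hampel", hampel), ("splus", splus), ("quartile", quartile)):
    print(name, params)
```

The `method="inclusive"` option of `statistics.quantiles` interpolates linearly between order statistics, which is what makes the quartiles line up with the values shown above. A different quantile convention would shift the quartile-based thresholds slightly.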

References

[AA1] Anton Antonov, "Outlier detection in a list of numbers", (2013), MathematicaForPrediction at WordPress.

[AAp1] Anton Antonov, OutlierIdentifiers WL paclet, (2023), Wolfram Language Paclet Repository.

[AAp2] Anton Antonov, OutlierIdentifiers R package, (2019), R-packages at GitHub/antononcube.

[AAp3] Anton Antonov, OutlierIdentifiers Raku package, (2022), GitHub/antononcube.
