Utility library for detecting and removing outliers from normally distributed datasets

Utility library for detecting and removing outliers from normally distributed datasets using the Smirnov-Grubbs test.

Requirements

Overview

Both the two-sided and the one-sided versions of the test are supported. The two-sided test can flag outliers at either end of the dataset, whereas a one-sided test considers only the minimum or only the maximum value. Each test runs repeatedly, removing outliers until none can be found in the remaining data. The output can take three forms: by default, a test returns the outlier-free data, but it can also return the outliers themselves or their indices in the original dataset.
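To make the repeated removal concrete, here is a minimal sketch of a two-sided Grubbs procedure. It is not taken from this library's source: the function names `grubbs_critical_value` and `remove_outliers_two_sided` are hypothetical, SciPy is assumed to be available for the Student t quantile, and the critical value uses the standard textbook formula, which may differ in detail from what the library does.

```python
import numpy as np
from scipy import stats


def grubbs_critical_value(n, alpha, two_sided=True):
    """Textbook critical value of the Grubbs statistic for sample size n."""
    # Per-observation tail probability: alpha / (2n) two-sided, alpha / n one-sided.
    p = alpha / (2 * n) if two_sided else alpha / n
    t = stats.t.ppf(1 - p, n - 2)
    return (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))


def remove_outliers_two_sided(values, alpha=0.05):
    """Hypothetical helper: drop the most extreme value until the test passes."""
    data = np.asarray(values, dtype=float)
    # The t quantile needs n - 2 > 0 degrees of freedom, so stop at two values.
    while data.size > 2:
        deviations = np.abs(data - data.mean())
        s = data.std(ddof=1)  # sample standard deviation
        if s == 0:
            break  # all remaining values are identical
        idx = int(np.argmax(deviations))
        g = deviations[idx] / s  # Grubbs statistic for the most extreme value
        if g <= grubbs_critical_value(data.size, alpha):
            break  # nothing significant is left
        data = np.delete(data, idx)
    return data


print(remove_outliers_two_sided([1, 8, 9, 10, 9]).tolist())
# [8.0, 9.0, 10.0, 9.0]
```

For this input the first pass gives a statistic of about 1.755 against a critical value of about 1.715, so the value 1 is removed. The second pass gives about 1.225 against about 1.481, so the loop stops.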

Examples

  • Two-sided Grubbs test with a Pandas series input

>>> from outliers import smirnov_grubbs as grubbs
>>> import pandas as pd
>>> data = pd.Series([1, 8, 9, 10, 9])
>>> grubbs.test(data, alpha=0.05)
1     8
2     9
3    10
4     9
dtype: int64

  • Two-sided Grubbs test with a NumPy array input

>>> import numpy as np
>>> data = np.array([1, 8, 9, 10, 9])
>>> grubbs.test(data, alpha=0.05)
array([ 8,  9, 10,  9])

  • One-sided (min) test returning outlier indices

>>> grubbs.min_test_indices([8, 9, 10, 1, 9], alpha=0.05)
[3]

  • One-sided (max) tests returning outliers

>>> grubbs.max_test_outliers([8, 9, 10, 1, 9], alpha=0.05)
[]
>>> grubbs.max_test_outliers([8, 9, 10, 50, 9], alpha=0.05)
[50]
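The one-sided results above can be reproduced by hand with a short sketch that tracks positions in the original dataset. This is an illustration under stated assumptions rather than the library's implementation: `one_sided_critical_value` and `one_sided_outlier_indices` are hypothetical names, the `side` parameter is invented for this example, and SciPy is assumed to be installed.

```python
import numpy as np
from scipy import stats


def one_sided_critical_value(n, alpha):
    """Textbook one-sided Grubbs critical value for sample size n."""
    t = stats.t.ppf(1 - alpha / n, n - 2)
    return (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))


def one_sided_outlier_indices(values, alpha=0.05, side="max"):
    """Hypothetical helper: original indices of outliers on one side only."""
    data = np.asarray(values, dtype=float)
    remaining = list(range(data.size))  # original positions still under test
    found = []
    while len(remaining) > 2:
        current = data[remaining]
        s = current.std(ddof=1)  # sample standard deviation
        if s == 0:
            break  # all remaining values are identical
        if side == "max":
            pos = int(np.argmax(current))
            g = (current[pos] - current.mean()) / s
        else:
            pos = int(np.argmin(current))
            g = (current.mean() - current[pos]) / s
        if g <= one_sided_critical_value(len(remaining), alpha):
            break  # the extreme value on this side is not significant
        found.append(remaining.pop(pos))  # record the original index
    return found


print(one_sided_outlier_indices([8, 9, 10, 1, 9], side="min"))   # [3]
print(one_sided_outlier_indices([8, 9, 10, 1, 9], side="max"))   # []
print(one_sided_outlier_indices([8, 9, 10, 50, 9], side="max"))  # [3]
```

Because the sketch keeps indices rather than values, the outliers themselves can be recovered by indexing the original data, and the outlier-free data by dropping those positions.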

License

This software is licensed under the MIT License.
