Utility library for detecting and removing outliers from normally distributed datasets

Utility library for detecting and removing outliers from normally distributed datasets using the Smirnov-Grubbs test.
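For reference, this is the textbook form of the Smirnov-Grubbs statistic and its critical value. It is standard background, not taken from the library's source, so treat the exact conventions (for example, using the sample standard deviation s) as an assumption about the implementation.

```latex
G_{\text{two}} = \frac{\max_i |x_i - \bar{x}|}{s}, \qquad
G_{\min} = \frac{\bar{x} - \min_i x_i}{s}, \qquad
G_{\max} = \frac{\max_i x_i - \bar{x}}{s}

G_{\text{crit}} = \frac{n-1}{\sqrt{n}} \sqrt{\frac{t^2}{n-2+t^2}}, \qquad
t = t_{\alpha/(2n),\, n-2} \ \text{(two-sided)}, \quad
t = t_{\alpha/n,\, n-2} \ \text{(one-sided)}
```

Here t_{p, n-2} is the upper p quantile of Student's t distribution with n - 2 degrees of freedom, and the most extreme value is flagged as an outlier when G > G_crit.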

Requirements

Overview

Both the two-sided and the one-sided versions of the test are supported. The two-sided test looks for outliers at both ends of the dataset, whereas a one-sided test looks at only one end: either the minimum or the maximum. Tests are applied iteratively: each detected outlier is removed and the test is rerun on the remaining data until no further outlier is found. By default, the test returns the outlier-free data, but it can also return the outliers themselves or their indices in the original dataset.
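The iterative procedure described above can be sketched in plain Python. This is a minimal, dependency-free sketch under stated assumptions, not the library's implementation: `t_sf`, `t_isf`, `grubbs_critical` and `grubbs_outlier_indices` are hypothetical names, the statistic uses the sample standard deviation, and the Student's t quantile is approximated numerically (Simpson's rule plus bisection) where one would normally call `scipy.stats.t.isf`.

```python
import math
import statistics


def t_sf(x, df):
    """Approximate P(T > x), x >= 0, for Student's t with `df` degrees of freedom."""
    # Normalising constant of the t density (lgamma avoids overflow for large df).
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    # P(T > x) = 0.5 - integral of the density over [0, x], via composite
    # Simpson's rule; the (even) step count grows with x to keep h small.
    steps = 2 * max(1000, int(20 * x))
    h = x / steps
    area = pdf(0) + pdf(x) + sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    return 0.5 - area * h / 3


def t_isf(p, df):
    """Upper quantile of Student's t: the x with P(T > x) = p, for 0 < p < 0.5."""
    lo, hi = 0.0, 1.0
    while t_sf(hi, df) > p:  # widen the bracket until it contains the root
        hi *= 2
    for _ in range(50):  # bisection
        mid = (lo + hi) / 2
        if t_sf(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def grubbs_critical(n, alpha, two_sided=True):
    """Textbook Grubbs critical value for a sample of size n (requires n >= 3)."""
    p = alpha / (2 * n) if two_sided else alpha / n
    t = t_isf(p, n - 2)
    return (n - 1) / math.sqrt(n) * math.sqrt(t * t / (n - 2 + t * t))


def grubbs_outlier_indices(data, alpha=0.05, side="two"):
    """Hypothetical helper: indices (in `data`) of outliers found by the iterative test.

    side: "two" checks both ends, "min" only the smallest value, "max" only the largest.
    """
    remaining = list(enumerate(data))  # (original index, value) pairs
    found = []
    while len(remaining) >= 3:  # the t distribution needs n - 2 >= 1 degrees of freedom
        values = [v for _, v in remaining]
        mean = statistics.mean(values)
        sd = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
        if sd == 0:
            break
        if side == "min":
            pos = values.index(min(values))
            g = (mean - values[pos]) / sd
        elif side == "max":
            pos = values.index(max(values))
            g = (values[pos] - mean) / sd
        else:
            pos = max(range(len(values)), key=lambda i: abs(values[i] - mean))
            g = abs(values[pos] - mean) / sd
        if g <= grubbs_critical(len(values), alpha, two_sided=(side == "two")):
            break  # the most extreme remaining value is not an outlier: stop
        found.append(remaining.pop(pos)[0])  # remove it and test again
    return found


# The three output forms all follow from the indices.
data = [1, 8, 9, 10, 9]
idx = grubbs_outlier_indices(data)
outliers = [data[i] for i in idx]
cleaned = [x for i, x in enumerate(data) if i not in idx]
```

Carrying the original positions through `enumerate` keeps the returned indices valid after earlier removals, which is what lets the outliers and the outlier-free data be derived from a single list of indices.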

Examples

  • Two-sided Grubbs test with a pandas Series input
>>> from outliers import smirnov_grubbs as grubbs
>>> import pandas as pd
>>> data = pd.Series([1, 8, 9, 10, 9])
>>> grubbs.test(data, alpha=0.05)
1     8
2     9
3    10
4     9
dtype: int64
  • Two-sided Grubbs test with a NumPy array input
>>> import numpy as np
>>> data = np.array([1, 8, 9, 10, 9])
>>> grubbs.test(data, alpha=0.05)
array([ 8,  9, 10,  9])
  • One-sided (min) test returning outlier indices
>>> grubbs.min_test_indices([8, 9, 10, 1, 9], alpha=0.05)
[3]
  • One-sided (max) tests returning outliers
>>> grubbs.max_test_outliers([8, 9, 10, 1, 9], alpha=0.05)
[]
>>> grubbs.max_test_outliers([8, 9, 10, 50, 9], alpha=0.05)
[50]
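As a worked check of the first two examples, here is the two-sided test applied by hand to [1, 8, 9, 10, 9] with α = 0.05. It assumes the textbook statistic with the sample standard deviation s, and the t quantiles are rounded standard-table values.

```latex
\begin{aligned}
\text{Round 1: } & x = (1, 8, 9, 10, 9),\; n = 5,\; \bar{x} = 7.4,\; s = \sqrt{53.2 / 4} \approx 3.647 \\
& G = |1 - 7.4| / s \approx 1.755 \\
& t_{0.005,\,3} \approx 5.841,\quad
  G_{\text{crit}} = \tfrac{4}{\sqrt{5}} \sqrt{\tfrac{5.841^2}{3 + 5.841^2}} \approx 1.715 \\
& G > G_{\text{crit}} \;\Rightarrow\; 1 \text{ is removed} \\
\text{Round 2: } & x = (8, 9, 10, 9),\; n = 4,\; \bar{x} = 9,\; s = \sqrt{2/3} \approx 0.816 \\
& G = 1 / s \approx 1.225 \quad (8 \text{ and } 10 \text{ tie}) \\
& t_{0.00625,\,2} \approx 8.860,\quad
  G_{\text{crit}} = \tfrac{3}{\sqrt{4}} \sqrt{\tfrac{8.860^2}{2 + 8.860^2}} \approx 1.481 \\
& G < G_{\text{crit}} \;\Rightarrow\; \text{stop}
\end{aligned}
```

Only the value 1 is flagged, which is consistent with the four remaining values returned in the pandas and NumPy examples.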

License

This software is licensed under the MIT License.

CHANGES

0.0.3 (2016-04-25)

Thanks to @lukius.

  • Support for one-sided (min/max) tests.
  • Test output is now more flexible: besides the outlier-free data, the test can return the outliers themselves or their indices.
  • Test suite was enhanced.
  • README was extended and improved.
  • Japanese comments were translated to English to reach a wider audience.

0.0.2 (2015-12-02)

Update setup.py

0.0.1 (2015-12-01)

Publish to PyPI

0.0.0 (2015-07-28)

Create this project.


Download files

Files for outlier_utils, version 0.0.3:
  • outlier_utils-0.0.3-py2-none-any.whl (6.0 kB): wheel, Python 2
  • outlier_utils-0.0.3-py3-none-any.whl (6.0 kB): wheel, Python 3
  • outlier_utils-0.0.3.tar.gz (5.2 kB): source distribution
