Utility library for detecting and removing outliers from normally distributed datasets
Project description
Utility library for detecting and removing outliers from normally distributed datasets using the Smirnov-Grubbs test.
Requirements
Overview
Both the two-sided and the one-sided versions of the test are supported. The former extracts outliers from both ends of the dataset, whereas the latter considers only one end: either the minimum or the maximum. A test runs repeatedly, removing outliers until none can be found in the dataset. By default, the outlier-free data is returned, but the test can also return the outliers themselves or their indices in the original dataset.
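To make the procedure concrete, here is a minimal standard-library sketch of an iterative Grubbs test. It is not this library's implementation: the names `t_cdf`, `t_quantile`, `grubbs_critical_value` and `grubbs_sketch`, and the `side` parameter, are hypothetical and exist only for illustration. The sketch assumes the usual textbook critical value, G_crit = (n-1)/sqrt(n) * sqrt(t^2 / (n-2+t^2)), where t is the Student's t quantile with n-2 degrees of freedom at tail probability alpha/(2n) for the two-sided test and alpha/n for the one-sided tests. The t quantile is approximated numerically here to avoid dependencies; a statistics package would normally supply it.

```python
import math
from statistics import mean, stdev


def t_cdf(t, df, steps=2000):
    """Student's t CDF for t >= 0, integrating the density with Simpson's rule."""
    coef = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    coef /= math.sqrt(df * math.pi)

    def pdf(x):
        return coef * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Quantile of Student's t for p > 0.5, found by bisection on t_cdf."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # grow the bracket until it contains the quantile
        lo, hi = hi, hi * 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def grubbs_critical_value(n, alpha, two_sided=True):
    """Critical value of the Grubbs statistic for a sample of size n >= 3."""
    tail = alpha / (2 * n) if two_sided else alpha / n
    t = t_quantile(1 - tail, n - 2)
    return (n - 1) / math.sqrt(n) * math.sqrt(t * t / (n - 2 + t * t))


def grubbs_sketch(data, alpha=0.05, side="both"):
    """Return (clean_data, outliers, outlier_indices).

    side is "both" (two-sided), "min" or "max" (one-sided); hypothetical API.
    """
    remaining = list(enumerate(data))  # remember positions in the original data
    outliers, indices = [], []
    while len(remaining) > 2:
        values = [v for _, v in remaining]
        m, s = mean(values), stdev(values)
        if s == 0:
            break
        if side == "min":
            pos = values.index(min(values))
        elif side == "max":
            pos = values.index(max(values))
        else:  # two-sided: the candidate is the point farthest from the mean
            pos = max(range(len(values)), key=lambda i: abs(values[i] - m))
        g = abs(values[pos] - m) / s
        if g <= grubbs_critical_value(len(values), alpha, side == "both"):
            break  # no further outlier at this significance level
        idx, val = remaining.pop(pos)
        indices.append(idx)
        outliers.append(val)
    return [v for _, v in remaining], outliers, indices


clean, outliers, indices = grubbs_sketch([1, 8, 9, 10, 9], alpha=0.05)
print(clean, outliers, indices)  # [8, 9, 10, 9] [1] [0]
print(grubbs_sketch([8, 9, 10, 50, 9], side="max")[1])  # [50]
```

Returning all three views from one function is a simplification made for the sketch; the library itself exposes separate entry points for the outlier-free data, the outliers, and their indices.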
Examples
Two-sided Grubbs test with a Pandas series input
>>> from outliers import smirnov_grubbs as grubbs
>>> import pandas as pd
>>> data = pd.Series([1, 8, 9, 10, 9])
>>> grubbs.test(data, alpha=0.05)
1     8
2     9
3    10
4     9
dtype: int64
Two-sided Grubbs test with a NumPy array input
>>> import numpy as np
>>> data = np.array([1, 8, 9, 10, 9])
>>> grubbs.test(data, alpha=0.05)
array([ 8,  9, 10,  9])
One-sided (min) test returning outlier indices
>>> grubbs.min_test_indices([8, 9, 10, 1, 9], alpha=0.05)
[3]
One-sided (max) tests returning outliers
>>> grubbs.max_test_outliers([8, 9, 10, 1, 9], alpha=0.05)
[]
>>> grubbs.max_test_outliers([8, 9, 10, 50, 9], alpha=0.05)
[50]
License
This software is licensed under the MIT License.
CHANGES
0.0.3 (2016-04-25)
Thanks to @lukius.
Support for one-sided (min/max) tests.
Test output is now more flexible: the user can run the test in order to find the outliers themselves or the indices of the outliers, and not just the outlier-free data.
Test suite was enhanced.
README was extended and improved.
Japanese comments were translated to English so as to reach a greater audience.
0.0.2 (2015-12-02)
Update setup.py
0.0.1 (2015-12-01)
Publish to PyPI.
0.0.0 (2015-07-28)
Create this project.
Download files
Built Distributions
Hashes for outlier_utils-0.0.3-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | f91b7a69dc5895b258ea307ed75e7bb3d901fba492c5a7cde4ae2e9cece206f7
MD5 | 40decb3260f9ff94ebfd55a075579c93
BLAKE2b-256 | b1b6686a53fbf3fb4c6fd912cca01999d9e9da90a83756040223a16afd2ed578
Hashes for outlier_utils-0.0.3-py2-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 6d99e48033e41a8ac00c2467f61725f76c04327543d9a895b92bb16d2ded4629
MD5 | 0294553d36eda2c66dd4834d2ed2cc9e
BLAKE2b-256 | b4046ac1f8c45f22052bc973b2a73a27d2f7556ae33476060105de6241b13f1a