Filtering outliers
au: Outlier Detection Toolkit
Filtering outliers to find the golden nuggets that stand out from the rest.
To install: pip install au
Outlier detection is a fundamental step in data analysis, particularly relevant in statistics, data mining, and machine learning. This toolkit provides a set of functions and classes in Python for identifying outliers - observations in data that are significantly different from the majority. The toolkit is designed to accommodate various methodologies, ranging from statistical methods to machine learning-based approaches.
Features
- Z-Score Based Outlier Detection
  - Detects outliers by measuring how many standard deviations an element is from the mean.
  - Suitable for datasets where the distribution is expected to be Gaussian.
- Interquartile Range (IQR) Based Outlier Detection
  - Utilizes the IQR, which is the difference between the 75th and 25th percentiles of the data.
  - Effective for skewed distributions.
- Isolation Forest Based Outlier Detection
  - Implements the Isolation Forest algorithm, a machine learning method for anomaly detection.
  - Ideal for high-dimensional datasets.
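To make the two statistical methods concrete, here is a minimal NumPy sketch of the z-score and IQR rules. It is not the toolkit's own implementation: the function names `zscore_outliers` and `iqr_outliers`, the default `threshold=3.0`, and the fence multiplier `k=1.5` are assumptions chosen for illustration.

```python
import numpy as np


def zscore_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds `threshold`.

    Hypothetical helper for illustration; not part of the au API.
    """
    data = np.asarray(values, dtype=float)
    std = data.std()  # population standard deviation (ddof=0)
    if std == 0:
        # All values are identical, so nothing can be an outlier.
        return np.array([], dtype=float)
    z = np.abs((data - data.mean()) / std)
    return data[z > threshold]


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR].

    Hypothetical helper for illustration; not part of the au API.
    """
    data = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(data, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return data[(data < lower) | (data > upper)]


sample = [10, 12, 12, 13, 12, 11, 40]
print(zscore_outliers(sample, threshold=2.0))
print(iqr_outliers(sample))
```

On this seven-element sample, the IQR rule flags 40, and so does the z-score rule at a threshold of 2. At the assumed default of 3 the z-score rule flags nothing, because a single extreme value inflates the standard deviation; on very small samples a lower threshold or the IQR rule may be the better choice.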
Installation
Ensure that you have Python installed on your system. This toolkit requires numpy and scikit-learn. They can be installed via pip:
pip install numpy scikit-learn
Usage
- Z-Score Based Outlier Detection
    from outlier_detection import detect_outliers_zscore
    outliers = detect_outliers_zscore([10, 12, 12, 13, 12, 11, 40])
- Interquartile Range (IQR) Based Outlier Detection
    from outlier_detection import detect_outliers_iqr
    outliers = detect_outliers_iqr([10, 12, 12, 13, 12, 11, 40])
- Isolation Forest Based Outlier Detection
    from outlier_detection import IsolationForestOutlierDetector
    detector = IsolationForestOutlierDetector()
    outliers = detector.detect_outliers([10, 12, 12, 13, 12, 11, 40])
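For readers who want to see what an Isolation Forest based detector might do under the hood, the sketch below calls scikit-learn's `IsolationForest` directly. It is an assumption about the general approach, not the toolkit's actual code: the helper name `isolation_forest_outliers` and the `contamination=0.15` and `random_state=0` settings are hypothetical choices for this example.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def isolation_forest_outliers(values, contamination=0.15, random_state=0):
    """Return the rows that scikit-learn's IsolationForest labels as outliers.

    Hypothetical helper for illustration; not part of the au API.
    """
    data = np.asarray(values, dtype=float)
    # scikit-learn expects a 2-D array of shape (n_samples, n_features),
    # so a flat list is treated as a single feature column.
    X = data.reshape(-1, 1) if data.ndim == 1 else data
    model = IsolationForest(contamination=contamination, random_state=random_state)
    labels = model.fit_predict(X)  # -1 marks outliers, 1 marks inliers
    return data[labels == -1]


flagged = isolation_forest_outliers([10, 12, 12, 13, 12, 11, 40])
print(flagged)
```

The `contamination` parameter is the expected share of outliers in the data and directly controls how many points are flagged, so it should be set from domain knowledge rather than left at an arbitrary value.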
Documentation
Each function and class in this toolkit comes with a detailed docstring, explaining its purpose, parameters, return values, and examples.
Contributing
Contributions to this project are welcome! Please fork the repository and submit a pull request with your changes.