Filtering outliers

au: Outlier Detection Toolkit

Filtering outliers to find the golden nuggets that stand out from the rest.

To install: pip install au

Outlier detection is a fundamental step in data analysis, particularly in statistics, data mining, and machine learning. This toolkit provides a set of Python functions and classes for identifying outliers: observations that differ significantly from the majority of the data. It is designed to accommodate several methodologies, from classical statistical rules to machine-learning-based approaches.

Features

  1. Z-Score Based Outlier Detection

    • Detects outliers by measuring how many standard deviations an element lies from the mean.
    • Suitable for datasets whose distribution is expected to be approximately Gaussian.
  2. Interquartile Range (IQR) Based Outlier Detection

    • Uses the IQR, the difference between the 75th and 25th percentiles of the data.
    • Effective for skewed distributions, because quartiles are far less affected by extreme values than the mean and standard deviation are.
  3. Isolation Forest Based Outlier Detection

    • Implements the Isolation Forest algorithm, a machine learning method for anomaly detection.
    • Well suited to high-dimensional datasets.
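
The two statistical rules above can be sketched in a few lines of NumPy. This is a minimal illustration, not the au implementation: the helper names `zscore_outliers` and `iqr_outliers` and their defaults (`threshold=3.0`, and `k=1.5` for Tukey's fences) are assumptions, and the package's `detect_outliers_zscore` and `detect_outliers_iqr` may use different defaults or return indices rather than values.

```python
import numpy as np


def zscore_outliers(data, threshold=3.0):
    """Hypothetical helper: return values whose |z-score| exceeds `threshold`."""
    x = np.asarray(data, dtype=float)
    # z = (x - mean) / std, with the population standard deviation (NumPy's default ddof=0)
    z = (x - x.mean()) / x.std()
    return x[np.abs(z) > threshold]


def iqr_outliers(data, k=1.5):
    """Hypothetical helper: return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    x = np.asarray(data, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])  # 25th and 75th percentiles (linear interpolation)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return x[(x < lower) | (x > upper)]


data = [10, 12, 12, 13, 12, 11, 40]
print(zscore_outliers(data))                 # → []
print(zscore_outliers(data, threshold=2.0))  # → [40.]
print(iqr_outliers(data))                    # → [40.]
```

The first result is empty because, with the population standard deviation, no value in a sample of n points can have |z| greater than sqrt(n - 1), about 2.45 for these seven values (40 scores about 2.44). A threshold of 3 therefore cannot fire on such a small list. The quartile rule does not have this limit: here Q1 = 11.5 and Q3 = 12.5, so the fences are 10 and 14 and only 40 falls outside.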

Installation

Ensure that Python is installed on your system. This toolkit requires numpy and scikit-learn, which can be installed via pip:

pip install numpy scikit-learn

Usage

  1. Z-Score Based Outlier Detection

    from outlier_detection import detect_outliers_zscore
    
    outliers = detect_outliers_zscore([10, 12, 12, 13, 12, 11, 40])
    
  2. Interquartile Range (IQR) Based Outlier Detection

    from outlier_detection import detect_outliers_iqr
    
    outliers = detect_outliers_iqr([10, 12, 12, 13, 12, 11, 40])
    
  3. Isolation Forest Based Outlier Detection

    from outlier_detection import IsolationForestOutlierDetector
    
    detector = IsolationForestOutlierDetector()
    outliers = detector.detect_outliers([10, 12, 12, 13, 12, 11, 40])
    
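
The README does not show what `IsolationForestOutlierDetector` does internally. Because the toolkit depends on scikit-learn, the sketch below calls scikit-learn's `IsolationForest` directly on the same data; it is an assumption that the au wrapper works in a similar way. The `contamination=0.15` and `random_state=0` values are illustrative choices for this toy list, not documented defaults of the package.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

data = [10, 12, 12, 13, 12, 11, 40]

# scikit-learn expects a 2-D array of shape (n_samples, n_features),
# so a flat list of numbers becomes a single feature column.
X = np.asarray(data, dtype=float).reshape(-1, 1)

# contamination: assumed fraction of outliers (about 1 point in 7 here).
# random_state: fixes the randomly built trees so results are reproducible.
model = IsolationForest(n_estimators=100, contamination=0.15, random_state=0)
labels = model.fit_predict(X)  # +1 for inliers, -1 for outliers

# Flatten the flagged rows back to plain values.
outliers = X[labels == -1].ravel()
print(outliers)
```

`contamination` sets the score threshold so that roughly that fraction of the training points is labelled `-1`; with seven points and 0.15, only the single most isolated value, 40, is expected to be reported.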

Documentation

Each function and class in this toolkit comes with a detailed docstring, explaining its purpose, parameters, return values, and examples.

Contributing

Contributions to this project are welcome! Please fork the repository and submit a pull request with your changes.

Download files

Source Distribution

au-0.0.7.tar.gz (7.6 kB)

Uploaded Source

Built Distribution

au-0.0.7-py3-none-any.whl (8.0 kB)

Uploaded Python 3

File details

Details for the file au-0.0.7.tar.gz.

File metadata

  • Download URL: au-0.0.7.tar.gz
  • Size: 7.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.13

File hashes

Hashes for au-0.0.7.tar.gz
Algorithm     Hash digest
SHA256        bca38d5ca7bdb687fb5d97646bf7d3c5504ac1b3e960c2b73a84c6b3b960a3af
MD5           df94e476e6dfa91426276125a9fc8482
BLAKE2b-256   7738c518dbafb6f9736a1337cf030e142b0cc00af02f1f6939a6e1a6f77975bd

File details

Details for the file au-0.0.7-py3-none-any.whl.

File metadata

  • Download URL: au-0.0.7-py3-none-any.whl
  • Size: 8.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.13

File hashes

Hashes for au-0.0.7-py3-none-any.whl
Algorithm     Hash digest
SHA256        9c6a14e8702206c16add0ca72465c66922d2bed6cb041ef4c6640237b6cf8fb1
MD5           9c7da6231df76d6be8e530eb66e11a42
BLAKE2b-256   635b9669bf754d17e6545946c11eee12b9c6eb1d614d2c8eefa27f96749df161
