IdentifyOutliers combines data scaling using the standard scaler approach with outlier removal based on a z-score threshold, for pandas projects.

Project description

IdentifyOutliers

A Python package for efficient scaling and outlier handling of pandas DataFrames using the standard scaler approach.

IdentifyOutliers is designed to provide a seamless experience in preprocessing pandas DataFrames by ensuring data normalization and outlier handling in one step.

Features

  • Data Scaling: Utilizes the standard scaler method for data normalization.
  • Outlier Detection: Provides an option to set a z-score threshold for outlier detection.
  • Multiple Outputs: Returns four DataFrames: the original data without outliers, the scaled data without outliers, the detected outliers, and the scaled outliers.
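
The features above can be sketched in plain pandas. This is only an illustration under stated assumptions, not the package's actual implementation: the helper name split_by_zscore is hypothetical, z-scores are assumed to use the population standard deviation (the convention of a standard scaler), and a row is assumed to count as an outlier when any of its columns exceeds the threshold in absolute value.

```python
import pandas as pd


def split_by_zscore(df: pd.DataFrame, threshold: float = 3.0):
    """Hypothetical helper: scale columns to z-scores and split rows by threshold."""
    # Population standard deviation (ddof=0), the standard scaler convention.
    scaled = (df - df.mean()) / df.std(ddof=0)
    # Assumption: a row is an outlier if any column's |z| exceeds the threshold.
    is_outlier = (scaled.abs() > threshold).any(axis=1)
    return df[~is_outlier], scaled[~is_outlier], df[is_outlier], scaled[is_outlier]


# Made-up data: eleven typical values and one extreme value.
values = pd.DataFrame({"A": [10, 11, 9, 10, 12, 10, 11, 9, 10, 11, 10, 60]})
kept, kept_scaled, outliers, outliers_scaled = split_by_zscore(values, threshold=3.0)
print(outliers)
```

In this sketch the scaling statistics are computed on the full input, outliers included, so a single extreme value also shifts the scaled values of the remaining rows.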

Installation

Install the package using pip:

pip install IdentifyOutliers

Usage

import pandas as pd
from IdentifyOutliers.CustomZscoreScaler import CustomZscoreScaler

# Sample DataFrame
data = {
    'A': [1, 2, 3, 100, 5],
    'B': [5, 6, 7, 8, 500]
}
df = pd.DataFrame(data)

# Initialize the scaler with a z-score threshold (default is 3.0)
scaler = CustomZscoreScaler(threshold=3.0)

# Transform the data
df_no_outliers, df_scaled_no_outliers, df_outliers, df_scaled_outliers = scaler.transform(df)

# Print the results
print(df_no_outliers)
#    A  B
# 0  1  5
# 1  2  6
# 2  3  7

print(df_scaled_no_outliers)
#           A         B
# 0 -0.544672 -0.507592
# 1 -0.518980 -0.502526
# 2 -0.493288 -0.497461

print(df_outliers)
#      A    B
# 3  100    8
# 4    5  500

print(df_scaled_outliers)
#           A         B
# 3  1.998845 -0.492395
# 4 -0.441904  1.999974

Parameters

threshold: The z-score threshold for outlier detection. Data points that lie more than this many standard deviations away from the mean are considered outliers. The default value is 3.0.
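
To get a feel for how the threshold changes the result, the sketch below counts how many values different thresholds would flag. It uses plain pandas rather than the package itself; the data is made up, and the population standard deviation (ddof=0) is an assumption.

```python
import pandas as pd

# Made-up readings: sixteen typical values plus two unusually large ones.
readings = pd.Series([48, 52] * 8 + [100, 120])

# z-score with the population standard deviation (ddof=0); an assumption here.
z = (readings - readings.mean()) / readings.std(ddof=0)

# Count how many values each candidate threshold would flag.
flagged = {t: int((z.abs() > t).sum()) for t in (2.0, 3.0, 4.0)}
print(flagged)  # {2.0: 2, 3.0: 1, 4.0: 0}
```

A lower threshold flags more points. Sample size matters as well: with the population standard deviation, no value among n rows can have an absolute z-score above sqrt(n - 1), so a threshold of 3.0 can only be exceeded when there are more than ten rows.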

Contributions

Contributions are welcome! Please create an issue or submit a pull request.

License

This project is licensed under the MIT License (https://github.com/amithpdn/IdentifyOutliers/blob/master/LICENSE.TXT).

Download files

Source Distribution

IdentifyOutliers-0.0.2.tar.gz (3.6 kB)

Uploaded Source

Built Distribution

IdentifyOutliers-0.0.2-py3-none-any.whl (4.1 kB)

Uploaded Python 3

File details

Details for the file IdentifyOutliers-0.0.2.tar.gz.

File metadata

  • Download URL: IdentifyOutliers-0.0.2.tar.gz
  • Size: 3.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.0

File hashes

Hashes for IdentifyOutliers-0.0.2.tar.gz
Algorithm    Hash digest
SHA256       6c9cac8877ef94ac290d9626bb4900bfbf29554b70789a2546f2d8b157c387e4
MD5          f5144e3dffc7962fb4155004e6f4f663
BLAKE2b-256  67ae22657ffc0ad92b9a9215dda531c03132fae79b5890f91cf8bdc20d95bfe8

File details

Details for the file IdentifyOutliers-0.0.2-py3-none-any.whl.

File hashes

Hashes for IdentifyOutliers-0.0.2-py3-none-any.whl
Algorithm    Hash digest
SHA256       4a810ac1cb903a4ef01004ff7884bbf2c29af74093b5f4303936305d443dadab
MD5          44cc224c0df453112bc8a3b4356c1a43
BLAKE2b-256  548c0f53a07b906f58cf743699ecd0b41899d40ce410f421c756395c6c5dbdbf
