IdentifyOutliers combines standard-scaler normalization with z-score-based outlier removal for pandas projects.
IdentifyOutliers
A Python package for efficient scaling and outlier handling of pandas DataFrames using the standard scaler approach.
IdentifyOutliers is designed to provide a seamless experience in preprocessing pandas DataFrames by handling data normalization and outlier removal in one step.
Features
- Data Scaling: Utilizes the standard scaler method for data normalization.
- Outlier Detection: Provides an option to set a z-score threshold for outlier detection.
- Multiple Outputs: Returns four DataFrames: the original data without outliers, the scaled data without outliers, the detected outliers, and the scaled outliers.
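The features above can be sketched in plain pandas. This is a minimal illustration, not the package's implementation: the helper name `split_by_zscore` is hypothetical, and it assumes population standard deviation (`ddof=0`) and that a row is an outlier when any column's absolute z-score exceeds the threshold.

```python
import pandas as pd


def split_by_zscore(df: pd.DataFrame, threshold: float = 3.0):
    """Hypothetical helper, not part of IdentifyOutliers."""
    # Standard scaling: subtract the column mean, divide by the
    # population standard deviation (ddof=0 is an assumption).
    scaled = (df - df.mean()) / df.std(ddof=0)
    # Assumed rule: a row is an outlier if any column's |z| exceeds the threshold.
    is_outlier = (scaled.abs() > threshold).any(axis=1)
    return df[~is_outlier], scaled[~is_outlier], df[is_outlier], scaled[is_outlier]


df = pd.DataFrame({"A": [1, 2, 3, 100, 5], "B": [5, 6, 7, 8, 500]})
# A threshold of 1.5 is chosen here only so that this small sample flags rows.
kept, kept_scaled, outliers, outliers_scaled = split_by_zscore(df, threshold=1.5)
print(outliers.index.tolist())
```

Note that a constant column has zero standard deviation, so its scaled values become NaN in this sketch and never exceed the threshold.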
Installation
Install the package using pip:
pip install IdentifyOutliers
Usage
import pandas as pd
from IdentifyOutliers.CustomZscoreScaler import CustomZscoreScaler
# Sample DataFrame
data = {
'A': [1, 2, 3, 100, 5],
'B': [5, 6, 7, 8, 500]
}
df = pd.DataFrame(data)
# Initialize the scaler with a z-score threshold (default is 3.0)
scaler = CustomZscoreScaler(threshold=3.0)
# Transform the data
df_no_outliers, df_scaled_no_outliers, df_outliers, df_scaled_outliers = scaler.transform(df)
# Print the results
print(df_no_outliers)
#    A  B
# 0  1  5
# 1  2  6
# 2  3  7
print(df_scaled_no_outliers)
#           A         B
# 0 -0.544672 -0.507592
# 1 -0.518980 -0.502526
# 2 -0.493288 -0.497461
print(df_outliers)
#      A    B
# 3  100    8
# 4    5  500
print(df_scaled_outliers)
#           A         B
# 3  1.998845 -0.492395
# 4 -0.441904  1.999974
Parameters
threshold: The z-score threshold for outlier detection. Data points whose absolute z-score exceeds this value, that is, points lying more than threshold standard deviations from the mean, are treated as outliers. The default value is 3.0.
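When choosing a threshold for small samples, it may help to know that a population z-score (`ddof=0`) in a column of n values can never exceed sqrt(n - 1) in absolute value. The sketch below checks this on a five-row frame using plain pandas. It assumes the plain z-score definition, which may differ from what the package does internally.

```python
import math

import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 100, 5], "B": [5, 6, 7, 8, 500]})

# Population z-scores per column (ddof=0 is an assumption about the scaling).
z = (df - df.mean()) / df.std(ddof=0)

# Largest absolute z-score in the frame, and the theoretical bound sqrt(n - 1).
max_abs_z = z.abs().max().max()
bound = math.sqrt(len(df) - 1)
print(max_abs_z <= bound)

# Count the rows flagged under the assumed "any column exceeds" rule.
flagged_counts = {}
for threshold in (1.5, 3.0):
    flagged_counts[threshold] = int((z.abs() > threshold).any(axis=1).sum())
print(flagged_counts)
```

Under this definition a five-row frame cannot produce a z-score above 2.0, so a threshold of 3.0 only starts to flag rows once the sample has more than ten rows.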
Contributions
Contributions are welcome! Please create an issue or submit a pull request.
License
This project is licensed under the [MIT License](https://github.com/amithpdn/IdentifyOutliers/blob/master/LICENSE.TXT).