cleanPyData

A Python package for data cleaning and preprocessing.

cleanPyData is a Python package that streamlines the data cleaning and preprocessing phase of data science projects. It provides functions for handling missing values, normalizing data, selecting important features, and detecting outliers, so that your data is ready for analysis or machine learning.

Features

Handle Missing Values: Clean your dataset by filling in missing values using various strategies (mean, median, mode) or by dropping them entirely.
Normalize Data: Apply normalization techniques such as Min-Max scaling and Z-score normalization to standardize your data.
Feature Extraction: Select the most important features from your dataset to improve model performance and reduce overfitting.
Outlier Detection: Detect and remove outliers using methods like Z-score and Interquartile Range (IQR).
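
The missing-value strategies and normalization techniques listed above are standard operations, and a rough equivalent can be written in plain pandas. The sketch below is not cleanPyData's implementation; the function names fill_missing, min_max_scale, and z_score_scale are hypothetical and exist only for illustration.

```python
import pandas as pd


def fill_missing(df: pd.DataFrame, strategy: str = "mean") -> pd.DataFrame:
    """Fill NaNs with a per-column statistic, or drop incomplete rows."""
    if strategy == "drop":
        return df.dropna()
    if strategy == "mean":
        fill = df.mean(numeric_only=True)
    elif strategy == "median":
        fill = df.median(numeric_only=True)
    elif strategy == "mode":
        # mode() returns one row per tied value; take the first (smallest).
        fill = df.mode(numeric_only=True).iloc[0]
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    # fillna with a Series fills each column using the matching label.
    return df.fillna(fill)


def min_max_scale(df: pd.DataFrame) -> pd.DataFrame:
    """Rescale each column to [0, 1] via (x - min) / (max - min).

    A constant column has max == min and comes out as NaN.
    """
    return (df - df.min()) / (df.max() - df.min())


def z_score_scale(df: pd.DataFrame) -> pd.DataFrame:
    """Center each column on 0 and divide by its sample standard
    deviation (ddof=1, the pandas default)."""
    return (df - df.mean()) / df.std()


df = pd.DataFrame({"A": [1, 2, None, 4], "B": [4, None, 6, 8]})
filled = fill_missing(df, "mean")   # NaNs replaced by column means
scaled = min_max_scale(filled)      # every column now spans [0, 1]
print(scaled)
```

Min-Max scaling keeps the shape of the distribution but is sensitive to extreme values, while Z-score scaling is usually the better fit when columns have very different spreads.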

Built With

Python
Pandas
NumPy

Getting Started

To install the package and start using it locally, follow these steps.

Prerequisites

Python 3.x

Installation

pip install cleanPyData
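
To confirm that the installation succeeded, the installed version can be read with the standard library. This is a minimal sketch assuming Python 3.8 or later (where importlib.metadata is available); the helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version("cleanPyData"))
```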

Usage

import pandas as pd
from cleanPyData import (
    handle_missing_values,
    normalize_data,
    extract_features,
    detect_outliers
)

# Example DataFrame
data = {
    'A': [1, 2, None, 4],
    'B': [4, None, 6, 8],
    'C': [5, 6, 7, 8],
    'target': [1, 0, 1, 0]
}
df = pd.DataFrame(data)

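# Fill missing values with each column's mean
df = handle_missing_values(df, strategy='mean')

# Scale the data with Min-Max normalization
df = normalize_data(df, method='minmax')

# Select the most important features with respect to the 'target' column
df = extract_features(df, target='target', k=2)

# Detect and remove outliers using the Z-score method
df = detect_outliers(df, method='zscore', threshold=3)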

print(df)
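
For intuition about the two outlier methods named under Features, here is a sketch of Z-score and IQR detection in plain pandas. It is not the package's code; zscore_mask and iqr_mask are hypothetical names, and the multiplier k=1.5 is the conventional default for the IQR rule, assumed here rather than taken from cleanPyData.

```python
import pandas as pd


def zscore_mask(s: pd.Series, threshold: float = 3.0) -> pd.Series:
    """True where |x - mean| / std exceeds the threshold (sample std)."""
    z = (s - s.mean()) / s.std()
    return z.abs() > threshold


def iqr_mask(s: pd.Series, k: float = 1.5) -> pd.Series:
    """True where x lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)


values = pd.Series([1, 2, 3, 4, 100])

print(zscore_mask(values).tolist())  # [False, False, False, False, False]
print(iqr_mask(values).tolist())     # [False, False, False, False, True]

# Keep only the rows that the IQR rule does not flag.
cleaned = values[~iqr_mask(values)]
```

The Z-score result is worth noting: for n values and the sample standard deviation, |z| cannot exceed (n - 1) / sqrt(n), which is about 1.79 for n = 5, so a threshold of 3 flags nothing on very small samples. The IQR rule does not have this limitation.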

Roadmap

Add more cleaning and preprocessing functionalities
Improve performance and efficiency
Add more examples and documentation

Contributing

Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

Fork the Project
Create your Feature Branch (git checkout -b feature/AmazingFeature)
Commit your Changes (git commit -m 'Add some AmazingFeature')
Push to the Branch (git push origin feature/AmazingFeature)
Open a Pull Request

License

Distributed under the MIT License. See MIT License for more information.

Contact

Kaddu Livingstone - kaddulivingston@gmail.com

Project Link: https://github.com/Livingston-k/cleanPyData

Release history

0.1.8 (May 25, 2024)

0.1.7 (May 25, 2024, this version)

Download files

Source Distribution

cleanpydata-0.1.7.tar.gz (5.0 kB)

Uploaded May 25, 2024 Source

Built Distribution

cleanPyData-0.1.7-py3-none-any.whl (7.0 kB)

Uploaded May 25, 2024 Python 3

Hashes for cleanpydata-0.1.7.tar.gz

Algorithm	Hash digest
SHA256	`3c5ceb755911fa7bb488d2659c229c4f994b4f55b441522ec40d3eec4fd460a1`
MD5	`afb165c480942bb3cbb7e7b5c91c62e1`
BLAKE2b-256	`534b474df9456db29a039a7f8769482ef91e5aaffb3d06033a6ee1b217feb2d0`

Hashes for cleanPyData-0.1.7-py3-none-any.whl

Algorithm	Hash digest
SHA256	`ba687fc775d137c0722476d45c3b2c8586b216480c33056dcc638d67e3898af3`
MD5	`ef1f4f2be1465bbcc278309d5758fc00`
BLAKE2b-256	`cf6b18b0e4f2601f38cc757e5c4bbf774bfad17de815f3a0afd9992d59ef3910`