A package for data cleaning and preprocessing

cleanPyData

An awesome Python package for data cleaning and preprocessing.

cleanPyData is a robust and easy-to-use Python package designed to streamline the data cleaning and preprocessing phase of your data science projects. It provides functions for handling missing values, normalizing data, extracting important features, and detecting outliers, so that your data is ready for analysis or machine learning.

Features

  • Handle Missing Values: Clean your dataset by filling in missing values using various strategies (mean, median, mode) or by dropping them entirely.
  • Normalize Data: Apply normalization techniques such as Min-Max scaling and Z-score normalization to standardize your data.
  • Feature Extraction: Select the most important features from your dataset to improve model performance and reduce overfitting.
  • Outlier Detection: Detect and remove outliers using methods like Z-score and Interquartile Range (IQR).
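The missing-value and normalization strategies listed above can be illustrated without the package itself. The following is a minimal pure-Python sketch, not cleanPyData's implementation. It assumes each column is a plain list in which None marks a missing value, and the helper names (fill_missing, drop_missing_rows, min_max_scale, z_score_normalize) are hypothetical. The z-score version divides by the population standard deviation; the package may use the sample standard deviation instead.

```python
from statistics import mean, median, mode, pstdev
from typing import Dict, List, Optional

# Hypothetical representation: each column is a plain list and None marks a missing value.
Table = Dict[str, List[Optional[float]]]


def fill_missing(column: List[Optional[float]], strategy: str = "mean") -> List[float]:
    """Replace None with the mean, median, or mode of the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values to fill from")
    fillers = {"mean": mean, "median": median, "mode": mode}
    if strategy not in fillers:
        raise ValueError(f"unknown strategy: {strategy!r}")
    fill = fillers[strategy](observed)
    return [fill if v is None else v for v in column]


def drop_missing_rows(table: Table) -> Table:
    """Drop every row in which any column holds None."""
    names = list(table)
    rows = zip(*(table[name] for name in names))
    kept = [row for row in rows if all(v is not None for v in row)]
    return {name: [row[i] for row in kept] for i, name in enumerate(names)}


def min_max_scale(column: List[float]) -> List[float]:
    """Map values linearly onto [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)  # constant column: nothing to rescale
    return [(x - lo) / (hi - lo) for x in column]


def z_score_normalize(column: List[float]) -> List[float]:
    """Subtract the mean and divide by the population standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return [0.0] * len(column)  # constant column: every z-score is 0
    return [(x - mu) / sigma for x in column]


if __name__ == "__main__":
    # Same feature columns as the Usage example (without 'target').
    data = {"A": [1, 2, None, 4], "B": [4, None, 6, 8], "C": [5, 6, 7, 8]}
    filled = {name: fill_missing(col, "mean") for name, col in data.items()}
    print(filled["B"])                 # [4, 6, 6, 8]
    print(min_max_scale(filled["B"]))  # [0.0, 0.5, 0.5, 1.0]
    print(drop_missing_rows(data))     # {'A': [1, 4], 'B': [4, 8], 'C': [5, 8]}
```

Min-Max scaling keeps the shape of the distribution but pins it to [0, 1], while z-score normalization centers each column on 0 with unit spread; which one suits a dataset depends on the downstream model.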

Built With

  • Python
  • Pandas
  • NumPy
  • scikit-learn

Getting Started

To get a local copy up and running, follow these simple steps.

Prerequisites

  • Python 3.x

Installation

pip install cleanPyData

Usage

import pandas as pd
from cleanPyData import (
    handle_missing_values,
    normalize_data,
    extract_features,
    detect_outliers
)

# Example DataFrame
data = {
    'A': [1, 2, None, 4],
    'B': [4, None, 6, 8],
    'C': [5, 6, 7, 8],
    'target': [1, 0, 1, 0]
}
df = pd.DataFrame(data)

# Handle missing values
df = handle_missing_values(df, strategy='mean')

# Normalize data
df = normalize_data(df, method='minmax')

# Extract features
df = extract_features(df, target='target', k=2)

# Detect outliers
df = detect_outliers(df, method='zscore', threshold=3)

print(df)
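For the outlier step, the sketch below contrasts the two methods from the Features list. It is hypothetical, pure Python, and not the package's code; zscore_outliers, iqr_outliers and max_possible_zscore are illustrative names. Quartiles come from statistics.quantiles with its default "exclusive" method, which may not match the package's quartile definition. The sketch also shows why the final call in the example above is unlikely to remove any rows: by Samuelson's inequality, no value in a column of n numbers lies more than √(n − 1) population standard deviations from the mean, so with four rows |z| can never exceed √3 ≈ 1.73, which is below threshold=3. This assumes detect_outliers compares ordinary per-column z-scores with the threshold; with the sample standard deviation the bound, (n − 1)/√n, is even smaller.

```python
import math
from statistics import mean, pstdev, quantiles
from typing import List


def zscore_outliers(column: List[float], threshold: float = 3.0) -> List[bool]:
    """Flag values whose |x - mean| / population std exceeds the threshold."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return [False] * len(column)  # constant column has no outliers
    return [abs(x - mu) / sigma > threshold for x in column]


def iqr_outliers(column: List[float], k: float = 1.5) -> List[bool]:
    """Flag values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(column, n=4)  # default 'exclusive' quartile method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x < low or x > high for x in column]


def max_possible_zscore(n: int) -> float:
    """Samuelson's inequality: with population std, no |z| among n values exceeds sqrt(n - 1)."""
    return math.sqrt(n - 1)


if __name__ == "__main__":
    sample = [10, 12, 11, 13, 12, 11, 100]
    print(iqr_outliers(sample))                  # only the last value (100) is flagged
    print(zscore_outliers(sample, threshold=3))  # nothing flagged: |z| of 100 is about 2.45
    print(max_possible_zscore(4))                # about 1.73 for a four-row column
```

On this seven-value sample the IQR fences run from 8 to 16 and flag 100, while the largest attainable |z| is √6 ≈ 2.45, so a z-score threshold of 3 cannot catch it. On small datasets, a lower threshold or the IQR method may be the more practical choice.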

Roadmap

  • Add more cleaning and preprocessing functionalities
  • Improve performance and efficiency
  • Add more examples and documentation

Contributing

Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

Distributed under the MIT License. See MIT License for more information.

Contact

Kaddu Livingstone - kaddulivingston@gmail.com

Project Link: https://github.com/Livingston-k/cleanPyData

Download files

Source Distribution

cleanpydata-0.1.8.tar.gz (5.0 kB)

Uploaded Source

Built Distribution

cleanPyData-0.1.8-py3-none-any.whl (7.0 kB)

Uploaded Python 3

File details

Details for the file cleanpydata-0.1.8.tar.gz.

File metadata

  • Download URL: cleanpydata-0.1.8.tar.gz
  • Upload date:
  • Size: 5.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.11.0

File hashes

Hashes for cleanpydata-0.1.8.tar.gz
  • SHA256: cb776662682c463f21267c841c06e27c35fff7adbaa005259312868864068854
  • MD5: 074cb4c29d8c154c59ca326422448cf0
  • BLAKE2b-256: 636fdfda59d6247b4c924e765dfd8950a15bed38da14ccfe033a42eaa2369537

File details

Details for the file cleanPyData-0.1.8-py3-none-any.whl.

File metadata

  • Download URL: cleanPyData-0.1.8-py3-none-any.whl
  • Upload date:
  • Size: 7.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.11.0

File hashes

Hashes for cleanPyData-0.1.8-py3-none-any.whl
  • SHA256: 390cc40293dcff6298c5fbd3c5a50226fa064518a19d0f58de77fae96a24edd0
  • MD5: caadac5e04a339814bbf7f068b015f3b
  • BLAKE2b-256: 7f0f3ba7b707c1591ab21bfa00fc5a6c47fc954481ed42ad9e317f3501b1d636
