A comprehensive data preprocessing library for Python
Project description
PreprocessingLib
PreprocessingLib is a Python library designed to facilitate data preprocessing steps. It provides various classes and functions to automate the process of cleaning, transforming, and engineering features in datasets.
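To make the cleaning, transforming, and feature-engineering steps concrete, here is a minimal sketch in plain Python using only the standard library. It does not use PreprocessingLib's own API; the function names (`fill_missing_with_mean`, `extract_date_features`, `one_hot_encode`, `standardize`) are hypothetical and only illustrate the kind of operation involved.

```python
from datetime import date
from statistics import mean, pstdev


def fill_missing_with_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def extract_date_features(d):
    """Split a date into year, month, day, and day of the week (Monday = 0)."""
    return {
        "year": d.year,
        "month": d.month,
        "day": d.day,
        "day_of_week": d.weekday(),
    }


def one_hot_encode(labels):
    """Map each label to a dict of 0/1 indicators, one per distinct category."""
    categories = sorted(set(labels))
    return [{c: int(label == c) for c in categories} for label in labels]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column has no spread; return zeros instead of dividing by 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical sample values, chosen only for illustration.
ages = fill_missing_with_mean([20.0, None, 40.0])
signup = extract_date_features(date(2024, 3, 15))
colors = one_hot_encode(["red", "blue", "red"])
scaled = standardize(ages)

print(ages)    # [20.0, 30.0, 40.0]
print(signup)
print(colors)
print(scaled)
```

On real tabular data these steps are usually applied column by column to a pandas DataFrame; the sketch works on plain lists so that the logic of each step stays visible.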
Features
- Missing Value Handling: Detect missing values in a dataset. Fill missing values using mean, median, or a constant value. Remove rows or columns with missing values.
- Feature Engineering: Create new features based on existing ones.
- Date and Time Handling: Extract features like year, month, day, and day of the week from datetime columns.
- Data Type Conversion: Convert columns to numeric or categorical data types.
- Categorical Encoding: Perform one-hot encoding or label encoding on categorical variables.
- Outlier Handling: Detect outliers in numerical data. Handle outliers by removing or replacing them.
- Data Scaling: Standardize or normalize numerical data.
- Text Cleaning: Clean text data by removing punctuation, stop words, and lemmatizing words.
Installation
You can install PreprocessingLib using pip:
pip install preprocessinglib
Usage
Here's how you can use PreprocessingLib in your Python projects:
import pandas as pd
from mypreprocessinglib import (
    FeatureEngineer,
    MissingValueHandler,
    DateTimeHandler,
    DataTypeConverter,
    CategoricalEncoder,
    OutlierHandler,
    Scaler,
    TextCleaner,
)

# Load sample dataset
data = pd.read_csv("sample_dataset.csv")

# Example usage of preprocessing functions
missing_handler = MissingValueHandler()
filled_data = missing_handler.fill_missing_values(data)
data_with_new_features = FeatureEngineer.create_new_features(data, column1='Column1', column2='Column2')
date_with_features = DateTimeHandler.extract_date_features(data, column='DateColumn')
numeric_data = DataTypeConverter.convert_to_numeric(data, columns=['Column1', 'Column2'])
encoded_data = CategoricalEncoder.one_hot_encode(data, columns=['CategoricalColumn'])
outliers_removed_data = OutlierHandler.handle_outliers(data, method='drop')
scaled_data = Scaler.standardize_data(data)
cleaned_text = TextCleaner.clean_text("example text")
Testing
You can run the unit tests to ensure the proper functioning of the library:
python -m unittest test_data_preprocessing.py
Contributing
Contributions are welcome! If you find any issues or have suggestions for improvements, please feel free to open an issue or submit a pull request.
License
This project is licensed under the MIT License. See the LICENSE file for details.
Hashes for preprocessinglib_tonga_gumustakim-0.4.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 2a7be72503f079f3d32644e199dc85dcbde6a51aa19ac3bb4a3cbd8cdbecb561
MD5 | 514f589685a10ff0825ca4184140d706
BLAKE2b-256 | ee8820121eefb157fa69b219918b4c4337f5924e4dc50f8263277a7f86e947fe
Hashes for preprocessinglib_tonga_gumustakim-0.4-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 6a6d1259d6e378fe91633f88575da8f722c9ca9800fc033f0ca1887716127b51
MD5 | bd0d940e82adfa501983dfe7bd25a4b5
BLAKE2b-256 | 55ba0af853e0505c488dc448e0d7cb0884fd73d26b37874f7f7c8a61b9a917ae