Project description

PreprocessingLib

PreprocessingLib is a Python library designed to facilitate data preprocessing steps. It provides various classes and functions to automate the process of cleaning, transforming, and engineering features in datasets.

Features

Missing Value Handling: Detect missing values in a dataset, fill them using the mean, median, or a constant value, or remove the rows or columns that contain them.
Feature Engineering: Create new features based on existing ones.
Date and Time Handling: Extract features such as year, month, day, and day of the week from datetime columns.
Data Type Conversion: Convert columns to numeric or categorical data types.
Categorical Encoding: Perform one-hot encoding or label encoding on categorical variables.
Outlier Handling: Detect outliers in numerical data and handle them by removing or replacing them.
Data Scaling: Standardize or normalize numerical data.
Text Cleaning: Clean text data by removing punctuation and stop words and by lemmatizing words.

Installation

You can install PreprocessingLib using pip:

pip install preprocessinglib

Usage

Here's how you can use PreprocessingLib in your Python projects:

import pandas as pd
from mypreprocessinglib import (
    FeatureEngineer,
    MissingValueHandler,
    DateTimeHandler,
    DataTypeConverter,
    CategoricalEncoder,
    OutlierHandler,
    Scaler,
    TextCleaner,
)

# Load sample dataset
data = pd.read_csv("sample_dataset.csv")

# Example usage of preprocessing functions
missing_handler = MissingValueHandler()
filled_data = missing_handler.fill_missing_values(data)

data_with_new_features = FeatureEngineer.create_new_features(data, column1='Column1', column2='Column2')

date_with_features = DateTimeHandler.extract_date_features(data, column='DateColumn')

numeric_data = DataTypeConverter.convert_to_numeric(data, columns=['Column1', 'Column2'])

encoded_data = CategoricalEncoder.one_hot_encode(data, columns=['CategoricalColumn'])

outliers_removed_data = OutlierHandler.handle_outliers(data, method='drop')

scaled_data = Scaler.standardize_data(data)

cleaned_text = TextCleaner.clean_text("example text")

Testing

You can run the unit tests to ensure the proper functioning of the library:

python -m unittest test_data_preprocessing.py

Contributing

Contributions are welcome! If you find any issues or have suggestions for improvements, please feel free to open an issue or submit a pull request.

License

This project is licensed under the MIT License - see the LICENSE file for details.
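
Illustrative sketch

Several of the steps listed under Features can be illustrated with a minimal sketch in plain pandas. This is not PreprocessingLib's implementation, and it does not call the library, since the signatures and defaults of its classes are not documented here. The helper names (fill_missing_with_median, drop_iqr_outliers, standardize, add_date_features), the 1.5 x IQR outlier rule, and the sample data are all assumptions made for illustration.

```python
import pandas as pd


def fill_missing_with_median(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Return a copy with NaNs in the given numeric columns replaced by the column median."""
    out = df.copy()
    for col in columns:
        out[col] = out[col].fillna(out[col].median())
    return out


def drop_iqr_outliers(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Keep only rows whose value lies within [Q1 - k*IQR, Q3 + k*IQR] (assumed rule)."""
    q1, q3 = df[column].quantile([0.25, 0.75])
    iqr = q3 - q1
    return df[df[column].between(q1 - k * iqr, q3 + k * iqr)]


def standardize(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Return a copy with the given columns rescaled to zero mean and unit population std."""
    out = df.copy()
    for col in columns:
        out[col] = (out[col] - out[col].mean()) / out[col].std(ddof=0)
    return out


def add_date_features(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a copy with year, month, day, and day-of-week columns derived from a date column."""
    out = df.copy()
    dates = pd.to_datetime(out[column])
    out["year"] = dates.dt.year
    out["month"] = dates.dt.month
    out["day"] = dates.dt.day
    out["day_of_week"] = dates.dt.dayofweek  # Monday is 0
    return out


# Hypothetical sample data: one missing value and one obvious outlier.
data = pd.DataFrame({
    "amount": [10.0, 12.0, None, 11.0, 500.0],
    "category": ["a", "b", "a", "c", "b"],
    "date": ["2024-05-20", "2024-05-21", "2024-05-22", "2024-05-23", "2024-05-24"],
})

filled = fill_missing_with_median(data, ["amount"])
trimmed = drop_iqr_outliers(filled, "amount")
scaled = standardize(trimmed, ["amount"])
dated = add_date_features(scaled, "date")
encoded = pd.get_dummies(dated, columns=["category"])  # one-hot encoding
print(encoded)
```

Each helper returns a new DataFrame rather than modifying its input, so the intermediate results can be inspected step by step. The order matters in this sketch: missing values are filled before the quartiles are computed, and outliers are removed before scaling so that a single extreme value does not dominate the mean and standard deviation.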

Release history

0.5 (this version): May 24, 2024
0.4: May 23, 2024
0.3: May 23, 2024
0.2: May 23, 2024

Download files

Source Distribution

preprocessinglib_tonga_gumustakim-0.5.tar.gz (6.7 kB), uploaded May 24, 2024

Built Distribution

preprocessinglib_tonga_gumustakim-0.5-py3-none-any.whl (8.5 kB, Python 3), uploaded May 24, 2024

Hashes for preprocessinglib_tonga_gumustakim-0.5.tar.gz
Algorithm	Hash digest
SHA256	`0a5b4f808c273f45b5d86e7dbdc782ec972b27c11edad029ccb2870a758c209d`
MD5	`39694c7468a5e88a114517f286f7cd47`
BLAKE2b-256	`065efc99f92b1e4361edb8745285635395c35fd44a37d12317f474c06657f00a`

Hashes for preprocessinglib_tonga_gumustakim-0.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`febde82e84e6a2c4173120af7534782a67b9e70fcce827977c2e2d352b5c4933`
MD5	`88a3596d690ced9ba112b24cd75f649e`
BLAKE2b-256	`2bdb819cac21d9cb95752e6828a07a37f4c60428ca595737a8c6b6ac5d838457`