Skip to main content

A comprehensive data preprocessing library for Python

Project description

PreprocessingLib PreprocessingLib is a Python library designed to facilitate data preprocessing steps. It provides various classes and functions to automate the process of cleaning, transforming, and engineering features in datasets.

Features

  1. Missing Value Handling Detect missing values in a dataset. Fill missing values using mean, median, or a constant value. Remove rows or columns with missing values.
  2. Feature Engineering Create new features based on existing ones.
  3. Date and Time Handling Extract features like year, month, day, and day of the week from datetime columns.
  4. Data Type Conversion Convert columns to numeric or categorical data types.
  5. Categorical Encoding Perform one-hot encoding or label encoding on categorical variables.
  6. Outlier Handling Detect outliers in numerical data. Handle outliers by removing or replacing them.
  7. Data Scaling Standardize or normalize numerical data.
  8. Text Cleaning Clean text data by removing punctuation, stop words, and lemmatizing words. Installation You can install PreprocessingLib using pip:

pip install preprocessinglib Usage Here's how you can use PreprocessingLib in your Python projects:

from mypreprocessinglib import FeatureEngineer, MissingValueHandler, DateTimeHandler, DataTypeConverter, CategoricalEncoder, OutlierHandler, Scaler, TextCleaner import pandas as pd

Load sample dataset

data = pd.read_csv("sample_dataset.csv")

Example usage of preprocessing functions

missing_handler = MissingValueHandler() filled_data = missing_handler.fill_missing_values(data)

data_with_new_features = FeatureEngineer.create_new_features(data, column1='Column1', column2='Column2')

date_with_features = DateTimeHandler.extract_date_features(data, column='DateColumn')

numeric_data = DataTypeConverter.convert_to_numeric(data, columns=['Column1', 'Column2'])

encoded_data = CategoricalEncoder.one_hot_encode(data, columns=['CategoricalColumn'])

outliers_removed_data = OutlierHandler.handle_outliers(data, method='drop')

scaled_data = Scaler.standardize_data(data)

cleaned_text = TextCleaner.clean_text("example text") Testing You can run the unit tests to ensure the proper functioning of the library:

python -m unittest test_data_preprocessing.py Contributing Contributions are welcome! If you find any issues or have suggestions for improvements, please feel free to open an issue or submit a pull request.

License This project is licensed under the MIT License - see the LICENSE file for details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

preprocessinglib_tonga_gumustakim-0.4.tar.gz (6.5 kB view hashes)

Uploaded Source

Built Distribution

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page