A recoded data preprocessing library for handling various data cleaning and transformation tasks. The library includes classes for text cleaning, missing value imputation, one-hot encoding, and more.

RDATAPP: Recoded Data Preprocessing Library

Overview

RDATAPP is a data preprocessing library for common data cleaning and transformation tasks. It provides classes and methods for text cleaning, missing value imputation, one-hot and label encoding, outlier detection, scaling, feature engineering, and date-time handling.

Features

  • Text Cleaning: Convert text to lowercase, remove punctuation and stopwords, and lemmatize words.
  • Missing Value Handling: Impute missing values using mean, median, or a constant value. Alternatively, delete rows with missing values.
  • Encoding: One-hot encode and label encode categorical columns.
  • Outlier Detection: Detect and remove outliers using the Interquartile Range (IQR) method.
  • Scaling: Apply Min-Max scaling and standard scaling to numerical columns.
  • Feature Engineering: Create new features by applying functions to existing columns.
  • Date-Time Handling: Convert columns to datetime format and extract date parts like year, month, and day.

Installation

You can install RDATAPP from PyPI using pip:

pip install rdatapp

Usage

Below are examples of how to use the different classes and methods provided by RDATAPP.

Text Cleaning

from rdatapp.text_cleaning import TextCleaner

text_cleaner = TextCleaner()
cleaned_text = text_cleaner.clean_text("This is a Sample TEXT, with Punctuation!")
print(cleaned_text)
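
The internals of TextCleaner are not shown in this README, so the following is only a standard-library sketch of the steps named under Features (lowercasing, punctuation removal, stopword removal), not the library's implementation. The function name basic_clean and the tiny STOPWORDS set are hypothetical, and lemmatization is left out because it needs an external NLP library.

```python
import string

# Hypothetical, deliberately tiny stopword list for illustration only;
# real pipelines usually take a fuller list from an NLP library.
STOPWORDS = {"this", "is", "a", "with"}

def basic_clean(text: str) -> str:
    """Lowercase, strip ASCII punctuation, and drop stopwords."""
    lowered = text.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    tokens = [tok for tok in no_punct.split() if tok not in STOPWORDS]
    return " ".join(tokens)

result = basic_clean("This is a Sample TEXT, with Punctuation!")
print(result)  # sample text punctuation
```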

Missing Value Handling

import pandas as pd
from rdatapp.missing_value_handler import MissingValueHandler

df = pd.DataFrame({'A': [1, 2, None, 4]})
df = MissingValueHandler.impute_mean(df, 'A')
print(df)
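
For comparison, mean imputation can be written directly in pandas. This is a sketch of the general technique, assuming the column is numeric; it does not claim to match how MissingValueHandler works internally.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, None, 4]})

# Series.mean() skips NaN by default, so this is (1 + 2 + 4) / 3.
mean_a = df["A"].mean()

# Replace the missing entry with the column mean.
df["A"] = df["A"].fillna(mean_a)
print(df)
```

Swapping mean() for median(), or passing a fixed number to fillna, gives the median and constant variants; df.dropna(subset=["A"]) drops the incomplete rows instead.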

Encoding

import pandas as pd
from rdatapp.categorical_encoder import CategoricalEncoder

df = pd.DataFrame({'Category': ['A', 'B', 'A', 'C']})
df = CategoricalEncoder.one_hot_encode(df, 'Category')
print(df)
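
The two encodings listed under Features can also be sketched in plain pandas. The output column names of CategoricalEncoder are not documented here, so the names below are simply what pandas produces, and Category_label is a hypothetical column name.

```python
import pandas as pd

df = pd.DataFrame({"Category": ["A", "B", "A", "C"]})

# One-hot encoding: one 0/1 indicator column per distinct category.
one_hot = pd.get_dummies(df, columns=["Category"], dtype=int)
print(one_hot)

# Label encoding: map each category to an integer code
# (categories are ordered alphabetically: A -> 0, B -> 1, C -> 2).
df["Category_label"] = df["Category"].astype("category").cat.codes
print(df)
```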

Outlier Detection

import pandas as pd
from rdatapp.outlier_handler import OutlierHandler

df = pd.DataFrame({'Values': [1, 2, 3, 4, 100]})
df = OutlierHandler.iqr_outlier_detection(df, 'Values')
print(df)
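
The IQR method itself is short enough to sketch in pandas. The 1.5 multiplier below is the conventional choice and an assumption here; this README does not state which threshold OutlierHandler uses.

```python
import pandas as pd

df = pd.DataFrame({"Values": [1, 2, 3, 4, 100]})

# Quartiles and interquartile range of the column.
q1 = df["Values"].quantile(0.25)
q3 = df["Values"].quantile(0.75)
iqr = q3 - q1

# Conventional fences: 1.5 * IQR beyond each quartile (assumed multiplier).
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Keep only the rows that fall inside the fences.
filtered = df[df["Values"].between(lower, upper)]
print(filtered)
```

With this data the quartiles are 2 and 4, so the fences are -1 and 7, and the value 100 is dropped.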

Scaling

import pandas as pd
from rdatapp.scaler import Scaler

df = pd.DataFrame({'Values': [1, 2, 3, 4, 5]})
df = Scaler.min_max_scale(df, 'Values')
print(df)
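
Both scaling methods named under Features reduce to one-line formulas. The sketch below uses plain pandas and the hypothetical column names min_max and standard; whether Scaler uses the population or sample standard deviation is not stated in this README, so ddof=0 is an assumption.

```python
import pandas as pd

df = pd.DataFrame({"Values": [1, 2, 3, 4, 5]})
col = df["Values"]

# Min-Max scaling: map the column linearly onto [0, 1].
# Note: a constant column would make the denominator zero.
df["min_max"] = (col - col.min()) / (col.max() - col.min())

# Standard scaling: zero mean and unit variance
# (population standard deviation, ddof=0, assumed here).
df["standard"] = (col - col.mean()) / col.std(ddof=0)
print(df)
```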

Feature Engineering

import pandas as pd
from rdatapp.feature_engineer import FeatureEngineer

df = pd.DataFrame({'Values': [1, 2, 3, 4, 5]})
df = FeatureEngineer.create_new_feature(df, 'Values', lambda x: x**2)
print(df)
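
In plain pandas, deriving a feature from an existing column looks like the sketch below. The name of the column that create_new_feature adds is not documented here, so Values_squared is a hypothetical name chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Values": [1, 2, 3, 4, 5]})

# Apply a function element-wise and store the result in a new column.
df["Values_squared"] = df["Values"].apply(lambda x: x ** 2)
print(df)
```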

Date-Time Handling

import pandas as pd
from rdatapp.date_time_handler import DateTimeHandler

df = pd.DataFrame({'Date': ['2021-01-01', '2021-02-01', '2021-03-01']})
df = DateTimeHandler.to_datetime(df, 'Date')
df = DateTimeHandler.extract_date_parts(df, 'Date')
print(df)
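
The same two steps can be sketched with pandas alone. The column names year, month, and day are hypothetical; the names that extract_date_parts produces are not shown in this README.

```python
import pandas as pd

df = pd.DataFrame({"Date": ["2021-01-01", "2021-02-01", "2021-03-01"]})

# Parse the ISO-formatted strings into datetime64 values.
df["Date"] = pd.to_datetime(df["Date"])

# Pull individual date parts out through the .dt accessor.
df["year"] = df["Date"].dt.year
df["month"] = df["Date"].dt.month
df["day"] = df["Date"].dt.day
print(df)
```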

Authors

License

This project is not licensed. Feel free to use it.

Contributing

We welcome contributions! Please contact the authors by email.

Acknowledgments

Special thanks to the instructors who provided guidance and support throughout the development of this project.

Project Links


For any issues, please contact the authors or open an issue on GitHub.

Download files

Source Distribution

rdatapp-0.7a0.tar.gz (7.7 kB, source)

Built Distribution

rdatapp-0.7a0-py3-none-any.whl (8.6 kB, Python 3)