
A recoded data preprocessing library for handling various data cleaning and transformation tasks. The library includes classes for text cleaning, missing value imputation, one-hot encoding, and more.


RDATAPP: Recoded Data Preprocessing Library


Overview

RDATAPP is a data preprocessing library for common data cleaning and transformation tasks. It provides classes and methods for text cleaning, missing value imputation, one-hot and label encoding, outlier detection, scaling, feature engineering, and date-time handling.

Features

  • Text Cleaning: Convert text to lowercase, remove punctuation and stopwords, and lemmatize words.
  • Missing Value Handling: Impute missing values using mean, median, or a constant value. Alternatively, delete rows with missing values.
  • Encoding: One-hot encode and label encode categorical columns.
  • Outlier Detection: Detect and remove outliers using the Interquartile Range (IQR) method.
  • Scaling: Apply Min-Max scaling and standard scaling to numerical columns.
  • Feature Engineering: Create new features by applying functions to existing columns.
  • Date-Time Handling: Convert columns to datetime format and extract date parts like year, month, and day.

Installation

You can install RDATAPP from PyPI using pip:

pip install rdatapp

Usage

Below are examples of how to use the different classes and methods provided by RDATAPP.

Text Cleaning

from rdatapp.text_cleaning import TextCleaner

text_cleaner = TextCleaner()
cleaned_text = text_cleaner.clean_text("This is a Sample TEXT, with Punctuation!")
print(cleaned_text)
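
To make the cleaning steps listed under Features concrete, here is a minimal standard-library sketch of lowercasing, punctuation removal, and stopword removal. It is not RDATAPP's implementation: the function name `basic_clean` and the tiny `STOPWORDS` set are hypothetical, and lemmatization is left out because it needs an NLP library.

```python
import string

# Hypothetical, deliberately tiny stopword list, for illustration only.
STOPWORDS = {"a", "an", "the", "is", "this", "with", "and", "of"}


def basic_clean(text: str) -> str:
    """Lowercase the text, strip ASCII punctuation, and drop stopwords."""
    lowered = text.lower()
    # str.translate with a deletion table removes every punctuation character.
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    tokens = [tok for tok in no_punct.split() if tok not in STOPWORDS]
    return " ".join(tokens)


print(basic_clean("This is a Sample TEXT, with Punctuation!"))
# sample text punctuation
```

The output of `TextCleaner.clean_text` may differ, since its stopword list and lemmatizer are not described here.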

Missing Value Handling

import pandas as pd
from rdatapp.missing_value_handler import MissingValueHandler

df = pd.DataFrame({'A': [1, 2, None, 4]})
df = MissingValueHandler.impute_mean(df, 'A')
print(df)
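
The example above shows mean imputation only. As a rough guide to what the four strategies listed under Features do, here is the same idea in plain pandas. This is a sketch of the technique, not RDATAPP's code, and the variable names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, None, 4]})

# Mean imputation: the gap is filled with the mean of 1, 2 and 4.
mean_filled = df["A"].fillna(df["A"].mean())

# Median imputation: the gap is filled with the median of 1, 2 and 4.
median_filled = df["A"].fillna(df["A"].median())

# Constant imputation: the gap is filled with a value chosen by the caller.
constant_filled = df["A"].fillna(0)

# Deletion: rows where column A is missing are dropped.
rows_dropped = df.dropna(subset=["A"])

print(mean_filled.tolist())
print(median_filled.tolist())
print(constant_filled.tolist())
print(len(rows_dropped))
```

The median is less sensitive to extreme values than the mean, so it is often the safer default for skewed columns.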

Encoding

import pandas as pd
from rdatapp.categorical_encoder import CategoricalEncoder

df = pd.DataFrame({'Category': ['A', 'B', 'A', 'C']})
df = CategoricalEncoder.one_hot_encode(df, 'Category')
print(df)
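
Label encoding is listed under Features but not shown above. The sketch below contrasts the two encodings using plain pandas. It is an illustration of the techniques, not the `CategoricalEncoder` implementation, and the resulting column names and integer codes are pandas conventions that RDATAPP may not share.

```python
import pandas as pd

df = pd.DataFrame({"Category": ["A", "B", "A", "C"]})

# One-hot encoding: one indicator column per distinct category.
one_hot = pd.get_dummies(df, columns=["Category"])
print(list(one_hot.columns))

# Label encoding: each category is mapped to an integer code.
# pandas sorts the categories, so A -> 0, B -> 1, C -> 2.
labels = df["Category"].astype("category").cat.codes
print(labels.tolist())
```

One-hot encoding avoids implying an order between categories, at the cost of one column per category. Label encoding is compact but introduces an arbitrary ordering.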

Outlier Detection

import pandas as pd
from rdatapp.outlier_handler import OutlierHandler

df = pd.DataFrame({'Values': [1, 2, 3, 4, 100]})
df = OutlierHandler.iqr_outlier_detection(df, 'Values')
print(df)
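
For readers unfamiliar with the IQR method, the following plain-pandas sketch shows the usual rule: values outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR] are treated as outliers. The 1.5 multiplier is the conventional choice and an assumption here, since the multiplier and quantile method used by `OutlierHandler` are not stated.

```python
import pandas as pd

df = pd.DataFrame({"Values": [1, 2, 3, 4, 100]})

q1 = df["Values"].quantile(0.25)  # first quartile
q3 = df["Values"].quantile(0.75)  # third quartile
iqr = q3 - q1

# Conventional fences; the 1.5 factor is an assumption, not taken from RDATAPP.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Keep only the rows that fall inside the fences (bounds inclusive).
filtered = df[df["Values"].between(lower, upper)]
print(filtered["Values"].tolist())
```

With this data the quartiles are 2 and 4, so the fences are -1 and 7 and the value 100 is dropped.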

Scaling

import pandas as pd
from rdatapp.scaler import Scaler

df = pd.DataFrame({'Values': [1, 2, 3, 4, 5]})
df = Scaler.min_max_scale(df, 'Values')
print(df)
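
Standard scaling is listed under Features but not shown above. The sketch below spells out both formulas in plain pandas: min-max scaling maps a column to [0, 1], while standard scaling subtracts the mean and divides by the standard deviation. The new column names are hypothetical, and the choice of population standard deviation (`ddof=0`) is an assumption that `Scaler` may not share.

```python
import pandas as pd

df = pd.DataFrame({"Values": [1, 2, 3, 4, 5]})
col = df["Values"]

# Min-max scaling: (x - min) / (max - min), giving values in [0, 1].
df["min_max"] = (col - col.min()) / (col.max() - col.min())

# Standard scaling: (x - mean) / std, giving mean 0 and unit variance.
# ddof=0 uses the population standard deviation (an assumption here).
df["standard"] = (col - col.mean()) / col.std(ddof=0)

print(df)
```

Both formulas divide by zero when every value in the column is identical, so constant columns need separate handling.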

Feature Engineering

import pandas as pd
from rdatapp.feature_engineer import FeatureEngineer

df = pd.DataFrame({'Values': [1, 2, 3, 4, 5]})
df = FeatureEngineer.create_new_feature(df, 'Values', lambda x: x**2)
print(df)

Date-Time Handling

import pandas as pd
from rdatapp.date_time_handler import DateTimeHandler

df = pd.DataFrame({'Date': ['2021-01-01', '2021-02-01', '2021-03-01']})
df = DateTimeHandler.to_datetime(df, 'Date')
df = DateTimeHandler.extract_date_parts(df, 'Date')
print(df)
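
For comparison, the same two steps can be written in plain pandas as below. This is a sketch of the technique only. The output column names `year`, `month`, and `day` are hypothetical, and the names produced by `DateTimeHandler.extract_date_parts` may differ.

```python
import pandas as pd

df = pd.DataFrame({"Date": ["2021-01-01", "2021-02-01", "2021-03-01"]})

# Parse the strings into datetime64 values.
df["Date"] = pd.to_datetime(df["Date"])

# The .dt accessor exposes the individual date parts.
df["year"] = df["Date"].dt.year
df["month"] = df["Date"].dt.month
df["day"] = df["Date"].dt.day

print(df)
```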

Authors

License

This project is not licensed. Feel free to use it.

Contributing

We welcome contributions! Please contact the authors by e-mail.

Acknowledgments

Special thanks to the instructors who provided guidance and support throughout the development of this project.

Project Links


For any issues, please contact the authors or open an issue on GitHub.
