RDATAPP: Recoded Data Preprocessing Library
Overview
RDATAPP is a comprehensive data preprocessing library designed to handle various data cleaning and transformation tasks. This library includes classes and methods for text cleaning, missing value imputation, one-hot encoding, outlier detection, feature engineering, and more.
Features
- Text Cleaning: Convert text to lowercase, remove punctuation and stopwords, and lemmatize words.
- Missing Value Handling: Impute missing values using mean, median, or a constant value. Alternatively, delete rows with missing values.
- Encoding: One-hot encode and label encode categorical columns.
- Outlier Detection: Detect and remove outliers using the Interquartile Range (IQR) method.
- Scaling: Apply Min-Max scaling and standard scaling to numerical columns.
- Feature Engineering: Create new features by applying functions to existing columns.
- Date-Time Handling: Convert columns to datetime format and extract date parts like year, month, and day.
Installation
You can install RDATAPP from PyPI using pip:
pip install rdatapp
Usage
Below are examples of how to use the different classes and methods provided by RDATAPP.
Text Cleaning
from rdatapp.text_cleaning import TextCleaner
# Create a cleaner, then clean a sample sentence.
text_cleaner = TextCleaner()
cleaned_text = text_cleaner.clean_text("This is a Sample TEXT, with Punctuation!")
print(cleaned_text)
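To make the cleaning steps listed under Features more concrete, here is a minimal plain-Python sketch of lowercasing, punctuation removal, and stopword filtering. It is not RDATAPP's implementation: the function name `basic_clean` and the tiny `STOPWORDS` set are assumptions made for illustration, and lemmatization is left out because it needs an external language resource.

```python
import string

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"a", "an", "the", "is", "this", "with"}

def basic_clean(text: str) -> str:
    """Lowercase, strip ASCII punctuation, and drop stopwords."""
    lowered = text.lower()
    # Delete every character listed in string.punctuation.
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    # Split on whitespace and keep only non-stopword tokens.
    tokens = [tok for tok in no_punct.split() if tok not in STOPWORDS]
    return " ".join(tokens)

print(basic_clean("This is a Sample TEXT, with Punctuation!"))
# prints: sample text punctuation
```

The output of `TextCleaner.clean_text` may differ from this sketch, since the library's stopword list and lemmatizer are not reproduced here.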
Missing Value Handling
import pandas as pd
from rdatapp.missing_value_handler import MissingValueHandler
# Column 'A' has one missing value.
df = pd.DataFrame({'A': [1, 2, None, 4]})
# Fill the missing value in 'A' with the column mean.
df = MissingValueHandler.impute_mean(df, 'A')
print(df)
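For readers who want to see what mean imputation computes, the following is a small sketch on a plain list, using only the standard library. The helper `impute_mean_list` is a hypothetical name and is not part of RDATAPP; it only illustrates the idea of replacing each gap with the mean of the observed values.

```python
from statistics import fmean

def impute_mean_list(values):
    """Replace None entries with the mean of the non-missing entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: every value is missing")
    fill = fmean(observed)  # mean of the observed values only
    return [fill if v is None else v for v in values]

# For [1, 2, None, 4] the fill value is (1 + 2 + 4) / 3, roughly 2.33.
print(impute_mean_list([1, 2, None, 4]))
```

Median or constant imputation follows the same pattern with a different fill value.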
Encoding
import pandas as pd
from rdatapp.categorical_encoder import CategoricalEncoder
df = pd.DataFrame({'Category': ['A', 'B', 'A', 'C']})
# One-hot encode the 'Category' column.
df = CategoricalEncoder.one_hot_encode(df, 'Category')
print(df)
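As a rough illustration of what one-hot encoding produces, the sketch below builds one indicator per distinct category in plain Python. The function `one_hot` and the `prefix_value` column naming are assumptions for this example; the column names that RDATAPP actually emits are not documented here and may differ.

```python
def one_hot(values, prefix):
    """Return one dict per row with a 0/1 indicator for each category."""
    categories = sorted(set(values))  # stable, alphabetical column order
    return [
        {f"{prefix}_{cat}": int(value == cat) for cat in categories}
        for value in values
    ]

for row in one_hot(['A', 'B', 'A', 'C'], prefix='Category'):
    print(row)
# First row: {'Category_A': 1, 'Category_B': 0, 'Category_C': 0}
```

Each row has exactly one indicator set to 1, which is the defining property of the encoding.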
Outlier Detection
import pandas as pd
from rdatapp.outlier_handler import OutlierHandler
# 100 is far from the other values.
df = pd.DataFrame({'Values': [1, 2, 3, 4, 100]})
# Detect and remove outliers in 'Values' using the IQR method.
df = OutlierHandler.iqr_outlier_detection(df, 'Values')
print(df)
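The IQR rule itself can be sketched with the standard library. This is an illustration under stated assumptions, not the library's code: the function name `remove_iqr_outliers`, the conventional multiplier `k = 1.5`, and the inclusive quartile method are choices made for this example, and RDATAPP may use different ones.

```python
from statistics import quantiles

def remove_iqr_outliers(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    # Inclusive quartiles use linear interpolation between data points.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lower <= v <= upper]

# Q1 = 2, Q3 = 4, IQR = 2, so the fences are -1 and 7 and 100 is dropped.
print(remove_iqr_outliers([1, 2, 3, 4, 100]))
# prints: [1, 2, 3, 4]
```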
Scaling
import pandas as pd
from rdatapp.scaler import Scaler
df = pd.DataFrame({'Values': [1, 2, 3, 4, 5]})
# Apply Min-Max scaling to the 'Values' column.
df = Scaler.min_max_scale(df, 'Values')
print(df)
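The two scaling methods named under Features correspond to simple formulas: Min-Max scaling maps each value to (x - min) / (max - min), and standard scaling maps it to (x - mean) / std. The sketch below shows both on a plain list. The function names are hypothetical, and using the population standard deviation is an assumption; the library may use the sample standard deviation instead.

```python
from statistics import fmean, pstdev

def min_max_scale_list(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: avoid dividing by zero
    return [(v - lo) / (hi - lo) for v in values]

def standard_scale_list(values):
    """Center on the mean and divide by the population standard deviation."""
    mean, std = fmean(values), pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

print(min_max_scale_list([1, 2, 3, 4, 5]))
# prints: [0.0, 0.25, 0.5, 0.75, 1.0]
print(standard_scale_list([1, 2, 3, 4, 5]))
```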
Feature Engineering
import pandas as pd
from rdatapp.feature_engineer import FeatureEngineer
df = pd.DataFrame({'Values': [1, 2, 3, 4, 5]})
# Create a new feature by applying a squaring function to 'Values'.
df = FeatureEngineer.create_new_feature(df, 'Values', lambda x: x**2)
print(df)
Date-Time Handling
import pandas as pd
from rdatapp.date_time_handler import DateTimeHandler
df = pd.DataFrame({'Date': ['2021-01-01', '2021-02-01', '2021-03-01']})
# Convert the string column to datetime format.
df = DateTimeHandler.to_datetime(df, 'Date')
# Extract date parts such as year, month, and day.
df = DateTimeHandler.extract_date_parts(df, 'Date')
print(df)
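The same two steps, parsing and then extracting parts, can be sketched with the standard `datetime` module. The helper `extract_parts` and the keys `year`, `month`, and `day` are assumptions for this example; the column names RDATAPP adds to the DataFrame are not shown in this README.

```python
from datetime import date

def extract_parts(iso_dates):
    """Parse ISO 8601 date strings and split each into year, month, day."""
    parsed = [date.fromisoformat(text) for text in iso_dates]
    return [{"year": d.year, "month": d.month, "day": d.day} for d in parsed]

for parts in extract_parts(['2021-01-01', '2021-02-01', '2021-03-01']):
    print(parts)
# First line: {'year': 2021, 'month': 1, 'day': 1}
```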
Authors
- Izzettin Furkan Özmen - izzettinfurkan.ozmen@stu.fsm.edu.tr
- Ismail Cifci - ismail.cifci@stu.fsm.edu.tr
License
This project is not licensed. Feel free to use.
Contributing
We welcome contributions! Please contact us at the email addresses listed under Authors.
Acknowledgments
Special thanks to the instructors who provided guidance and support throughout the development of this project.
Project Links
For any issues, please contact the authors or open an issue on GitHub.
File details
Details for the file rdatapp-1.0.tar.gz.
File metadata
- Download URL: rdatapp-1.0.tar.gz
- Upload date:
- Size: 7.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.11.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 34f86ea0ca0ce8330160afbf44890a6b6569cb0efe90b87f09f6449f3f12b850
MD5 | eb748214b88aba27b6c7a5b38ad0b74c
BLAKE2b-256 | 31c5a463609a00fc960422e147d5d1b48b1bf6eb8d9c0c01969e5376136402ce
File details
Details for the file rdatapp-1.0-py3-none-any.whl.
File metadata
- Download URL: rdatapp-1.0-py3-none-any.whl
- Upload date:
- Size: 8.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.11.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 0f183b8ac9d842e7342ccad9d71e65a48d17ded981400198a5c0c20cb92784bd
MD5 | d17c77f4b9633cdc45ed2df8572960b8
BLAKE2b-256 | 6695f7ab2b243e69dedf41fe1e085e0757b4ba6f81c3afd45cba561a0f6b32da