Modular and extensible data preprocessing library
🪿🪿 GeeseTools 🛠️🛠️
Modular and Extensible Data Preprocessing Library for Machine Learning
GeeseTools is a plug-and-play, mixin-based Python library that streamlines the preprocessing of tabular datasets for machine learning tasks. Whether you're cleaning messy data, encoding categories, transforming skewed distributions, or scaling features, this package has you covered.
Features
- Handle missing data
- Convert object columns to numeric
- Identify feature types (categorical, ordinal, nominal, etc.)
- Encode nominal and ordinal features
- Transform skewed and heavy-tailed features
- Scale features with standard or power transformations
- Train-test split with optional oversampling
- Transformation logs for transparency and reproducibility
- Built using Mixins for modular extension
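To make the last two points concrete, here is a minimal sketch of how small mixins can be composed into one processor that records each step in a log. The class and method names (`LogMixin`, `DropMissingMixin`, `UppercaseMixin`, `Processor`, `record`) are hypothetical and chosen for illustration; they are not GeeseTools' actual classes, and the real library works on pandas DataFrames rather than lists of dicts.

```python
# Illustrative sketch only: these class and method names are hypothetical,
# not GeeseTools' real API.


class LogMixin:
    """Gives any host class a shared transformation log."""

    def record(self, step, detail):
        # Create the log lazily so mixins need no __init__ coordination.
        if not hasattr(self, "transformation_log"):
            self.transformation_log = []
        self.transformation_log.append({"step": step, "detail": detail})


class DropMissingMixin:
    """Drops rows that contain a missing (None) value."""

    def drop_missing(self):
        before = len(self.rows)
        self.rows = [r for r in self.rows if None not in r.values()]
        self.record("drop_missing", f"removed {before - len(self.rows)} row(s)")
        return self


class UppercaseMixin:
    """Normalises the string values of one column to upper case."""

    def uppercase(self, column):
        for r in self.rows:
            r[column] = r[column].upper()
        self.record("uppercase", f"column={column}")
        return self


class Processor(LogMixin, DropMissingMixin, UppercaseMixin):
    """Composes behaviour by inheriting from small, single-purpose mixins."""

    def __init__(self, rows):
        self.rows = [dict(r) for r in rows]  # copy so the input is untouched


rows = [{"city": "pune", "age": 31}, {"city": "delhi", "age": None}]
proc = Processor(rows).drop_missing().uppercase("city")
for entry in proc.transformation_log:
    print(entry)
```

Adding a new capability in this style means writing one more mixin and listing it in the processor's base classes, which is the kind of extension the mixin design is meant to allow.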
Installation
You can install the package directly from PyPI:
pip install GeeseTools
Or, after building a wheel file (.whl) from the source (adjust the file name to the version you built):
pip install dist/GeeseTools-0.1.8-py3-none-any.whl
Or install directly in editable mode (for development):
pip install -e .
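Usage
import pandas as pd
# Assumes the class is exported at package level, as GeeseTools() further down suggests
from GeeseTools import GeeseTools

# Load your own dataset (the file name is a placeholder)
df = pd.read_csv('your_data.csv')

# Instantiate with a dataset
obj = GeeseTools(
    dataframe=df,
    target_variable='target',
    ordinal_features=['education_level'],
    ordinal_categories=[['Low', 'Medium', 'High']],
    use_one_hot_encoding=True
)

# Apply full preprocessing pipeline
X_train, X_test, y_train, y_test = obj.pre_process()

# Access logs
print(obj.transformation_log_df)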
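The `ordinal_categories` and `use_one_hot_encoding` arguments correspond to two different encoding ideas. The sketch below shows both in plain Python: an ordinal feature is mapped to its position in the ordered category list, while a nominal feature gets one 0/1 column per distinct value. The helper names `ordinal_encode` and `one_hot_encode` are made up for this example, and how GeeseTools implements encoding internally is not shown here. The nested-list shape of `ordinal_categories` resembles the `categories` argument of scikit-learn's `OrdinalEncoder`, but whether GeeseTools relies on it is an assumption.

```python
# Illustrative sketch only; GeeseTools' internals may differ.


def ordinal_encode(values, categories):
    """Map each value to its position in the ordered category list."""
    rank = {category: index for index, category in enumerate(categories)}
    return [rank[v] for v in values]


def one_hot_encode(values):
    """Return one 0/1 column per distinct value, keyed in sorted order."""
    columns = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values] for c in columns}


education = ["Medium", "Low", "High", "Low"]
print(ordinal_encode(education, ["Low", "Medium", "High"]))  # [1, 0, 2, 0]

colour = ["red", "blue", "red"]
print(one_hot_encode(colour))  # {'blue': [0, 1, 0], 'red': [1, 0, 1]}
```

The order of the category list matters: `['Low', 'Medium', 'High']` yields 0, 1, 2, so it should be written from the lowest level to the highest.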
Default Sample Dataset
If no DataFrame is provided, the processor loads a built-in heart.csv dataset:
obj = GeeseTools() # Uses sample heart dataset
# Apply full preprocessing pipeline
X_train, X_test, y_train, y_test = obj.pre_process()
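The four returned objects come from a train-test split with optional oversampling. The order in which GeeseTools applies these two steps is not documented here; a common practice is to split first and oversample only the training portion, so that no duplicated row leaks into the test set. The following sketch illustrates that idea with plain random oversampling (not SMOTE). The helpers `split_rows` and `oversample` are hypothetical names for this example only.

```python
import random

# Illustrative sketch only: helper names are hypothetical, and this is plain
# random oversampling, not SMOTE and not GeeseTools' own implementation.


def split_rows(rows, labels, test_fraction=0.25, seed=0):
    """Shuffle indices, then split rows and labels into train and test parts."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = int(len(rows) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return (
        [rows[i] for i in train_idx],
        [rows[i] for i in test_idx],
        [labels[i] for i in train_idx],
        [labels[i] for i in test_idx],
    )


def oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows until every class matches the largest."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_rows.extend(group + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


X = [[i] for i in range(20)]
y = [1 if i < 4 else 0 for i in range(20)]  # 4 positives, 16 negatives

X_train, X_test, y_train, y_test = split_rows(X, y)
X_train, y_train = oversample(X_train, y_train)  # training split only

print(len(X_train), len(X_test))
```

Because the test rows are set aside before any duplication happens, evaluation still reflects the original class balance.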
Project Structure
GeeseTools/
├── data/                            # Contains bundled datasets
│   ├── heart.csv                    # Sample dataset (CSV format)
│   └── __init__.py                  # Makes 'data' a subpackage
│
├── GeeseTools.py                    # Core toolkit initializer or controller
├── datasets.py                      # Dataset loading utilities
├── display_mixin.py                 # Display-related mixin
├── drop_features_mixin.py           # Drop unwanted features
├── drop_records_mixin.py            # Drop records based on rules
├── encode_mixin.py                  # Encoding (label, one-hot)
├── feature_target_split_mixin.py    # Split into features & target
├── feature_type_mixin.py            # Feature type detection
├── impute_features_mixin.py         # Fill missing values
├── missing_data_summary_mixin.py    # Summary of missing data
├── oversample_mixin.py              # Oversampling (e.g., SMOTE)
├── pre_process_mixin.py             # Complete preprocessing pipeline
├── sample_data_mixin.py             # Random sampling utilities
├── scale_mixin.py                   # Scaling methods
├── split_dataframe_mixin.py         # Split dataframe columns
├── to_numeric_mixin.py              # Convert to numeric
├── transform_mixin.py               # Feature transformations
├── unique_value_summary_mixin.py    # Unique value summary
└── __init__.py                      # Initializes GeeseTools package
Requirements
- Python 3.9–3.11
- pandas
- scikit-learn
- imbalanced-learn
- scipy
- ipython
- openpyxl
License
MIT © Abhijeet
You're free to use, modify, and distribute this project with proper attribution.
Contributions Welcome
Want to add new mixins or support more file types? Fork it, branch it, push it, and let's build together!
File details
Details for the file geesetools-0.1.11.tar.gz.
File metadata
- Download URL: geesetools-0.1.11.tar.gz
- Upload date:
- Size: 28.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | acd178cb4ad6b09d9a8f3ed6f95fcbcd215f4f5ca40aeecce89f8851c45ed07d |
| MD5 | 05b41219b129a9e3ccf61c9728e10daf |
| BLAKE2b-256 | a0227ad86904a8cab378a6131cef739fc4c3445dafbce3d5a72d2512ee4ebb27 |
File details
Details for the file geesetools-0.1.11-py3-none-any.whl.
File metadata
- Download URL: geesetools-0.1.11-py3-none-any.whl
- Upload date:
- Size: 33.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 05e486efe9ab3b83fe33cc13b9a58b143eda073f955a6790cd9a16f613c111f8 |
| MD5 | 6009854604648edc367305e895064a2a |
| BLAKE2b-256 | f7f21fafabb5cebd64c20df842841d2cfb3120ee782890f55a877d70e082f914 |