Modular and extensible data preprocessing library
🪿🪿 GeeseTools 🛠️🛠️
Modular and Extensible Data Preprocessing Library for Machine Learning
GeeseTools is a plug-and-play, mixin-based Python library that streamlines the preprocessing of tabular datasets for machine learning tasks. Its main class, Goose, covers cleaning messy data, encoding categories, transforming skewed distributions, and scaling features.
🚀 Features
- 🧼 Handle missing data
- 🔢 Convert object columns to numeric
- 🔍 Identify feature types (categorical, ordinal, nominal, etc.)
- ⚙️ Encode nominal and ordinal features
- 🔄 Transform skewed and heavy-tailed features
- 📏 Scale features with standard or power transformations
- 🧪 Train-test split with optional oversampling
- 📊 Transformation logs for transparency and reproducibility
- 🔌 Built using Mixins for modular extension
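To make the mixin idea concrete, here is a minimal, dependency-free sketch of how small mixin classes can be combined into one preprocessor that also records a transformation log. Every class and method name in it (`LoggingBase`, `MissingValueMixin`, `OrdinalEncodingMixin`, `Preprocessor`) is hypothetical and chosen for illustration; this is not the library's actual code, and the toy rows are made up.

```python
# Hypothetical names throughout: this is NOT GeeseTools' actual code, only an
# illustration of mixin-based composition with a transformation log.


class LoggingBase:
    """Holds the rows being processed and a log of the applied steps."""

    def __init__(self, rows):
        self.rows = [dict(r) for r in rows]  # copy so the caller's data is untouched
        self.transformation_log = []

    def _log(self, step, detail):
        self.transformation_log.append({"step": step, "detail": detail})


class MissingValueMixin:
    """Fills missing (None) values in a numeric column with the column mean."""

    def fill_missing(self, column):
        present = [r[column] for r in self.rows if r[column] is not None]
        mean = sum(present) / len(present)
        filled = 0
        for r in self.rows:
            if r[column] is None:
                r[column] = mean
                filled += 1
        self._log("fill_missing", f"{column}: filled {filled} value(s) with the mean")
        return self


class OrdinalEncodingMixin:
    """Maps ordered category labels to integer ranks (0, 1, 2, ...)."""

    def encode_ordinal(self, column, ordered_categories):
        rank = {label: i for i, label in enumerate(ordered_categories)}
        for r in self.rows:
            r[column] = rank[r[column]]
        self._log("encode_ordinal", f"{column}: {ordered_categories}")
        return self


class Preprocessor(MissingValueMixin, OrdinalEncodingMixin, LoggingBase):
    """Combines the mixins; a new capability is added by listing another mixin."""


# Toy data, invented for this example.
rows = [
    {"age": 40, "education_level": "Low"},
    {"age": None, "education_level": "High"},
    {"age": 60, "education_level": "Medium"},
]
pre = Preprocessor(rows).fill_missing("age").encode_ordinal(
    "education_level", ["Low", "Medium", "High"]
)
for entry in pre.transformation_log:
    print(entry)
```

Because each mixin only relies on `self.rows` and `self._log`, a new step can be added as a separate class without touching the existing ones, which is the kind of extension the feature list describes.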
⚙️ Installation
You can install the package directly from PyPI:
pip install GeeseTools
Or, after building your wheel file (.whl) from the source:
pip install dist/GeeseTools-0.1.8-py3-none-any.whl
Or install directly in editable mode (for development):
pip install -e .
🧪 Usage
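# Assumption: the main class is exported as Goose from the GeeseTools package
from GeeseTools import Goose
# Instantiate with a dataset (df is a pandas DataFrame that contains the 'target' column)
obj = Goose(
    dataframe=df,
    target_variable='target',
    ordinal_features=['education_level'],
    ordinal_categories=[['Low', 'Medium', 'High']],
    use_one_hot_encoding=True
)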
# Apply full preprocessing pipeline
X_train, X_test, y_train, y_test = obj.pre_process()
# Access logs
print(obj.transformation_log_df)
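The `ordinal_categories` and `use_one_hot_encoding` arguments above correspond to two standard encoding techniques. The sketch below shows them with plain pandas and scikit-learn on a made-up three-row DataFrame; it illustrates the techniques only, and GeeseTools may implement these steps differently. The `city` column and all values are invented for the example.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Toy data, invented for this example.
df = pd.DataFrame(
    {
        "education_level": ["Low", "High", "Medium"],
        "city": ["Pune", "Delhi", "Pune"],
        "target": [0, 1, 0],
    }
)

# Ordinal encoding: the explicit category list fixes the order Low < Medium < High,
# so each label is replaced by its position in that list.
encoder = OrdinalEncoder(categories=[["Low", "Medium", "High"]])
df["education_level"] = encoder.fit_transform(df[["education_level"]]).ravel()

# One-hot encoding: a nominal column has no order, so each category
# becomes its own indicator column.
df = pd.get_dummies(df, columns=["city"])

print(df)
```

Passing the categories explicitly matters because, without them, the encoder would sort the labels alphabetically and `High` would rank below `Low`.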
🗂 Default Sample Dataset
If no DataFrame is provided, the processor loads a built-in heart.csv dataset:
obj = Goose() # Uses sample heart dataset
# Apply full preprocessing pipeline
X_train, X_test, y_train, y_test = obj.pre_process()
📁 Project Structure
src/
│
├── Goose/
│   ├── Goose.py         # Main class
│   ├── mixins/          # Modular preprocessing logic
│   ├── data/heart.csv   # Default dataset
│   ├── datasets.py      # Heart dataset loader
│   └── __init__.py
⚙️ Requirements
- Python 3.9–3.11
- pandas
- scikit-learn
- imbalanced-learn
- scipy
- ipython
- openpyxl
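A quick way to check an environment against the list above is sketched below, using only the standard library. The helper names (`python_version_supported`, `missing_requirements`) are hypothetical and not part of GeeseTools; the mapping reflects the usual import names of the listed distributions (for example, scikit-learn is imported as `sklearn`).

```python
import importlib.util
import sys

# PyPI distribution name -> module name used in import statements.
REQUIRED_MODULES = {
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "imbalanced-learn": "imblearn",
    "scipy": "scipy",
    "ipython": "IPython",
    "openpyxl": "openpyxl",
}


def python_version_supported(version_info=sys.version_info):
    """Return True if the interpreter is Python 3.9, 3.10, or 3.11."""
    return (3, 9) <= tuple(version_info[:2]) <= (3, 11)


def missing_requirements(modules=REQUIRED_MODULES):
    """Return the distribution names whose modules cannot be found."""
    return [
        dist
        for dist, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]


print("Python version supported:", python_version_supported())
print("Missing packages:", missing_requirements())
```

`importlib.util.find_spec` only locates a top-level module without importing it, so the check stays fast even for heavy packages.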
📜 License
MIT © Abhijeet
You're free to use, modify, and distribute this project with proper attribution.
✨ Contributions Welcome
Want to add new mixins or support more file types? Fork it, branch it, push it, and let’s build together!
File details
Details for the file geesetools-0.1.8.tar.gz.
File metadata
- Download URL: geesetools-0.1.8.tar.gz
- Upload date:
- Size: 27.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5a8b07f67154f5845f90c4ddd28f610a6dd4cc757e3585820d4ed193b0dc0e73 |
| MD5 | 247136b4ed3144ec062da1b6a8407727 |
| BLAKE2b-256 | 79664998940e5c68b33ccc99a0a327ea6c0562927422c4c2570923764cf43b0a |
File details
Details for the file geesetools-0.1.8-py3-none-any.whl.
File metadata
- Download URL: geesetools-0.1.8-py3-none-any.whl
- Upload date:
- Size: 33.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 916ef81fcedac878b4b2820d1625536e782cff62f9deb245756f8307d2980af2 |
| MD5 | 4fcb18ca45a67dbfc1f847c8ff3cbdd5 |
| BLAKE2b-256 | 3d63b30066915a54b7cdede2fbd5563a43809d2e681049960b839722887ede99 |