Modular and extensible data preprocessing library

🪿🪿GeeseTools

  • Fast & Flexible Data Analysis Toolkit

Welcome to GeeseTools, a lightweight, modular toolkit for data preprocessing, model building, evaluation, and visualization. It is aimed at quick experiments and rapid prototyping in machine learning workflows.


Features

  • Clean and preprocess your datasets effortlessly with datapreprocessor
  • Quickly train and evaluate models with utils
  • Auto-generate plots for insights and performance metrics
  • Minimal setup, beginner-friendly, and fully extensible

Module Structure

📦 GeeseTools/
├── 📂 data/                            
│   └──📄 heart.csv             # Default Dataset            
│    
├── 📁 DataPreProcessor       
│   └──📜 DataPreProcessor.py   # Main Script
│
└── 📁 utils                    
    ├──📜 train_models.py       # Model training
    ├──📜 evaluation.py         # Model evaluation
    └──📜 plot.py               # Evaluation visualization


Installation

pip install GeeseTools
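
After installing, it can help to confirm that the package is visible to the interpreter you plan to use. The sketch below relies only on the standard library's importlib.metadata; the helper name installed_version is hypothetical and not part of GeeseTools.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("GeeseTools")
    if found:
        print(f"GeeseTools version: {found}")
    else:
        print("GeeseTools is not installed in this environment")
```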

📚 How to Use

1. Import the modules

from datapreprocessor import datapreprocessor as dpp
from utils import train_model as tm
from utils import evaluate_model as eval  # note: this alias shadows Python's built-in eval()
from utils import plot

2. Preprocess your data

import pandas as pd

# Create a DataPreProcessor object
obj = dpp(pd.read_csv("heart.csv"), target="diagnosis")

3. Train a model

model, task_type, history = tm.train_model(X_train, y_train)
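
The call above assumes that X_train and y_train already exist; this README does not show where they come from. As a minimal sketch of one way to produce them, here is a seeded holdout split written with the standard library only. The function holdout_split and its parameters are hypothetical and not part of GeeseTools. In practice, scikit-learn's train_test_split (already a dependency) does the same job and returns values in the same order.

```python
import random


def holdout_split(X, y, test_fraction=0.2, seed=42):
    """Shuffle row indices with a fixed seed and split rows into train and test parts."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    # Keep at least one row for testing
    n_test = max(1, int(round(len(X) * test_fraction)))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


if __name__ == "__main__":
    # Toy data: ten rows with one feature each
    X = [[i] for i in range(10)]
    y = list(range(10))
    X_train, X_test, y_train, y_test = holdout_split(X, y)
    print(len(X_train), len(X_test))
```

Fixing the seed keeps the split reproducible between runs, which makes later evaluation numbers comparable.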

4. Evaluate the model

metric, y_pred = eval.evaluate_model(model, X_test, y_test, task_type)
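
What metric contains is not documented here. For a classification task, a hand-computed accuracy is a simple independent check on y_pred. The sketch below is plain Python; the accuracy helper is hypothetical and is not claimed to match what evaluate_model computes.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that equal the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on empty inputs")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


if __name__ == "__main__":
    # Three of the four predictions match, so this prints 0.75
    print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
```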

5. Plot results

plot.plot_model_outputs(y_test, y_pred)  # for a classification problem
or
plot.plot_model_outputs(history)  # for a regression problem

Example Notebook

Check out DataAnalysis.ipynb for a full example pipeline from preprocessing to visualization.


Dependencies

  • scipy
  • pandas
  • ipython
  • seaborn
  • openpyxl
  • matplotlib
  • scikit-learn
  • imbalanced-learn

Contributing

Feel free to fork and improve! PRs are welcome for new features, improvements, or bug fixes.


Contact

Made with ❤️ by Abhijeet
LinkedIn | GitHub


License

MIT © Abhijeet
You're free to use, modify, and distribute this project with proper attribution.

