Data cleaning and automated ML model selection package
batabyal
batabyal is a lightweight Python package that provides:
- cleaning_module for CSV data cleaning utilities
- trainer_kit for automatic model selection and training, ranking candidate classifiers by their roc_auc score
It is designed for rapid experimentation, prototyping, and small-to-medium ML workflows where you want sensible defaults without repetitive boilerplate.
Installation
pip install batabyal
Import
from batabyal import trainer_kit as tk
from batabyal import cleaning_module as cm
Usage
tk.train(x, y, "numeric", "multiclass", 3)
#structure: train(x, y, x_type:XType, y_type:YType, n_splits:int, random_state:int|None=42)
#XType = Literal["numeric", "one_hot", "mixed"]
#YType = Literal["binary", "multiclass"]
cm.clean_csv('filename.csv', numericData, charData, True)
#structure: clean_csv(file, numericData, charData, Fill, dummies=None)
#If `Fill` is True, NaN values in each numeric column are filled with that column's mean.
#`dummies` is a list of values to replace with NaN before cleaning.
'trainer_kit' details
it uses:
- StratifiedKFold and GridSearchCV to find the best estimator
- roc_auc_ovr_weighted for scoring
it is limited to:
- LogisticRegression
- DecisionTreeClassifier
- RandomForestClassifier
- GaussianNB
- BernoulliNB
use it when:
- you have a binary or multiclass dataset with target labels (i.e. classifiers only, ClassifierMixin)
don't use it when:
- your dataset contains only a single class
it assumes:
- your dataset is already cleaned
- categorical features are one-hot encoded (if applicable)
- data is scaled (if applicable)
- column order is the same for train and test data
it returns:
- the best trained model
- its best roc_auc score obtained from hyperparameter tuning
- the name of the algorithm that produced the best model
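The selection procedure described above can be sketched directly with scikit-learn. This is a minimal illustration, not batabyal's actual implementation. The synthetic dataset, the parameter grids, and the choice of only three of the five supported classifiers are all assumptions made to keep the example short.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic 3-class data standing in for an already cleaned, scaled dataset.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_classes=3, random_state=42
)

# Hypothetical candidates and grids; the grids batabyal searches are not
# documented here, so these values are illustrative only.
candidates = {
    "LogisticRegression": (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}),
    "DecisionTreeClassifier": (DecisionTreeClassifier(random_state=42), {"max_depth": [3, 5, None]}),
    "GaussianNB": (GaussianNB(), {"var_smoothing": [1e-9, 1e-7]}),
}

# Stratified folds keep the class proportions similar in every split.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)

best_name, best_model, best_score = None, None, float("-inf")
for name, (estimator, grid) in candidates.items():
    # Tune each candidate, scoring by weighted one-vs-rest ROC AUC.
    search = GridSearchCV(estimator, grid, scoring="roc_auc_ovr_weighted", cv=cv)
    search.fit(X, y)
    # Keep the candidate with the highest cross-validated score.
    if search.best_score_ > best_score:
        best_name = name
        best_model = search.best_estimator_
        best_score = search.best_score_

print(best_name, round(best_score, 3))
```

The three values kept at the end (fitted model, score, algorithm name) mirror the three items listed under "it returns".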
'cleaning_module' details
It cleans .csv files only.
It returns the cleaned DataFrame.
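To make the `Fill` and `dummies` behaviour from the Usage section concrete, here is a rough pandas equivalent. It is a sketch under assumptions, not the code inside `clean_csv`. The column names, the placeholder values `"?"` and `-999`, and the use of `read_csv(na_values=...)` to turn placeholders into NaN are all illustrative choices.

```python
import io

import pandas as pd

# In-memory CSV standing in for 'filename.csv' (hypothetical data).
raw = io.StringIO(
    "age,income,city\n"
    "25,50000,Pune\n"
    "?,62000,Delhi\n"
    "35,-999,Pune\n"
)

dummies = ["?", -999]             # hypothetical placeholder values to treat as NaN
numeric_cols = ["age", "income"]  # hypothetical numeric column names

# Treat the placeholder values as NaN while loading the file.
df = pd.read_csv(raw, na_values=dummies)

# Fill NaN in each numeric column with that column's mean.
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].mean())

print(df)
```

Here the missing `age` becomes 30.0 (the mean of 25 and 35) and the missing `income` becomes 56000.0 (the mean of 50000 and 62000).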
This package helps you train supervised learning models more quickly.