Data cleaning and automated ML model selection package
Project description
batabyal
batabyal is a lightweight Python package that provides:
- cleaning_module for CSV data cleaning utilities
- trainer_kit for automatically selecting and training the best machine-learning model, scored by roc_auc
It is designed for rapid experimentation, prototyping, and small-to-medium ML workflows where you want sensible defaults without repetitive boilerplate.
Installation
pip install batabyal
Importing
from batabyal import trainer_kit as tk
from batabyal import cleaning_module as cm
Usage
tk.train(x, y, "numeric", "multiclass", 3)
#structure: train(x, y, x_type:XType, y_type:YType, n_splits:int, random_state:int|None=42)
#XType = Literal["numeric", "one_hot", "mixed"]
#YType = Literal["binary", "multiclass"]
cm.clean_csv('filename.csv', numericData, charData, True)
#structure: clean_csv(file, numericData, charData, Fill, dummies=None)
#If `Fill` is True, NaN values in each numeric column are filled with that column's mean.
#`dummies` is the list of values to replace with NaN before cleaning.
'trainer_kit' details
It uses:
- StratifiedKFold and GridSearchCV to find the best estimator
- roc_auc_ovr_weighted for scoring
It is limited to:
- LogisticRegression
- DecisionTreeClassifier
- RandomForestClassifier
- GaussianNB
- BernoulliNB
Use it when:
- you have a labeled dataset with a binary or multiclass target (classification only, i.e. ClassifierMixin estimators)
Don't use it when:
- your target contains only a single class
It assumes:
- your dataset is already cleaned
- categorical features are one-hot encoded (if applicable)
- data is scaled (if applicable)
- column order is the same for train and test data
It returns:
- the best trained model, with its tuned hyperparameters, and its roc_auc score
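The search described above can be sketched with scikit-learn directly. This is a minimal illustration, not batabyal's actual implementation: the function name `select_best_model`, the candidate list, and the hyperparameter grids are assumptions made up for the example, and only three of the five supported classifiers are shown to keep it short.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier


def select_best_model(x, y, n_splits=3, random_state=42):
    """Return (best_estimator, best_cv_score) across a few candidate classifiers."""
    # Hypothetical candidates and grids, chosen only for illustration.
    candidates = [
        (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}),
        (DecisionTreeClassifier(random_state=random_state), {"max_depth": [2, 4, None]}),
        (GaussianNB(), {"var_smoothing": [1e-9, 1e-7]}),
    ]
    # Stratified folds keep the class proportions similar in every split.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)

    best_model, best_score = None, float("-inf")
    for estimator, grid in candidates:
        # Grid-search each candidate, scoring with weighted one-vs-rest ROC AUC.
        search = GridSearchCV(estimator, grid, scoring="roc_auc_ovr_weighted", cv=cv)
        search.fit(x, y)
        if search.best_score_ > best_score:
            best_model, best_score = search.best_estimator_, search.best_score_
    return best_model, best_score


x, y = load_iris(return_X_y=True)
model, score = select_best_model(x, y)
print(type(model).__name__, round(score, 3))
```

Every candidate here exposes `predict_proba`, which the `roc_auc_ovr_weighted` scorer needs in order to rank class probabilities.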
'cleaning_module' details
It only cleans .csv files.
It returns the cleaned dataframe.
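As a rough picture of what this kind of cleaning involves, here is a small pandas sketch. It is not the package's code: `clean_csv_sketch` is a hypothetical stand-in, it assumes `numericData` and `charData` are lists of column names, and the whitespace trimming and duplicate removal are illustrative guesses. Only the replacement of `dummies` with NaN and the mean fill come from the description above.

```python
import io

import pandas as pd


def clean_csv_sketch(file, numeric_cols, char_cols, fill, dummies=None):
    """Hypothetical stand-in for a CSV cleaner; not the batabyal implementation."""
    df = pd.read_csv(file)
    # Treat placeholder values as missing before any other cleaning step.
    if dummies:
        df = df.mask(df.isin(list(dummies)))
    # Coerce numeric columns; entries that cannot be parsed become NaN.
    for col in numeric_cols:
        df[col] = pd.to_numeric(df[col], errors="coerce")
    # Trim surrounding whitespace in text columns (an assumed step).
    for col in char_cols:
        df[col] = df[col].astype("string").str.strip()
    # Optionally fill missing numeric values with each column's mean.
    if fill:
        df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].mean())
    # Dropping exact duplicate rows is also an assumed step.
    return df.drop_duplicates().reset_index(drop=True)


raw = io.StringIO("age,city\n30, Pune\n?,Delhi \n50,Pune\n")
cleaned = clean_csv_sketch(raw, ["age"], ["city"], fill=True, dummies=["?"])
print(cleaned)
```

The in-memory `io.StringIO` buffer stands in for a file path so the example runs without a CSV on disk.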
This package helps you train supervised learning models more quickly.
Download files
Source Distribution
batabyal-0.1.0.tar.gz (5.1 kB)
Built Distribution
File details
Details for the file batabyal-0.1.0.tar.gz.
File metadata
- Download URL: batabyal-0.1.0.tar.gz
- Upload date:
- Size: 5.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0b69cdc9cacf224e3536d226c0fbd32b3f46b73ff203b45d84f793465e1c4bee |
| MD5 | 83a6926dae5ff7bb90e9fcfb6d1e1eea |
| BLAKE2b-256 | b1280912d7684c53ea02a0aaf7509e9bbdd2cecc4821804df0a66fa00f123e0d |
File details
Details for the file batabyal-0.1.0-py3-none-any.whl.
File metadata
- Download URL: batabyal-0.1.0-py3-none-any.whl
- Upload date:
- Size: 5.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 140acba179c71f66c0f530ed8dd910ef5a993b08a22a4b9cb63fa329443ae7a7 |
| MD5 | 18ba9a06bb74bb21055710dfd521143f |
| BLAKE2b-256 | 15b80af407bdb99225151071ed6f289ac9619540f35d96daaa48ef4e0dfc89fa |