A low-code library for machine learning pipelines
Project description
blitzml
Automate machine learning pipelines rapidly
How to install
pip install blitzml
Usage
from blitzml.tabular import Classification
import pandas as pd
# prepare your dataframes
train_df = pd.read_csv("auxiliary/datasets/banknote/train.csv")
test_df = pd.read_csv("auxiliary/datasets/banknote/test.csv")
# create the pipeline
auto = Classification(train_df, test_df, classifier = 'RF', n_estimators = 50)
# first perform data preprocessing
auto.preprocess()
# second train the model
auto.train_the_model()
# after training, generate the predictions dataframe and the metrics dictionary
auto.gen_pred_df(auto.test_df)
auto.gen_metrics_dict()
# the generated values can then be accessed as attributes
pred_df = auto.pred_df
metrics_dict = auto.metrics_dict
print(pred_df.head())
print(metrics_dict)
Available Classifiers
- Random Forest 'RF'
- LinearDiscriminantAnalysis 'LDA'
- Support Vector Classifier 'SVC'
- KNeighborsClassifier 'KNN'
- GaussianNB 'GNB'
- LogisticRegression 'LR'
- AdaBoostClassifier 'AB'
- GradientBoostingClassifier 'GB'
- DecisionTreeClassifier 'DT'
- MLPClassifier 'MLP'
The possible arguments for each model can be found in the scikit-learn documentation.
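For example, extra keyword arguments appear to be forwarded to the underlying scikit-learn estimator (as with n_estimators for 'RF' in the usage example above). A sketch using the 'KNN' classifier under that assumption; n_neighbors and weights are standard KNeighborsClassifier parameters:

# assumption: extra keyword arguments are passed through to sklearn's KNeighborsClassifier
auto = Classification(
    train_df,
    test_df,
    classifier = 'KNN',
    n_neighbors = 7,
    weights = 'distance'
)
auto.preprocess()
auto.train_the_model()
auto.gen_pred_df(auto.test_df)
auto.gen_metrics_dict()
print(auto.metrics_dict)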
Working with a custom classifier
# instead of specifying a classifier name, we pass "custom" to the classifier argument.
auto = Classification(
    train_df,
    test_df,
    classifier = "custom",
    class_name = "classifier",
    file_path = "auxiliary/scripts/dummy.py"
)
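The file referenced by file_path should define the class named by class_name. The exact interface blitzml expects is not documented here, so the following is only an illustrative sketch of what such a script could look like, assuming a scikit-learn-style estimator with fit and predict:

# illustrative sketch of auxiliary/scripts/dummy.py (the required interface is an assumption)
import numpy as np

class classifier:
    def fit(self, X, y):
        # memorize the most frequent training label
        labels, counts = np.unique(y, return_counts=True)
        self.most_frequent_ = labels[np.argmax(counts)]
        return self

    def predict(self, X):
        # always predict the most frequent training label
        return np.full(len(X), self.most_frequent_)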
Smart Feature Selection
# filter the feature columns used for training, e.g. by their correlation with the target column
auto = Classification(
    # ... other params ...
    feature_selection = "correlation" # or "importance" or "none"
)
- Options:
- "correlation": use feature columns with the highest correlation with the target
- "importance": use feature columns that are important for the model to predict the target
- "none": use all feature columns
Additional features
- Preprocessing a dataset
# After executing
auto.preprocess()
# You can access the processed datasets via
processed_train_df = auto.train_df
processed_test_df = auto.test_df
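The processed frames are ordinary pandas DataFrames, so standard pandas inspection applies (nothing blitzml-specific here):

auto.preprocess()
processed_train_df = auto.train_df
processed_test_df = auto.test_df
# inspect dimensions and any remaining missing values
print(processed_train_df.shape, processed_test_df.shape)
print(processed_train_df.isna().sum())
print(processed_train_df.head())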
- Validation split
auto = Classification(
    # ... other params ...
    validation_percentage = 0.1 # default
)
- Multiclass metrics averaging type
auto = Classification(
    # ... other params ...
    average_type = 'macro' # default
)
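The two options above are ordinary constructor arguments and can be combined in a single call. A sketch, assuming validation_percentage is the fraction of the training data held out for validation and average_type is passed through to the scikit-learn metric functions (which also accept values such as 'micro' and 'weighted'):

auto = Classification(
    train_df,
    test_df,
    classifier = 'LR',
    validation_percentage = 0.2,  # hold out 20% of the training data for validation (assumption)
    average_type = 'weighted'     # averaging used for the multiclass metrics (assumption)
)
auto.run()
print(auto.metrics_dict)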
- Less coding
# Instead of
auto.preprocess()
auto.train_the_model()
auto.gen_pred_df()
auto.gen_metrics_dict()
# You can simply use
auto.run()
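After run(), the generated predictions and metrics are available on the same attributes as in the step-by-step flow:

# results are populated on the same attributes as before
pred_df = auto.pred_df
metrics_dict = auto.metrics_dict
print(pred_df.head())
print(metrics_dict)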
Development
- Clone the repo
- run pip install virtualenv
- run python -m virtualenv venv
- run . ./venv/bin/activate on UNIX-based systems, or ./venv/Scripts/activate.ps1 on Windows
- run pip install -r requirements.txt
- run pre-commit install
Project details
Download files
Source Distribution
blitzml-0.11.0.tar.gz (6.4 kB)
Built Distribution
blitzml-0.11.0-py3-none-any.whl (6.2 kB)
File details
Details for the file blitzml-0.11.0.tar.gz.
File metadata
- Download URL: blitzml-0.11.0.tar.gz
- Upload date:
- Size: 6.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5c6f70f755253c75492da4060e5c7b78254ac7369e59469a97d54c85265c27ed
MD5 | 59315612e8743b58106160e4a2f423f9
BLAKE2b-256 | 3a0e000e167444663e70a7e016e4c17124a0ea3b08fa61d2fa2b04d15e1a1a43
File details
Details for the file blitzml-0.11.0-py3-none-any.whl.
File metadata
- Download URL: blitzml-0.11.0-py3-none-any.whl
- Upload date:
- Size: 6.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | a66bc594ccadba0cd832109f76c787611ebc964a3f8ddb5529370b40133d45db
MD5 | 58640da82870885800db421a7bd5a963
BLAKE2b-256 | ad40e957689deaeca25e08f2fab2b2bac9150e83efeabc2d68b687cc9fee39e1