A low-code library for machine learning pipelines
blitzml
Automate machine learning pipelines rapidly
How to install
pip install blitzml
Usage
from blitzml.tabular import Classification
import pandas as pd
# prepare your dataframes
train_df = pd.read_csv("auxiliary/datasets/banknote/train.csv")
test_df = pd.read_csv("auxiliary/datasets/banknote/test.csv")
ground_truth_df = pd.read_csv("auxiliary/datasets/banknote/ground_truth.csv")
# create the pipeline
auto = Classification(train_df, test_df, ground_truth_df, classifier = 'RF', n_estimators = 50)
# first perform data preprocessing
auto.preprocess()
# second train the model
auto.train_the_model()
# After training the model we can generate:
auto.gen_pred_df()
auto.gen_metrics_dict()
# We can get their values using:
pred_df = auto.pred_df
metrics_dict = auto.metrics_dict
print(pred_df.head())
print(metrics_dict)
Available Classifiers
- Random Forest 'RF'
- LinearDiscriminantAnalysis 'LDA'
- Support Vector Classifier 'SVC'
- KNeighborsClassifier 'KNN'
- GaussianNB 'GNB'
- LogisticRegression 'LR'
- AdaBoostClassifier 'AB'
- GradientBoostingClassifier 'GB'
- DecisionTreeClassifier 'DT'
- MLPClassifier 'MLP'
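Model-specific arguments (such as n_estimators = 50 in the example above) are passed as keyword arguments to Classification; the arguments each model accepts are listed in the scikit-learn documentation.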
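For quick reference, the short codes above appear to correspond to the scikit-learn estimators below. This lookup is a reader's aid, not blitzml's internal table; the names CLASSIFIER_CODES and sklearn_path are hypothetical.

```python
# Hypothetical reference table (not part of blitzml):
# short classifier code -> dotted path of the matching scikit-learn class.
CLASSIFIER_CODES = {
    "RF": "sklearn.ensemble.RandomForestClassifier",
    "LDA": "sklearn.discriminant_analysis.LinearDiscriminantAnalysis",
    "SVC": "sklearn.svm.SVC",
    "KNN": "sklearn.neighbors.KNeighborsClassifier",
    "GNB": "sklearn.naive_bayes.GaussianNB",
    "LR": "sklearn.linear_model.LogisticRegression",
    "AB": "sklearn.ensemble.AdaBoostClassifier",
    "GB": "sklearn.ensemble.GradientBoostingClassifier",
    "DT": "sklearn.tree.DecisionTreeClassifier",
    "MLP": "sklearn.neural_network.MLPClassifier",
}


def sklearn_path(code):
    """Return the scikit-learn class path to look up in the docs for a code."""
    try:
        return CLASSIFIER_CODES[code]
    except KeyError:
        raise ValueError(f"unknown classifier code: {code!r}") from None


print(sklearn_path("RF"))
```

Looking up the printed path in the scikit-learn docs shows which keyword arguments that model accepts.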
Working with a custom classifier
# instead of specifying a classifier name, we pass "custom" to the classifier argument.
auto = Classification(
    train_df,
    test_df,
    ground_truth_df,
    classifier = "custom",
    class_name = "classifier",
    file_path = "auxiliary/scripts/dummy.py",
    feature_selection = "importance"
)
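The contents of the custom script are not shown here. As a rough sketch, assuming blitzml expects a scikit-learn-style estimator with fit() and predict() (an assumption, not confirmed by this page), a file such as dummy.py might define a class whose name matches class_name. Whether blitzml needs anything more, for example when feature_selection = "importance" is used, is not stated here.

```python
# Hypothetical contents of a custom classifier script such as dummy.py.
# Assumption: blitzml calls fit(X, y) and predict(X) like scikit-learn does.
from collections import Counter


class classifier:  # lowercase so that it matches class_name = "classifier"
    """Baseline that always predicts the most frequent training label."""

    def __init__(self):
        self.majority_ = None

    def fit(self, X, y):
        # Remember the most common label seen during training.
        self.majority_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        # Return one prediction per input row.
        return [self.majority_] * len(X)


model = classifier().fit([[0.1], [0.4], [0.9]], [1, 0, 1])
print(model.predict([[0.5], [0.7]]))  # [1, 1]
```

A majority-class baseline like this is only useful as a smoke test for the custom-classifier path; a real model would replace the bodies of fit() and predict().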
Smart Feature Selection
# to keep only the columns that correlate with the target column
auto = Classification(
    train_df,
    test_df,
    ground_truth_df,
    # other parameters as needed
    feature_selection = "correlation"
)
# to keep only the columns selected by feature importance (boruta)
auto = Classification(
    train_df,
    test_df,
    ground_truth_df,
    # other parameters as needed
    feature_selection = "importance"
)
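To illustrate the idea behind correlation-based selection, here is a minimal pure-Python sketch that keeps the columns whose absolute Pearson correlation with the target reaches a cutoff. It is not blitzml's implementation: the function names, the 0.3 threshold, and the toy data are all assumptions made for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def select_by_correlation(columns, target, threshold=0.3):
    """Keep the column names whose |correlation| with the target >= threshold."""
    return [
        name
        for name, values in columns.items()
        if abs(pearson(values, target)) >= threshold
    ]


# Toy data: one column that tracks the target and one that does not.
columns = {
    "variance": [1.0, 2.0, 3.0, 4.0],
    "noise": [1.0, -1.0, -1.0, 1.0],
}
target = [0, 0, 1, 1]
print(select_by_correlation(columns, target))  # ['variance']
```

The "importance" option relies on boruta instead, which compares each feature's importance against shuffled copies of the features; that method is not sketched here.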
Additional features
• Preprocessing a dataset
# After executing
auto.preprocess()
# You can access the processed datasets via
processed_train_df = auto.train_df
processed_test_df = auto.test_df
• Less coding
# Instead of
auto.preprocess()
auto.train_the_model()
auto.gen_pred_df()
auto.gen_metrics_dict()
# You can simply use
auto.run()
Development
- Clone the repo
- run
pip install virtualenv
- run
python -m virtualenv venv
- run
. ./venv/bin/activate
on UNIX-based systems, or
. ./venv/Scripts/activate.ps1
on Windows
- run
pip install -r requirements.txt
- run
pre-commit install
Source distribution: blitzml-0.8.0.tar.gz (6.0 kB)