
A low-code library for machine learning pipelines


blitzml

Automate machine learning pipelines rapidly

How to install

pip install blitzml

Usage

from blitzml.tabular import Classification
import pandas as pd

# prepare your dataframes
train_df = pd.read_csv("auxiliary/datasets/banknote/train.csv")
test_df = pd.read_csv("auxiliary/datasets/banknote/test.csv")

# create the pipeline
auto = Classification(train_df, test_df, classifier = 'RF', n_estimators = 50)

# first perform data preprocessing
auto.preprocess()
# second train the model
auto.train_the_model()


# After training, generate the predictions dataframe and the metrics dictionary:
auto.gen_pred_df(auto.test_df)
auto.gen_metrics_dict()

# We can get their values using:
pred_df = auto.pred_df
metrics_dict = auto.metrics_dict

print(pred_df.head())
print(metrics_dict)

Available Classifiers

  • RandomForestClassifier 'RF'
  • LinearDiscriminantAnalysis 'LDA'
  • Support Vector Classifier 'SVC'
  • KNeighborsClassifier 'KNN'
  • GaussianNB 'GNB'
  • LogisticRegression 'LR'
  • AdaBoostClassifier 'AB'
  • GradientBoostingClassifier 'GB'
  • DecisionTreeClassifier 'DT'
  • MLPClassifier 'MLP'

The possible keyword arguments for each classifier can be found in the scikit-learn docs.
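
For example, n_estimators above is a RandomForestClassifier parameter; a sketch assuming any extra keyword arguments are forwarded to the chosen classifier in the same way (n_neighbors, kernel, and C are standard scikit-learn parameters):

# assumption: extra keyword arguments are forwarded to the selected sklearn classifier
auto_knn = Classification(train_df, test_df, classifier = 'KNN', n_neighbors = 7)
auto_svc = Classification(train_df, test_df, classifier = 'SVC', kernel = 'rbf', C = 1.0)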

Working with a custom classifier

# Instead of specifying a classifier name, pass "custom" to the classifier argument.
auto = Classification(
    train_df,
    test_df,
    classifier = "custom", 
    class_name = "classifier",
    file_path = "auxiliary/scripts/dummy.py"
)
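
The script passed via file_path is expected to define the class named in class_name. A minimal illustrative sketch of such a script, assuming the class should expose the usual scikit-learn fit/predict interface (this is not the actual auxiliary/scripts/dummy.py):

# illustrative custom classifier script (assumed interface: sklearn-style fit/predict)
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.linear_model import LogisticRegression

class classifier(BaseEstimator, ClassifierMixin):
    def __init__(self):
        # wrap a simple sklearn estimator
        self.model = LogisticRegression()

    def fit(self, X, y):
        # delegate training to the wrapped estimator
        self.model.fit(X, y)
        return self

    def predict(self, X):
        return self.model.predict(X)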

Smart Feature Selection

# select which feature columns are used, e.g. by correlation with the target column
auto = Classification(
    '''
    params
    '''
    feature_selection = "correlation" # or "importance" or "none"
)
  • Options:
    • "correlation": use feature columns with the highest correlation with the target
    • "importance": use feature columns that are important for the model to predict the target
    • "none": use all feature columns

Additional features

• Preprocessing a dataset

# After executing
auto.preprocess()
# You can access the processed datasets via
processed_train_df = auto.train_df
processed_test_df = auto.test_df

• Validation split

auto = Classification(
    '''
    params
    '''
    validation_percentage = 0.1 #default
)

• Multiclass metrics averaging type

auto = Classification(
    '''
    params
    '''
    average_type = 'macro' #default
)
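
Both validation_percentage and average_type go alongside the usual constructor arguments; an illustrative call (the values shown are examples, and 'weighted' is assumed to be accepted like the other scikit-learn averaging types):

auto = Classification(
    train_df,
    test_df,
    classifier = 'LR',
    validation_percentage = 0.2,  # hold out 20% of the training data for validation
    average_type = 'weighted'     # assumed option; 'macro' is the documented default
)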

• Less coding

# Instead of
auto.preprocess()
auto.train_the_model()
auto.gen_pred_df()
auto.gen_metrics_dict()
# You can simply use
auto.run()
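
Putting the steps above together, a complete minimal script using run():

from blitzml.tabular import Classification
import pandas as pd

train_df = pd.read_csv("auxiliary/datasets/banknote/train.csv")
test_df = pd.read_csv("auxiliary/datasets/banknote/test.csv")

# preprocess, train, predict, and compute metrics in one call
auto = Classification(train_df, test_df, classifier = 'RF', n_estimators = 50)
auto.run()

print(auto.pred_df.head())
print(auto.metrics_dict)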

Development

  • Clone the repo
  • run pip install virtualenv
  • run python -m virtualenv venv
  • run . ./venv/bin/activate on UNIX-based systems, or . ./venv/Scripts/activate.ps1 on Windows (PowerShell)
  • run pip install -r requirements.txt
  • run pre-commit install

