A low-code library for machine learning pipelines
blitzml
Automate machine learning pipelines rapidly
How to install
pip install blitzml
Usage
from blitzml.tabular import Classification
import pandas as pd
# prepare your dataframes
train_df = pd.read_csv("auxiliary/datasets/banknote/train.csv")
test_df = pd.read_csv("auxiliary/datasets/banknote/test.csv")
ground_truth_df = pd.read_csv("auxiliary/datasets/banknote/ground_truth.csv")
# create the pipeline
auto = Classification(train_df, test_df, ground_truth_df, classifier = 'RF', n_estimators = 50)
# first perform data preprocessing
auto.preprocess()
# second train the model
auto.train_the_model()
# After training the model we can generate:
auto.gen_pred_df()
auto.gen_metrics_dict()
# We can get their values using:
pred_df = auto.pred_df
metrics_dict = auto.metrics_dict
print(pred_df.head())
print(metrics_dict)
Available Classifiers
- Random Forest 'RF'
- LinearDiscriminantAnalysis 'LDA'
- Support Vector Classifier 'SVC'
- KNeighborsClassifier 'KNN'
- GaussianNB 'GNB'
- LogisticRegression 'LR'
- AdaBoostClassifier 'AB'
- GradientBoostingClassifier 'GB'
- DecisionTreeClassifier 'DT'
- MLPClassifier 'MLP'
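Each classifier accepts the keyword arguments of its scikit-learn counterpart (for example, n_estimators for 'RF', as in the Usage example). See the scikit-learn documentation for the full list of arguments for each model.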
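The short codes above appear to correspond to scikit-learn estimators, so one way to find the valid keyword arguments is to inspect the matching class. The sketch below assumes that mapping; it is inferred from the names in the list, not read from blitzml's source. `SKLEARN_CLASS_PATHS` and `list_params` are hypothetical helpers, and scikit-learn must be installed when `list_params` is called.

```python
import importlib
import inspect

# Assumed mapping from blitzml short codes to scikit-learn classes,
# inferred from the names in the list above (not taken from blitzml itself).
SKLEARN_CLASS_PATHS = {
    "RF": "sklearn.ensemble.RandomForestClassifier",
    "LDA": "sklearn.discriminant_analysis.LinearDiscriminantAnalysis",
    "SVC": "sklearn.svm.SVC",
    "KNN": "sklearn.neighbors.KNeighborsClassifier",
    "GNB": "sklearn.naive_bayes.GaussianNB",
    "LR": "sklearn.linear_model.LogisticRegression",
    "AB": "sklearn.ensemble.AdaBoostClassifier",
    "GB": "sklearn.ensemble.GradientBoostingClassifier",
    "DT": "sklearn.tree.DecisionTreeClassifier",
    "MLP": "sklearn.neural_network.MLPClassifier",
}


def list_params(code):
    """Return the constructor argument names of the assumed scikit-learn class."""
    module_name, class_name = SKLEARN_CLASS_PATHS[code].rsplit(".", 1)
    # Imported lazily, so scikit-learn is only needed when this is called.
    cls = getattr(importlib.import_module(module_name), class_name)
    return sorted(inspect.signature(cls).parameters)
```

Calling `list_params("RF")` should list names such as `n_estimators`, which can then be passed to `Classification` alongside `classifier = 'RF'`.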
Working with a custom classifier
# instead of specifying a classifier name, we pass "custom" to the classifier argument.
auto = Classification(
    train_df,
    test_df,
    ground_truth_df,
    classifier = "custom",
    class_name = "classifier",
    file_path = "auxiliary/scripts/dummy.py",
    feature_selection = "importance"
)
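The contents of the custom script are not shown here. As a rough guess, the file could define a class whose name matches class_name and which follows the scikit-learn fit/predict convention. The sketch below assumes blitzml calls `fit(X, y)` and `predict(X)` on that class; the majority-class logic is purely illustrative, and a real model may need more (for instance, whatever the chosen feature_selection mode expects).

```python
from collections import Counter


class classifier:
    """Hypothetical custom model: always predicts the most frequent training label."""

    def fit(self, X, y):
        # Remember the most common label seen during training.
        self.majority_ = Counter(list(y)).most_common(1)[0][0]
        return self

    def predict(self, X):
        # One prediction per input row; len() works for lists and DataFrames.
        return [self.majority_] * len(X)
```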
Smart Feature Selection
# to filter the columns used by their correlation with the target column
auto = Classification(
    train_df,
    test_df,
    ground_truth_df,
    classifier = "RF",
    feature_selection = "correlation"
)
# to filter the columns used by feature importance (Boruta)
auto = Classification(
    train_df,
    test_df,
    ground_truth_df,
    classifier = "RF",
    feature_selection = "importance"
)
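To give an idea of what correlation-based filtering means, here is a minimal pandas sketch. It is not blitzml's implementation: the function name `select_by_correlation`, the `threshold` value, and the use of absolute Pearson correlation are all assumptions made for illustration.

```python
import pandas as pd


def select_by_correlation(df, target, threshold=0.1):
    """Keep numeric feature columns whose |Pearson r| with the target is >= threshold."""
    # Only numeric columns can be correlated; the target must be numeric too.
    numeric = df.select_dtypes(include="number")
    corr = numeric.corr()[target].drop(target)
    # Weakly correlated columns (and NaN results, e.g. constant columns) are dropped.
    keep = corr[corr.abs() >= threshold].index.tolist()
    return df[keep + [target]]
```

For example, `select_by_correlation(train_df, "class", threshold=0.2)` would return only the columns that pass the threshold plus the target, where "class" stands in for whatever the target column is called in your data.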
Additional features
• Preprocessing a dataset
# After executing
auto.preprocess()
# You can access the processed datasets via
processed_train_df = auto.train_df
processed_test_df = auto.test_df
• Less coding
# Instead of
auto.preprocess()
auto.train_the_model()
auto.gen_pred_df()
auto.gen_metrics_dict()
# You can simply use
auto.run()
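Because run() covers the whole pipeline, trying several classifiers takes only a short loop. The helper below is a sketch, not part of blitzml: `compare_classifiers` and `build_pipeline` are hypothetical names, and it relies only on run() and metrics_dict as described above.

```python
def compare_classifiers(build_pipeline, codes):
    """Run one pipeline per classifier code and collect each metrics dict."""
    results = {}
    for code in codes:
        auto = build_pipeline(code)  # e.g. a Classification instance
        auto.run()                   # preprocess, train, predict, compute metrics
        results[code] = auto.metrics_dict
    return results


# Example call (needs blitzml and the dataframes from the Usage section):
# results = compare_classifiers(
#     lambda code: Classification(train_df, test_df, ground_truth_df, classifier = code),
#     ["RF", "LR", "KNN"],
# )
```

Passing a factory instead of a ready-made pipeline keeps each run independent, so one classifier's preprocessing cannot leak into the next.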
Development
- Clone the repo
- run
pip install virtualenv
- run
python -m virtualenv venv
- run
. ./venv/bin/activate
on UNIX-based systems, or
. ./venv/Scripts/activate.ps1
on Windows
- run
pip install -r requirements.txt
- run
pre-commit install