A low-code library for machine learning pipelines

blitzml

Automate machine learning pipelines rapidly

How to install

pip install blitzml

Usage

from blitzml.tabular import Classification
import pandas as pd

# prepare your dataframes
train_df = pd.read_csv("auxiliary/datasets/banknote/train.csv")
test_df = pd.read_csv("auxiliary/datasets/banknote/test.csv")

# create the pipeline
auto = Classification(train_df, test_df, classifier='RF', n_estimators=50)

# first, preprocess the data
auto.preprocess()
# then, train the model
auto.train_the_model()


# after training the model we can generate the predictions dataframe and the metrics dictionary
auto.gen_pred_df(auto.test_df)
auto.gen_metrics_dict()

# both are then exposed as attributes
pred_df = auto.pred_df
metrics_dict = auto.metrics_dict

print(pred_df.head())
print(metrics_dict)
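
If your data lives in a single file rather than separate train and test CSVs, you can produce the two dataframes with an ordinary scikit-learn split first. A minimal sketch (the single-file path here is hypothetical):

import pandas as pd
from sklearn.model_selection import train_test_split

# hypothetical single-file dataset; blitzml expects separate train/test dataframes
df = pd.read_csv("auxiliary/datasets/banknote/data.csv")
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)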

Available Classifiers

  • Random Forest 'RF'
  • LinearDiscriminantAnalysis 'LDA'
  • Support Vector Classifier 'SVC'
  • KNeighborsClassifier 'KNN'
  • GaussianNB 'GNB'
  • LogisticRegression 'LR'
  • AdaBoostClassifier 'AB'
  • GradientBoostingClassifier 'GB'
  • DecisionTreeClassifier 'DT'
  • MLPClassifier 'MLP'

The possible arguments for each model can be found in the scikit-learn docs.
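
For example, extra keyword arguments are passed the same way n_estimators was passed to the random forest above; assuming they are forwarded to the underlying scikit-learn estimator, configuring the support vector classifier could look like:

# assumption: extra keyword arguments are forwarded to sklearn's SVC
auto = Classification(train_df, test_df, classifier="SVC", C=1.0, kernel="rbf")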

Working with a custom classifier

# instead of specifying a classifier name, we pass "custom" to the classifier argument
auto = Classification(
    train_df,
    test_df,
    classifier="custom",
    class_name="classifier",
    file_path="auxiliary/scripts/dummy.py"
)
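
The script at file_path is expected to define the class named by class_name. The exact interface blitzml requires isn't documented here; a plausible sketch, assuming a scikit-learn-style estimator with fit and predict, might look like:

# auxiliary/scripts/dummy.py -- illustrative sketch, not the actual file
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.neighbors import KNeighborsClassifier

class classifier(BaseEstimator, ClassifierMixin):
    """Scikit-learn-style wrapper; blitzml is assumed to call fit/predict."""

    def __init__(self, n_neighbors=3):
        self.n_neighbors = n_neighbors
        self._model = KNeighborsClassifier(n_neighbors=n_neighbors)

    def fit(self, X, y):
        self._model.fit(X, y)
        return self

    def predict(self, X):
        return self._model.predict(X)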

Smart Feature Selection

# to filter the used columns by correlation with the target column
auto = Classification(
    train_df,
    test_df,
    # ... other arguments ...
    feature_selection="correlation"  # or "importance" or "none"
)
  • Options:
    • "correlation": use the feature columns with the highest correlation with the target (see the sketch after this list)
    • "importance": use the feature columns that are most important for the model to predict the target
    • "none": use all feature columns

Additional features

• Preprocessing a dataset

# After executing
auto.preprocess()
# You can access the processed datasets via
processed_train_df = auto.train_df
processed_test_df = auto.test_df
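
A quick way to see what preprocessing changed is to compare the raw and processed frames (a sketch reusing the dataframes from the usage example):

auto.preprocess()
print("raw:", train_df.shape, "processed:", auto.train_df.shape)
print(auto.train_df.dtypes)  # dtypes after preprocessing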

• Validation split

auto = Classification(
    train_df,
    test_df,
    # ... other arguments ...
    validation_percentage=0.1  # default
)

• Multiclass metrics averaging type

auto = Classification(
    train_df,
    test_df,
    # ... other arguments ...
    average_type='macro'  # default
)
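
The averaging type matters only for multiclass problems: "macro" averages the per-class scores without weighting by class size, while "weighted" weights them by class support. An illustration with scikit-learn's f1_score (assuming average_type is applied to the metrics this way):

from sklearn.metrics import f1_score

y_true = [0, 0, 0, 1, 2]
y_pred = [0, 0, 1, 1, 2]
print(f1_score(y_true, y_pred, average="macro"))     # unweighted mean over classes
print(f1_score(y_true, y_pred, average="weighted"))  # weighted by class frequency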

• Less coding

# Instead of
auto.preprocess()
auto.train_the_model()
auto.gen_pred_df(auto.test_df)
auto.gen_metrics_dict()
# You can simply use
auto.run()
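
Putting it together, the full pipeline from the usage section condenses to:

from blitzml.tabular import Classification
import pandas as pd

train_df = pd.read_csv("auxiliary/datasets/banknote/train.csv")
test_df = pd.read_csv("auxiliary/datasets/banknote/test.csv")

auto = Classification(train_df, test_df, classifier='RF', n_estimators=50)
auto.run()  # preprocess + train + predictions + metrics

print(auto.pred_df.head())
print(auto.metrics_dict)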

Development

  • Clone the repo
  • run pip install virtualenv
  • run python -m virtualenv venv
run . ./venv/bin/activate on UNIX-based systems, or . ./venv/Scripts/activate.ps1 on Windows
  • run pip install -r requirements.txt
  • run pre-commit install

