Skip to main content

Automated Machine Learning for Supervised tasks

Project description

mljar-supervised

Build Status PyPI version Coverage Status PyPI pyversions

Machine Learning for Humans

The new standard in Machine Learning!

Thanks to Automated Machine Learning you don't need to worry about different machine learning interfaces. You don't need to know all algorithms and their hyper-parameters. With AutoML model tuning and training is painless.

In the current version only binary classification is supported with optimization of LogLoss metric.

Quick example

import pandas as pd
from supervised.automl import AutoML

df = pd.read_csv("https://raw.githubusercontent.com/pplonski/datasets-for-start/master/adult/data.csv", skipinitialspace=True)

X = df[df.columns[:-1]]
y = df["income"]

automl = AutoML()
automl.fit(X, y)

predictions = automl.predict(X)

The tuning algorithm

The tuning algorithm was created and developed by Piotr Płoński. It is heuristic algorithm created from combination of:

  • not-so-random approach
  • and hill-climbing

The approach is not-so-random because each algorithm has a defined set of hyper-parameters that usually works. At first step from not so random parameters an initial set of models is drawn. Then the hill climbing approach is used to pick best performing algorithms and tune them.

For each algorithm used in the AutoML the early stopping is applied.

The ensemble algorithm was implemented based on Caruana paper.

Installation

From PyPi repository:

pip install mljar-supervised

From source code:

git clone https://github.com/mljar/mljar-supervised.git
cd mljar-supervised
python setup.py install

Python 3.6 is required.

Usage

This is Automated Machine Learning package, so all hard tasks is done for you. The interface is simple but if necessary it gives you ability to control the training process.

Train and predict

automl = AutoML()
automl.fit(X, y)
predictions = automl.predict(X)

By the default, the training should finish in less than 1 hour and as ML algorithms will be checked:

  • Random Forest
  • Xgboost
  • CatBoost
  • LightGBM
  • Neural Network
  • Ensemble

The parameters that you can use to control the training process are:

  • total_time_limit - it is a total time limit that AutoML can spend for searching to the best ML model. It is in seconds. Default is set to 3600 seconds.
  • learner_time_limit - the time limit for training single model, in case of k-fold cross validation, the time spend on training is k*learner_time_limit. This parameter is only considered when total_time_limit is set to None. Default is set to 120 seconds.
  • algorithms - the list of algorithms that will be checked. Default is set to ["CatBoost", "Xgboost", "RF", "LightGBM", "NN"].
  • start_random_models - the number of models to check with not so random algorithm. Default is set to 10.
  • hill_climbing_steps - number of hill climbing steps used in models tuning. Default is set to 3.
  • top_models_to_improve - number of models considered for improvement in each hill climbing step. Default is set to 5.
  • train_ensemble - decides if ensemble model is trained at the end of AutoML fit procedure. Default is set to True.
  • verbose - controls printouts, Default is set to True.

Development

Installation

git clone https://github.com/mljar/mljar-supervised.git
virtualenv venv --python=python3.6
source venv/bin/activate
pip install -r requirements.txt

Testing

cd supervised
python -m tests.run_all

Newsletter

Don't miss updates and news from us. Subscribe to newsletter!

Roadmap

The package is under active development! Please expect a lot of changes! For this package the graphical interface will be provided soon (also open source!). Please be tuned.

To be added:

  • training single decision tree
  • create text report from trained models (maybe with plots from learning)
  • compute threshold for model prediction and predicting discrete output (label)
  • add model/predictions explanations
  • add support for multiclass classification
  • add support for regressions

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for mljar-supervised, version 0.1.7
Filename, size & hash File type Python version Upload date
mljar-supervised-0.1.7.tar.gz (25.7 kB) View hashes Source None

Supported by

Elastic Elastic Search Pingdom Pingdom Monitoring Google Google BigQuery Sentry Sentry Error logging AWS AWS Cloud computing DataDog DataDog Monitoring Fastly Fastly CDN SignalFx SignalFx Supporter DigiCert DigiCert EV certificate StatusPage StatusPage Status page