
Automated Machine Learning for Supervised tasks


mljar-supervised


Machine Learning for Humans

The new standard in Machine Learning!

Thanks to Automated Machine Learning, you don't need to worry about different machine learning interfaces or know every algorithm and its hyper-parameters. With AutoML, model tuning and training are painless.

The current version supports only binary classification, with models optimized for the LogLoss metric.
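
For reference, LogLoss for binary classification is the mean negative log-likelihood of the true labels under the predicted probabilities. The snippet below is a minimal plain-Python sketch of that metric, independent of the package; the function name `log_loss` and the clipping constant `eps` are illustrative choices, not part of mljar-supervised.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; lower is better."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip probabilities so that log(0) never occurs.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Confident, correct predictions score lower than hesitant ones.
confident = log_loss([1, 0], [0.9, 0.1])
hesitant = log_loss([1, 0], [0.6, 0.4])
```

A model that always predicts 0.5 scores ln(2), which is a handy baseline when reading LogLoss values.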

Quick example

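import pandas as pd
from supervised.automl import AutoML

# Load the Adult dataset; skipinitialspace strips the blanks that follow each comma.
df = pd.read_csv("https://raw.githubusercontent.com/pplonski/datasets-for-start/master/adult/data.csv", skipinitialspace=True)

# All columns except the last one are features; "income" is the binary target.
X = df[df.columns[:-1]]
y = df["income"]

# Search for the best model with the default settings.
automl = AutoML()
automl.fit(X, y)

predictions = automl.predict(X)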

The tuning algorithm

The tuning algorithm was created and developed by Piotr Płoński. It is a heuristic algorithm that combines:

  • a not-so-random approach
  • hill climbing

The approach is not-so-random because each algorithm has a predefined set of hyper-parameter values that usually work. In the first step, an initial set of models is drawn from these not-so-random parameters. Hill climbing is then used to pick the best-performing algorithms and tune them.
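
The two stages described above can be illustrated with a toy sketch. It is not the package's implementation: the grid `PARAM_GRID`, the stand-in objective `toy_loss`, and the helper names are all hypothetical, and a real run would train and validate a model instead of calling `toy_loss`. The argument names of `tune` mirror the package parameters only for readability.

```python
import random

# Hypothetical grid of hyper-parameter values that "usually work".
PARAM_GRID = {
    "max_depth": [2, 3, 4, 5, 6, 7, 8],
    "learning_rate": [0.01, 0.025, 0.05, 0.075, 0.1, 0.15],
}


def toy_loss(params):
    # Stand-in for a validation LogLoss; lower is better.
    return (params["max_depth"] - 5) ** 2 + 100 * (params["learning_rate"] - 0.05) ** 2


def draw_params(rng):
    # "Not-so-random": sample only from the curated grid.
    return {name: rng.choice(values) for name, values in PARAM_GRID.items()}


def neighbor(params, rng):
    # Hill-climbing move: shift one hyper-parameter to an adjacent grid value.
    name = rng.choice(sorted(PARAM_GRID))
    values = PARAM_GRID[name]
    idx = values.index(params[name])
    new_idx = min(max(idx + rng.choice([-1, 1]), 0), len(values) - 1)
    changed = dict(params)
    changed[name] = values[new_idx]
    return changed


def tune(start_random_models=10, hill_climbing_steps=3, top_models_to_improve=5, seed=0):
    rng = random.Random(seed)
    # Stage 1: draw the initial models from the grid.
    models = [draw_params(rng) for _ in range(start_random_models)]
    # Stage 2: repeatedly perturb the current best models.
    for _ in range(hill_climbing_steps):
        best = sorted(models, key=toy_loss)[:top_models_to_improve]
        models.extend([neighbor(p, rng) for p in best])
    return min(models, key=toy_loss)


best_params = tune()
```

Because hill climbing only adds candidates to the pool, the best loss after tuning can never be worse than the best loss among the initial draws.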

Early stopping is applied to each algorithm used in AutoML.
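
As a generic illustration of early stopping (not the package's internal code), training can be halted once the validation loss has stopped improving for a fixed number of iterations. The function name `best_iteration` and the `patience` argument below are hypothetical.

```python
def best_iteration(validation_losses, patience=3):
    """Return the index of the best iteration seen before stopping.

    Iteration stops once the loss has not improved for `patience`
    consecutive iterations.
    """
    best_loss = float("inf")
    best_idx = -1
    for idx, loss in enumerate(validation_losses):
        if loss < best_loss:
            best_loss, best_idx = loss, idx
        elif idx - best_idx >= patience:
            break  # no improvement for `patience` iterations
    return best_idx


# Made-up validation losses, one per training iteration.
losses = [0.70, 0.55, 0.48, 0.47, 0.49, 0.50, 0.52, 0.40]
stopped_at = best_iteration(losses)
```

With these made-up losses the loop stops before it ever reaches the last value, which is the usual trade-off: less training time in exchange for possibly missing a late improvement.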

The ensemble algorithm is implemented based on the Caruana paper.
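
The general idea of greedy forward ensemble selection with replacement can be sketched as follows. This is an assumption about the technique being referred to, written in plain Python with made-up validation predictions; the names `select_ensemble`, `valid_predictions`, and `max_steps` are hypothetical, and the package's actual code may differ.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    # Mean binary cross-entropy with clipped probabilities.
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def select_ensemble(predictions, y_true, max_steps=10):
    """Greedy forward selection with replacement.

    `predictions` maps a model name to its validation probabilities.
    Returns the selected model names; a repeated name acts as a larger weight.
    """
    selected = []
    summed = [0.0] * len(y_true)
    best_loss = float("inf")
    for _ in range(max_steps):
        candidate_name, candidate_loss = None, best_loss
        size = len(selected) + 1
        for name, probs in predictions.items():
            # Loss of the averaged ensemble if this model were added.
            averaged = [(s + p) / size for s, p in zip(summed, probs)]
            loss = log_loss(y_true, averaged)
            if loss < candidate_loss:
                candidate_name, candidate_loss = name, loss
        if candidate_name is None:
            break  # no model improves the ensemble any further
        selected.append(candidate_name)
        summed = [s + p for s, p in zip(summed, predictions[candidate_name])]
        best_loss = candidate_loss
    return selected


# Made-up validation labels and per-model probabilities.
y_valid = [1, 0, 1, 0]
valid_predictions = {
    "model_a": [0.9, 0.4, 0.6, 0.1],
    "model_b": [0.6, 0.1, 0.9, 0.4],
    "model_c": [0.5, 0.5, 0.5, 0.5],
}
ensemble = select_ensemble(valid_predictions, y_valid)
```

In this toy data the two informative models make complementary errors, so averaging them beats either one alone, while the uninformative model is never selected.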

Installation

From the PyPI repository:

pip install mljar-supervised

From source code:

git clone https://github.com/mljar/mljar-supervised.git
cd mljar-supervised
python setup.py install

Python 3.6 is required.

Usage

This is an Automated Machine Learning package, so all the hard tasks are done for you. The interface is simple, but it lets you control the training process when necessary.

Train and predict

automl = AutoML()
automl.fit(X, y)
predictions = automl.predict(X)

By default, training should finish in less than 1 hour, and the following ML algorithms will be checked:

  • Random Forest
  • Xgboost
  • CatBoost
  • LightGBM
  • Neural Network
  • Ensemble

The parameters that you can use to control the training process are:

  • total_time_limit - the total time, in seconds, that AutoML can spend searching for the best ML model. Default is 3600 seconds.
  • learner_time_limit - the time limit, in seconds, for training a single model. With k-fold cross-validation, the time spent on training is k*learner_time_limit. This parameter is considered only when total_time_limit is set to None. Default is 120 seconds.
  • algorithms - the list of algorithms that will be checked. Default is ["CatBoost", "Xgboost", "RF", "LightGBM", "NN"].
  • start_random_models - the number of models to check with the not-so-random algorithm. Default is 10.
  • hill_climbing_steps - the number of hill climbing steps used in model tuning. Default is 3.
  • top_models_to_improve - the number of models considered for improvement in each hill climbing step. Default is 5.
  • train_ensemble - whether an ensemble model is trained at the end of the AutoML fit procedure. Default is True.
  • verbose - controls printouts. Default is True.
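
A shorter, quieter search could be configured along these lines. This is a sketch that assumes the parameters listed above are passed as keyword arguments to the AutoML constructor; the values are illustrative, and the helper `build_automl` is hypothetical.

```python
# Illustrative values; the parameter names follow the list above.
automl_params = {
    "total_time_limit": 10 * 60,      # seconds for the whole search
    "algorithms": ["Xgboost", "RF"],  # subset of the default list
    "start_random_models": 5,
    "hill_climbing_steps": 2,
    "top_models_to_improve": 3,
    "train_ensemble": True,
    "verbose": False,
}


def build_automl(params):
    # Imported lazily so the configuration can be inspected
    # without the package installed.
    from supervised.automl import AutoML

    # Assumption: the constructor accepts these names as keyword arguments.
    return AutoML(**params)
```

With the package installed, `automl = build_automl(automl_params)` followed by `automl.fit(X, y)` would then run the search under these limits.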

Development

Installation

git clone https://github.com/mljar/mljar-supervised.git
cd mljar-supervised
virtualenv venv --python=python3.6
source venv/bin/activate
pip install -r requirements.txt

Testing

cd supervised
python -m tests.run_all

Newsletter

Don't miss updates and news from us. Subscribe to the newsletter!

Roadmap

The package is under active development! Please expect a lot of changes! A graphical interface for this package will be provided soon (also open source!). Stay tuned.


This version

0.1.1
