Automated Machine Learning for Supervised tasks
mljar-supervised
The new standard in Machine Learning!
Thanks to Automated Machine Learning, you don't need to worry about different machine learning interfaces. You don't need to know all the algorithms and their hyper-parameters. With AutoML, model tuning and training are painless.
The current version supports only binary classification, with optimization of the LogLoss metric.
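For reference, LogLoss for binary classification is the mean negative log-likelihood of the true labels under the predicted probabilities. The sketch below computes it in plain Python. The function name `log_loss` and the clipping constant `eps` are illustrative assumptions and not part of the mljar-supervised API.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Binary LogLoss: mean of -[y*log(p) + (1-y)*log(1-p)].

    y_true: list of 0/1 labels, y_prob: predicted probabilities of class 1.
    """
    if not y_true or len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip probabilities so that log(0) is never evaluated.
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)
```

Lower values are better: confident and correct predictions give a LogLoss close to 0, while a constant prediction of 0.5 gives log(2).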
Quick example
import pandas as pd
from supervised.automl import AutoML
df = pd.read_csv("https://raw.githubusercontent.com/pplonski/datasets-for-start/master/adult/data.csv", skipinitialspace=True)
X = df[df.columns[:-1]]
y = df["income"]
automl = AutoML()
automl.fit(X, y)
predictions = automl.predict(X)
The tuning algorithm
The tuning algorithm was created and developed by Piotr Płoński. It is a heuristic algorithm that combines:
- a not-so-random approach
- hill climbing
The approach is not-so-random because each algorithm has a defined set of hyper-parameters that usually work. In the first step, an initial set of models is drawn from these not-so-random parameters. Then the hill climbing approach is used to pick the best-performing algorithms and tune them.
Early stopping is applied to each algorithm used in AutoML.
The ensemble algorithm was implemented based on the Caruana paper.
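The two steps can be sketched as follows. This is a toy illustration and not the package's actual implementation: the search space, the helper names (`draw_random_params`, `neighbor`, `tune`) and the `toy_score` function are all hypothetical. Only the argument names `start_random_models`, `hill_climbing_steps` and `top_models_to_improve` mirror the AutoML parameters described in this document. Scores are treated like LogLoss, so lower is better.

```python
import random

# Hypothetical search space: a few values per hyper-parameter that "usually work".
SEARCH_SPACE = {
    "max_depth": [3, 4, 5, 6, 7],
    "learning_rate": [0.01, 0.05, 0.1, 0.2],
}

def draw_random_params(rng):
    """Not-so-random step: draw each hyper-parameter from its predefined values."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}

def neighbor(params, rng):
    """Hill-climbing move: shift one hyper-parameter to an adjacent value."""
    name = rng.choice(sorted(SEARCH_SPACE))
    values = SEARCH_SPACE[name]
    i = values.index(params[name])
    j = rng.choice([k for k in (i - 1, i + 1) if 0 <= k < len(values)])
    return {**params, name: values[j]}

def tune(score, start_random_models=10, hill_climbing_steps=3,
         top_models_to_improve=5, seed=0):
    """Return (best_score, best_params); `score` maps params to a loss."""
    rng = random.Random(seed)
    # Step 1: initial models drawn from the not-so-random parameters.
    models = []
    for _ in range(start_random_models):
        params = draw_random_params(rng)
        models.append((score(params), params))
    # Step 2: repeatedly perturb the current best models.
    for _ in range(hill_climbing_steps):
        models.sort(key=lambda m: m[0])
        for _, params in models[:top_models_to_improve]:
            candidate = neighbor(params, rng)
            models.append((score(candidate), candidate))
    return min(models, key=lambda m: m[0])

def toy_score(params):
    """Stand-in for a validation loss; a real run would train a model here."""
    return abs(params["max_depth"] - 5) + abs(params["learning_rate"] - 0.1)

best_score, best_params = tune(toy_score, seed=1)
```

In a real setting `score` would train a model with the given hyper-parameters and return its validation LogLoss, which is where most of the time budget is spent.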
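As an illustration, assuming the paper referred to is the ensemble selection work by Caruana et al., the core idea is greedy forward selection with replacement: starting from an empty ensemble, repeatedly add the model whose inclusion most improves the averaged prediction on validation data, and stop when nothing improves. The sketch below is a simplified version of that idea, not the package's code. The names `ensemble_selection`, `model_preds` and `max_steps` are hypothetical.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Binary LogLoss with probability clipping."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)

def ensemble_selection(y_true, model_preds, max_steps=10):
    """Greedy forward selection with replacement.

    model_preds maps a model name to its validation probabilities.
    Returns (list of selected model names, LogLoss of their average).
    """
    selected = []
    ensemble_sum = [0.0] * len(y_true)
    best_loss = float("inf")
    for _ in range(max_steps):
        step_best = None
        size = len(selected) + 1
        for name, preds in model_preds.items():
            # Average prediction if this model were added to the ensemble.
            candidate = [(s + p) / size for s, p in zip(ensemble_sum, preds)]
            loss = log_loss(y_true, candidate)
            if step_best is None or loss < step_best[0]:
                step_best = (loss, name)
        # Stop when no model improves the ensemble.
        if step_best is None or step_best[0] >= best_loss:
            break
        best_loss, name = step_best
        selected.append(name)
        ensemble_sum = [s + p for s, p in zip(ensemble_sum, model_preds[name])]
    return selected, best_loss

# Toy validation data: two complementary models and one uninformative model.
y_valid = [1, 0, 1, 0]
preds = {
    "a": [0.9, 0.4, 0.6, 0.1],
    "b": [0.6, 0.1, 0.9, 0.4],
    "c": [0.5, 0.5, 0.5, 0.5],
}
selected, loss = ensemble_selection(y_valid, preds)
```

Because a model can be selected more than once, the number of times it appears in `selected` acts as its weight in the averaged prediction.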
Installation
From the PyPI repository:
pip install mljar-supervised
From source code:
git clone https://github.com/mljar/mljar-supervised.git
cd mljar-supervised
python setup.py install
Python 3.6 is required.
Usage
This is an Automated Machine Learning package, so all the hard tasks are done for you. The interface is simple, but if necessary it gives you the ability to control the training process.
Train and predict
automl = AutoML()
automl.fit(X, y)
predictions = automl.predict(X)
By default, the training should finish in less than 1 hour, and the following ML algorithms will be checked:
- Random Forest
- Xgboost
- CatBoost
- LightGBM
- Neural Network
- Ensemble
The parameters that you can use to control the training process are:
- total_time_limit - the total time limit, in seconds, that AutoML can spend searching for the best ML model. Default is set to 3600 seconds.
- learner_time_limit - the time limit for training a single model. In case of k-fold cross validation, the time spent on training is k*learner_time_limit. This parameter is only considered when total_time_limit is set to None. Default is set to 120 seconds.
- algorithms - the list of algorithms that will be checked. Default is set to ["CatBoost", "Xgboost", "RF", "LightGBM", "NN"].
- start_random_models - the number of models to check with the not-so-random approach. Default is set to 10.
- hill_climbing_steps - the number of hill climbing steps used in model tuning. Default is set to 3.
- top_models_to_improve - the number of models considered for improvement in each hill climbing step. Default is set to 5.
- train_ensemble - decides whether an ensemble model is trained at the end of the AutoML fit procedure. Default is set to True.
- verbose - controls printouts. Default is set to True.
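The defaults listed above can be collected in one place, as in the sketch below. It assumes that these parameters are passed as keyword arguments to the AutoML constructor, which this document does not state explicitly. The helper `single_model_time_budget` is hypothetical and only illustrates the k*learner_time_limit rule.

```python
# Defaults as listed in this document.
automl_params = {
    "total_time_limit": 3600,   # seconds for the whole search
    "learner_time_limit": 120,  # seconds per model, used only if total_time_limit is None
    "algorithms": ["CatBoost", "Xgboost", "RF", "LightGBM", "NN"],
    "start_random_models": 10,
    "hill_climbing_steps": 3,
    "top_models_to_improve": 5,
    "train_ensemble": True,
    "verbose": True,
}

def single_model_time_budget(params, k_folds):
    """Time in seconds spent on one model with k-fold cross validation.

    Returns None when total_time_limit is set, because learner_time_limit
    is only considered when total_time_limit is None.
    """
    if params["total_time_limit"] is not None:
        return None
    return k_folds * params["learner_time_limit"]

# Assumed usage (requires the package to be installed):
# from supervised.automl import AutoML
# automl = AutoML(**automl_params)
```

For example, with total_time_limit set to None, the default learner_time_limit and 5-fold cross validation, a single model can take up to 600 seconds.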
Development
Installation
git clone https://github.com/mljar/mljar-supervised.git
cd mljar-supervised
virtualenv venv --python=python3.6
source venv/bin/activate
pip install -r requirements.txt
Testing
cd supervised
python -m tests.run_all
Newsletter
Don't miss updates and news from us. Subscribe to the newsletter!
Roadmap
The package is under active development! Please expect a lot of changes! A graphical interface for this package will be provided soon (also open source!). Please stay tuned.
To be added:
- train a single decision tree
- create a text report from trained models (maybe with plots from learning)
- compute a threshold for model predictions and predict discrete output (label)
- add model/prediction explanations
- add support for multiclass classification
- add support for regression