Automated Machine Learning for Supervised tasks
mljar-supervised
Automated Machine Learning
mljar-supervised is an Automated Machine Learning Python package. It can train ML models for:
- binary classification,
- multi-class classification,
- regression.
What's good in it?
mljar-supervised creates Markdown reports from AutoML training. The reports include an AutoML leaderboard summary and per-model summaries, for example for Decision Tree and LightGBM.
- This package computes a Baseline for your data, so you will know whether you need Machine Learning at all, and how good your ML models are compared to the Baseline. The Baseline is computed from the prior class distribution for classification and from the simple mean for regression.
- This package trains simple Decision Trees with max_depth <= 5, so you can easily visualize them with the amazing dtreeviz to better understand your data.
- mljar-supervised uses simple linear regression and includes its coefficients in the summary report, so you can check which features are used the most in the linear model.
- It uses a vast set of algorithms: Random Forest, Extra Trees, LightGBM, Xgboost, CatBoost (Neural Networks will be added soon).
- It can do feature preprocessing, such as missing values imputation and converting categoricals. It can also handle target values preprocessing (You won't believe how often it is needed!), for example converting a categorical target into a numeric one.
- It can tune hyper-parameters with a not-so-random-search algorithm (random search over a defined set of values) and hill climbing to fine-tune final models.
- It can compute an Ensemble based on the greedy algorithm from the Caruana paper.
- It cares about explainability of models: for every algorithm, the feature importance is computed based on permutation. Additionally, for every algorithm the SHAP explanations are computed: feature importance, dependence plots, and decision plots (explanations can be switched off with the explain_level parameter).
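The Baseline described in the list above can be illustrated with a few lines of plain Python. This is a minimal sketch under stated assumptions, not the package's own code: the function names classification_baseline, regression_baseline and mean_squared_error are hypothetical, and the tiny label and target lists are made up for illustration.

```python
from collections import Counter


def classification_baseline(y_train):
    """Prior class distribution: class label -> relative frequency in y_train."""
    counts = Counter(y_train)
    n = len(y_train)
    return {label: count / n for label, count in counts.items()}


def regression_baseline(y_train):
    """Simple mean of the training targets, used as a constant prediction."""
    return sum(y_train) / len(y_train)


def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data, for illustration only.
labels = ["<=50K", "<=50K", "<=50K", ">50K"]
targets = [10.0, 20.0, 30.0]

# Every sample gets the same class probabilities.
priors = classification_baseline(labels)

# Every sample gets the same predicted value.
constant = regression_baseline(targets)
baseline_mse = mean_squared_error(targets, [constant] * len(targets))

print(priors, constant, baseline_mse)
```

A trained model is only worth keeping if its validation score is clearly better than the score of such a constant predictor.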
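The greedy Ensemble mentioned in the list above follows the general idea of ensemble selection: start with an empty ensemble and repeatedly add, with replacement, the model whose predictions most reduce the validation loss of the averaged ensemble. The sketch below shows that idea in plain Python. It is an illustration under assumptions, not necessarily how mljar-supervised implements it: greedy_ensemble, max_rounds and the toy predictions are hypothetical, and mean squared error stands in for whatever metric is being optimized.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def greedy_ensemble(model_preds, y_valid, loss=mean_squared_error, max_rounds=20):
    """Greedy forward selection with replacement.

    model_preds maps a model name to its list of validation predictions.
    Returns the list of selected names (a repeated name acts as a higher
    weight) and the validation loss of the averaged ensemble.
    """
    selected = []
    ensemble_sum = [0.0] * len(y_valid)  # running sum of selected predictions
    best_loss = float("inf")
    for _ in range(max_rounds):
        round_name, round_loss = None, best_loss
        size = len(selected) + 1
        for name, preds in model_preds.items():
            # Average of the current ensemble plus this candidate.
            candidate = [(s + p) / size for s, p in zip(ensemble_sum, preds)]
            candidate_loss = loss(y_valid, candidate)
            if candidate_loss < round_loss:
                round_name, round_loss = name, candidate_loss
        if round_name is None:
            break  # no candidate improves the ensemble any further
        selected.append(round_name)
        chosen = model_preds[round_name]
        ensemble_sum = [s + p for s, p in zip(ensemble_sum, chosen)]
        best_loss = round_loss
    return selected, best_loss


# Hypothetical validation predictions from three models.
y_valid = [1.0, 2.0, 3.0]
model_preds = {
    "a": [0.0, 1.0, 2.0],
    "b": [2.0, 3.0, 4.0],
    "c": [1.5, 2.5, 3.5],
}
selected, ensemble_loss = greedy_ensemble(model_preds, y_valid)
print(selected, ensemble_loss)
```

Because models are added with replacement, a strong model can be picked several times, which effectively gives it a larger weight in the average.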
Quick example
There is a simple interface available with fit and predict methods.
import pandas as pd
from supervised.automl import AutoML
df = pd.read_csv("https://raw.githubusercontent.com/pplonski/datasets-for-start/master/adult/data.csv", skipinitialspace=True)
X = df[df.columns[:-1]]
y = df["income"]
automl = AutoML(results_path="directory_with_reports")
automl.fit(X, y)
predictions = automl.predict(X)
For details please check AutoML API Docs.
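As a variation on the quick example, the explain_level parameter mentioned earlier can be passed to the AutoML constructor. The sketch below assumes that explain_level=0 is the value that switches explanations off; the helper name train_without_explanations and the default results_path value are hypothetical and chosen only for illustration.

```python
def train_without_explanations(X, y, results_path="reports_without_explanations"):
    """Fit AutoML with explanations switched off and return the fitted object."""
    # Imported inside the function so this sketch can be loaded even when
    # mljar-supervised is not installed.
    from supervised.automl import AutoML

    # Assumption: explain_level=0 disables feature importance and SHAP output.
    automl = AutoML(results_path=results_path, explain_level=0)
    automl.fit(X, y)
    return automl
```

With X and y prepared as in the quick example, the call would be train_without_explanations(X, y), which should shorten training when the explanation reports are not needed.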
Examples
- Income classification - it is a binary classification task on census data
- Iris classification - it is a multi-class classification task on Iris flowers data
- House price regression - it is a regression task on Boston housing data
Installation
From the PyPI repository:
pip install mljar-supervised
From source code:
git clone https://github.com/mljar/mljar-supervised.git
cd mljar-supervised
python setup.py install
Installation for development
git clone https://github.com/mljar/mljar-supervised.git
cd mljar-supervised
virtualenv venv --python=python3.6
source venv/bin/activate
pip install -r requirements.txt
pip install -r requirements_dev.txt
MLJAR
mljar-supervised is an open-source project created by MLJAR. We care about ease of use in Machine Learning.
mljar.com provides a beautiful and simple user interface for building machine learning models.
Source distribution: mljar-supervised-0.4.0.tar.gz (52.0 kB)