
Automated Machine Learning for Supervised tasks


mljar-supervised


Automated Machine Learning

mljar-supervised is an Automated Machine Learning Python package. It can train ML models for:

  • binary classification,
  • multi-class classification,
  • regression.

What's good in it?

  • mljar-supervised creates Markdown reports from AutoML training. The reports include an AutoML leaderboard summary and a summary for each trained model, for example a Decision Tree summary and a LightGBM summary.

  • The package computes a Baseline for your data, so you will know whether you need Machine Learning at all and how good your ML models are compared to the Baseline. The Baseline is computed from the prior class distribution for classification and from the simple mean for regression.
  • The package trains simple Decision Trees with max_depth <= 5, so you can easily visualize them with the amazing dtreeviz to better understand your data.
  • mljar-supervised fits a simple linear regression and includes its coefficients in the summary report, so you can check which features are used the most in the linear model.
  • It uses a wide set of algorithms: Random Forest, Extra Trees, LightGBM, Xgboost, and CatBoost (Neural Networks will be added soon).
  • It can preprocess features, for example by imputing missing values and converting categoricals. It can also preprocess target values, for example by converting a categorical target into a numeric one (you won't believe how often this is needed!).
  • It can tune hyper-parameters with a not-so-random-search algorithm (random search over a defined set of values) and use hill climbing to fine-tune the final models.
  • It can compute an Ensemble based on the greedy algorithm from the Caruana paper.
  • It cares about the explainability of models: for every algorithm, feature importance is computed based on permutation. Additionally, for every algorithm SHAP explanations are computed: feature importance, dependence plots, and decision plots (explanations can be switched off with the explain_level parameter).
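
The greedy ensemble idea from the list above can be sketched in a few lines. This is a minimal illustration and not the package's own code: the function names (`mse`, `average`, `greedy_ensemble`), the `max_rounds` parameter, the choice of mean squared error as the metric, and the rule of stopping when no candidate improves the score are all assumptions made for this example.

```python
def mse(y_true, y_pred):
    """Mean squared error between two equally long lists of numbers."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def average(pred_lists):
    """Element-wise mean of several prediction lists."""
    n = len(pred_lists)
    return [sum(column) / n for column in zip(*pred_lists)]


def greedy_ensemble(model_preds, y_true, max_rounds=10):
    """Forward selection with replacement (hypothetical helper).

    model_preds maps a model name to its validation predictions.
    Each round adds the model whose inclusion lowers the ensemble
    error the most; selection stops when nothing improves the score.
    """
    selected = []
    best_score = float("inf")
    for _ in range(max_rounds):
        round_name, round_score = None, best_score
        for name, preds in model_preds.items():
            candidate = [model_preds[s] for s in selected] + [preds]
            score = mse(y_true, average(candidate))
            if score < round_score:
                round_name, round_score = name, score
        if round_name is None:
            break  # no model improves the current ensemble
        selected.append(round_name)
        best_score = round_score
    return selected, best_score


# Toy validation data: "a" overshoots by 1, "b" undershoots by 1.
y_valid = [1.0, 2.0, 3.0]
preds = {
    "a": [2.0, 3.0, 4.0],
    "b": [0.0, 1.0, 2.0],
    "c": [1.0, 2.0, 5.0],
}
members, score = greedy_ensemble(preds, y_valid)
print(members, score)
```

Because a model may be selected more than once, the number of times it appears in `members` acts as its weight in the averaged ensemble.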

Quick example

There is a simple interface available with fit and predict methods.

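import pandas as pd
from supervised.automl import AutoML

# Load the example dataset; skipinitialspace strips blanks that follow delimiters
df = pd.read_csv("https://raw.githubusercontent.com/pplonski/datasets-for-start/master/adult/data.csv", skipinitialspace=True)

# Use every column except the last one as features and "income" as the target
X = df[df.columns[:-1]]
y = df["income"]

# Train AutoML; results_path names the directory used for the results
automl = AutoML(results_path="directory_with_reports")
automl.fit(X, y)

# Predict on the training data (for demonstration only)
predictions = automl.predict(X)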

For details, please check the AutoML API Docs.
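
To put model scores in context, the Baseline described in the feature list (prior class distribution for classification, simple mean for regression) can be reproduced by hand. The sketch below is an assumption-laden illustration, not the package's implementation: the helper names (`class_priors`, `prior_log_loss`, `mean_baseline`, `constant_mse`) and the choice of log loss and mean squared error as metrics are hypothetical.

```python
import math
from collections import Counter


def class_priors(y_train):
    """Estimate {label: probability} from the training labels."""
    counts = Counter(y_train)
    total = len(y_train)
    return {label: n / total for label, n in counts.items()}


def prior_log_loss(y_true, priors, eps=1e-15):
    """Log loss when every row is predicted with the same class priors."""
    losses = []
    for label in y_true:
        # Clip so that unseen labels do not produce log(0).
        p = min(max(priors.get(label, 0.0), eps), 1 - eps)
        losses.append(-math.log(p))
    return sum(losses) / len(losses)


def mean_baseline(y_train):
    """Regression baseline: always predict the training mean."""
    return sum(y_train) / len(y_train)


def constant_mse(y_true, prediction):
    """Mean squared error of a single constant prediction."""
    return sum((v - prediction) ** 2 for v in y_true) / len(y_true)


# Classification: two balanced classes give priors of 0.5 each.
priors = class_priors(["a", "a", "b", "b"])
print(priors, prior_log_loss(["a", "b"], priors))

# Regression: the baseline prediction is the mean of the targets.
center = mean_baseline([1.0, 2.0, 3.0])
print(center, constant_mse([1.0, 2.0, 3.0], center))
```

A trained model is only worth keeping if its score on held-out data is clearly better than the corresponding baseline score.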

Examples

Installation

From the PyPI repository:

pip install mljar-supervised

From source code:

git clone https://github.com/mljar/mljar-supervised.git
cd mljar-supervised
python setup.py install
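
After either installation route, a quick way to confirm that the package is importable is to look up its top-level module, which is `supervised` according to the import in the quick example. The helper name `is_installed` below is hypothetical and only uses the Python standard library.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


# "supervised" is the module imported in the quick example.
print("mljar-supervised importable:", is_installed("supervised"))
```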

Installation for development

git clone https://github.com/mljar/mljar-supervised.git
cd mljar-supervised
virtualenv venv --python=python3.6
source venv/bin/activate
pip install -r requirements.txt
pip install -r requirements_dev.txt

MLJAR

mljar-supervised is an open-source project created by MLJAR. We care about ease of use in Machine Learning. mljar.com provides a beautiful and simple user interface for building machine learning models.
