Fast and customizable framework for automatic ML model creation (AutoML)

Project description

LightAutoML - automatic model creation framework

LightAutoML (LAMA) is an AutoML framework which provides automatic model creation for the following tasks:

  • binary classification
  • multiclass classification
  • regression

The current version of the package handles datasets with independent samples in each row, i.e. each row is an object with its own features and target. Multitable datasets and sequences are a work in progress :)

Note: we use the AutoWoE library to automatically create interpretable models.

Authors: Alexander Ryzhkov, Anton Vakhrushev, Dmitry Simakov, Vasilii Bunakov, Rinchin Damdinov, Alexander Kirilin, Pavel Shvets.

The LightAutoML documentation is available here; you can also generate it yourself.

(New features) GPU and Spark pipelines

Full GPU and Spark pipelines for LightAutoML are currently available for developer testing (still in progress). The code and tutorials for:

Installation

To install the LAMA framework on your machine from PyPI, run the following commands:

# Install base functionality:

pip install -U lightautoml

# For a partial installation, use the corresponding extra.
# Extra dependencies: [nlp, cv, report]
# Or use 'all' to install everything.
# The quotes keep shells such as zsh from expanding the brackets.

pip install -U "lightautoml[nlp]"

Additionally, run the following commands to enable PDF report generation:

# MacOS
brew install cairo pango gdk-pixbuf libffi

# Debian / Ubuntu
sudo apt-get install build-essential libcairo2 libpango-1.0-0 libpangocairo-1.0-0 libgdk-pixbuf2.0-0 libffi-dev shared-mime-info

# Fedora
sudo yum install redhat-rpm-config libffi-devel cairo pango gdk-pixbuf2

# Windows
# follow this tutorial https://weasyprint.readthedocs.io/en/stable/install.html#windows
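
After installing, it can help to confirm which pieces are actually importable. The snippet below is a minimal sketch using only the standard library; the module names it checks (`lightautoml` and `weasyprint`) are assumptions for illustration, the latter inferred from the WeasyPrint tutorial linked above, and are not an official dependency list.

```python
from importlib.util import find_spec


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return find_spec(module_name) is not None


# Module names below are assumptions for illustration, not an official list.
for name in ("lightautoml", "weasyprint"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If `weasyprint` is reported as missing, PDF report generation will likely not work until the system packages above and the `report` extra are installed.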

Quick tour

Let's solve the popular Kaggle Titanic competition below. There are two main ways to solve machine learning problems using LightAutoML:

  • Use a ready-made preset for tabular data:
import pandas as pd
from sklearn.metrics import f1_score

from lightautoml.automl.presets.tabular_presets import TabularAutoML
from lightautoml.tasks import Task

df_train = pd.read_csv('../input/titanic/train.csv')
df_test = pd.read_csv('../input/titanic/test.csv')

automl = TabularAutoML(
    task = Task(
        name = 'binary',
        metric = lambda y_true, y_pred: f1_score(y_true, (y_pred > 0.5)*1))
)
oof_pred = automl.fit_predict(
    df_train,
    roles = {'target': 'Survived', 'drop': ['PassengerId']}
)
test_pred = automl.predict(df_test)

pd.DataFrame({
    'PassengerId':df_test.PassengerId,
    'Survived': (test_pred.data[:, 0] > 0.5)*1
}).to_csv('submit.csv', index = False)
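
The `metric` lambda in the preset example thresholds the predicted probabilities at 0.5 before passing them to `f1_score`, because F1 is defined on hard labels rather than probabilities. As a dependency-free sketch of that same calculation (the function name `f1_at_threshold` and the toy data are made up for illustration and are not part of LightAutoML):

```python
def f1_at_threshold(y_true, y_prob, threshold=0.5):
    """F1 score for binary labels after thresholding predicted probabilities."""
    y_pred = [1 if p > threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data, made up for illustration.
labels = [1, 0, 1, 1]
probs = [0.9, 0.6, 0.4, 0.8]
score = f1_at_threshold(labels, probs)
print(round(score, 3))
```

The same 0.5 threshold is applied again when the submission file is written, so the metric and the submitted labels stay consistent.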

The LightAutoML framework has many ready-to-use parts and extensive customization options. To learn more, check out the resources section.

Resources

Kaggle kernel examples of LightAutoML usage:

Google Colab tutorials and other examples:

Note 1: in production there is no need to use the profiler (which increases run time and memory consumption), so please do not turn it on. It is off by default.

Note 2: to take a look at this report after the run, please comment out the last line of the demo, which contains the report deletion command.

Courses, videos and papers

Contributing to LightAutoML

If you are interested in contributing to LightAutoML, please read the Contributing Guide to get started.

License

This project is licensed under the Apache License, Version 2.0. See LICENSE file for more details.

For developers

Build your own custom pipeline:

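import pandas as pd

# imports the example needs (module paths may differ between versions)
from lightautoml.automl.base import AutoML
from lightautoml.ml_algo.boost_lgbm import BoostLGBM
from lightautoml.ml_algo.tuning.optuna import OptunaTuner
from lightautoml.pipelines.features.lgb_pipeline import LGBSimpleFeatures
from lightautoml.pipelines.ml.base import MLPipeline
from lightautoml.pipelines.selection.importance_based import (
    ImportanceCutoffSelector,
    ModelBasedImportanceEstimator,
)
from lightautoml.reader.base import PandasToPandasReader
from lightautoml.tasks import Task

# illustrative settings, not prescribed values; adjust for your machine and data
N_THREADS = 4
N_FOLDS = 5
RANDOM_STATE = 42

df_train = pd.read_csv('../input/titanic/train.csv')
df_test = pd.read_csv('../input/titanic/test.csv')

# define that machine learning problem is binary classification
task = Task("binary")

# the reader parses the DataFrame and sets up the cross-validation folds
reader = PandasToPandasReader(task, cv=N_FOLDS, random_state=RANDOM_STATE)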

# create a feature selector
model0 = BoostLGBM(
    default_params={'learning_rate': 0.05, 'num_leaves': 64,
    'seed': 42, 'num_threads': N_THREADS}
)
pipe0 = LGBSimpleFeatures()
mbie = ModelBasedImportanceEstimator()
selector = ImportanceCutoffSelector(pipe0, model0, mbie, cutoff=0)

# build first level pipeline for AutoML
pipe = LGBSimpleFeatures()
# stop after 20 iterations or after 30 seconds
params_tuner1 = OptunaTuner(n_trials=20, timeout=30)
model1 = BoostLGBM(
    default_params={'learning_rate': 0.05, 'num_leaves': 128,
    'seed': 1, 'num_threads': N_THREADS}
)
model2 = BoostLGBM(
    default_params={'learning_rate': 0.025, 'num_leaves': 64,
    'seed': 2, 'num_threads': N_THREADS}
)
pipeline_lvl1 = MLPipeline([
    (model1, params_tuner1),
    model2
], pre_selection=selector, features_pipeline=pipe, post_selection=None)

# build second level pipeline for AutoML
pipe1 = LGBSimpleFeatures()
model = BoostLGBM(
    default_params={'learning_rate': 0.05, 'num_leaves': 64,
    'max_bin': 1024, 'seed': 3, 'num_threads': N_THREADS},
    freeze_defaults=True
)
pipeline_lvl2 = MLPipeline([model], pre_selection=None, features_pipeline=pipe1,
 post_selection=None)

# build AutoML pipeline
automl = AutoML(reader, [
    [pipeline_lvl1],
    [pipeline_lvl2],
], skip_conn=False)

# train AutoML and get predictions
oof_pred = automl.fit_predict(df_train, roles = {'target': 'Survived', 'drop': ['PassengerId']})
test_pred = automl.predict(df_test)

pd.DataFrame({
    'PassengerId':df_test.PassengerId,
    'Survived': (test_pred.data[:, 0] > 0.5)*1
}).to_csv('submit.csv', index = False)
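
In the pipeline above, `fit_predict` returns out-of-fold predictions: each training row is predicted by a model that did not see that row during fitting, which is what lets a second-level pipeline train on first-level outputs without leaking the target. The sketch below illustrates the idea with a deliberately trivial "model" (the mean target of the training folds). The helper names and toy data are hypothetical, and this is not how LightAutoML implements it internally.

```python
def kfold_indices(n_samples, n_folds):
    """Split range(n_samples) into n_folds contiguous validation folds."""
    base, extra = divmod(n_samples, n_folds)
    folds, start = [], 0
    for i in range(n_folds):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def oof_mean_predictions(targets, n_folds=3):
    """Predict each sample with the mean target of the other folds."""
    oof = [None] * len(targets)
    for valid_idx in kfold_indices(len(targets), n_folds):
        valid = set(valid_idx)
        # "Fit" on every row outside the current validation fold.
        train = [t for i, t in enumerate(targets) if i not in valid]
        fold_mean = sum(train) / len(train)
        # "Predict" only the held-out rows.
        for i in valid_idx:
            oof[i] = fold_mean
    return oof


# Toy targets, made up for illustration.
targets = [1, 0, 1, 1, 0, 0]
oof_pred = oof_mean_predictions(targets, n_folds=3)
print(oof_pred)
```

Every row receives exactly one prediction, and that prediction never depends on the row's own target. A real first-level model replaces the mean with a fitted learner, but the fold bookkeeping is the same.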

Support and feature requests

For quick advice, ask in the Telegram group.

Open bug reports and feature requests on GitHub issues.

Download files

Source Distribution

lightautoml-0.3.8b1.tar.gz (285.9 kB view details)

Uploaded Source

Built Distribution

lightautoml-0.3.8b1-py3-none-any.whl (382.3 kB view details)

Uploaded Python 3

File details

Details for the file lightautoml-0.3.8b1.tar.gz.

File metadata

  • Download URL: lightautoml-0.3.8b1.tar.gz
  • Upload date:
  • Size: 285.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.3.0 CPython/3.8.5 Linux/4.15.0-143-generic

File hashes

Hashes for lightautoml-0.3.8b1.tar.gz
Algorithm Hash digest
SHA256 269a69d5fab51940d4d5682e887306ad3527bd4a771f978e3eee728d0482129a
MD5 cccd72817d52efe5ca6c6fd16bc7cf42
BLAKE2b-256 f67e4e64d07044cc679f74cf3d24be11a6917c36aa83deaefe11860fbd93abf9

File details

Details for the file lightautoml-0.3.8b1-py3-none-any.whl.

File metadata

  • Download URL: lightautoml-0.3.8b1-py3-none-any.whl
  • Upload date:
  • Size: 382.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.3.0 CPython/3.8.5 Linux/4.15.0-143-generic

File hashes

Hashes for lightautoml-0.3.8b1-py3-none-any.whl
Algorithm Hash digest
SHA256 80f233094b5d3a8ff092b33645d852b5f4b4c70b112670ef8b6e8723d7bc3d40
MD5 af262f61e19fcb7e9e38e65bf5969efe
BLAKE2b-256 ff4de4e181da6cfa5326ca564b68f4b3878062f694aa4027099d002ba152139e
