Fast and customizable framework for automatic ML model creation (AutoML)

These details have not been verified by PyPI

Project links

Development Status
- 3 - Alpha
Environment
- Console
Intended Audience
- Science/Research
License
- OSI Approved :: Apache Software License
Natural Language
- English
Operating System
- OS Independent
Programming Language
Topic
- Scientific/Engineering :: Artificial Intelligence
Typing
- Typed

Project description

LightAutoML - automatic model creation framework


LightAutoML (LAMA) is an AutoML framework which provides automatic model creation for the following tasks:

- binary classification
- multiclass classification
- regression

The current version of the package handles datasets in which each row is an independent sample, i.e. each row is an object with its own features and target. Support for multitable datasets and sequences is a work in progress :)
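In the Quick tour below, the `roles` argument of `fit_predict` names the target column and the columns to drop; LightAutoML then automatically infers a role for each remaining column. The sketch below models only that column bookkeeping in plain Python, assuming a simplified roles dict with just `target` and `drop` keys. The helper `split_columns` is hypothetical and is not part of the LightAutoML API.

```python
from typing import Dict, List, Sequence, Tuple, Union

# Hypothetical helper: mimics how a {'target': ..., 'drop': [...]} roles dict
# partitions a table's columns. This is NOT LightAutoML code.
RoleValue = Union[str, Sequence[str]]


def split_columns(
    columns: Sequence[str], roles: Dict[str, RoleValue]
) -> Tuple[str, List[str]]:
    """Return (target_column, feature_columns) for a simplified roles dict."""
    target = roles["target"]
    if target not in columns:
        raise KeyError(f"target column {target!r} not found")
    drop = roles.get("drop", [])
    # Accept a single column name as well as a list of names.
    dropped = {drop} if isinstance(drop, str) else set(drop)
    # Every column that is neither the target nor dropped is a feature candidate.
    features = [c for c in columns if c != target and c not in dropped]
    return target, features


# Column names of the Kaggle Titanic train.csv file.
titanic_columns = [
    "PassengerId", "Survived", "Pclass", "Name", "Sex", "Age",
    "SibSp", "Parch", "Ticket", "Fare", "Cabin", "Embarked",
]
target, features = split_columns(
    titanic_columns, {"target": "Survived", "drop": ["PassengerId"]}
)
print(target)         # Survived
print(len(features))  # 10
```

Each row stays one object: only the columns are partitioned, which is why datasets with independent rows fit this scheme directly.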

Note: we use the AutoWoE library to automatically create interpretable models.

Authors: Alexander Ryzhkov, Anton Vakhrushev, Dmitry Simakov, Vasilii Bunakov, Rinchin Damdinov, Alexander Kirilin, Pavel Shvets.

The LightAutoML documentation is available here; you can also generate it yourself.

(New features) GPU and Spark pipelines

Full GPU and Spark pipelines for LightAutoML are currently available for developer testing (still in progress). Code and tutorials:

- GPU pipeline is available here
- Spark pipeline is available here

Installation of LightAutoML from PyPI
Quick tour
Resources
Contributing to LightAutoML
License
For developers
Support and feature requests

Installation

To install the LAMA framework on your machine from PyPI, run the following commands:

# Install base functionality:

pip install -U lightautoml

# To install optional extras, use the corresponding option.
# Extra dependencies: [nlp, cv, report]
# Or use 'all' to install everything
# (quote the argument so shells such as zsh do not expand the brackets)

pip install -U "lightautoml[nlp]"

Additionally, run the following commands to enable PDF report generation:

# MacOS
brew install cairo pango gdk-pixbuf libffi

# Debian / Ubuntu
sudo apt-get install build-essential libcairo2 libpango-1.0-0 libpangocairo-1.0-0 libgdk-pixbuf2.0-0 libffi-dev shared-mime-info

# Fedora
sudo yum install redhat-rpm-config libffi-devel cairo pango gdk-pixbuf2

# Windows
# follow this tutorial https://weasyprint.readthedocs.io/en/stable/install.html#windows
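After installation, you can check which version of the package is present, for example with `pip show lightautoml` or from Python through the standard library's package metadata. A minimal sketch, assuming Python 3.8+ (where `importlib.metadata` is available); `installed_version` is a hypothetical helper, not part of LightAutoML:

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("lightautoml")
if version is None:
    print("lightautoml is not installed in this environment")
else:
    print(f"lightautoml {version} is installed")
```

Returning None instead of raising keeps the check usable in setup scripts that only want to report the state of the environment.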

Quick tour

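Let's solve the popular Kaggle Titanic competition below. There are two main ways to solve machine learning problems using LightAutoML: use a ready-made preset, or build your own custom pipeline (shown in the For developers section).

Using the ready-made preset for tabular data: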

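import pandas as pd
from sklearn.metrics import f1_score

from lightautoml.automl.presets.tabular_presets import TabularAutoML
from lightautoml.tasks import Task

# Load the Kaggle Titanic data (adjust the paths to your environment)
df_train = pd.read_csv('../input/titanic/train.csv')
df_test = pd.read_csv('../input/titanic/test.csv')

# Binary classification with a custom metric: F1 on labels thresholded at 0.5
automl = TabularAutoML(
    task=Task(
        name='binary',
        metric=lambda y_true, y_pred: f1_score(y_true, (y_pred > 0.5) * 1),
    )
)
# Train and get out-of-fold predictions; 'Survived' is the target and
# 'PassengerId' is an identifier that is excluded from the features
oof_pred = automl.fit_predict(
    df_train,
    roles={'target': 'Survived', 'drop': ['PassengerId']},
)
# Predict survival probabilities for the test set
test_pred = automl.predict(df_test)

# Threshold the probabilities at 0.5 and write a Kaggle submission file
pd.DataFrame({
    'PassengerId': df_test.PassengerId,
    'Survived': (test_pred.data[:, 0] > 0.5) * 1,
}).to_csv('submit.csv', index=False)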
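The custom metric passed to `Task` receives predicted probabilities, so the lambda first turns them into 0/1 labels with a strict `> 0.5` threshold before computing F1; the submission step applies the same threshold to `test_pred.data[:, 0]`. The stdlib-only sketch below spells out that arithmetic without scikit-learn. The names `threshold_labels` and `f1_from_labels` are illustrative, and for binary 0/1 labels the result should agree with `sklearn.metrics.f1_score` at its default positive label of 1.

```python
from typing import List, Sequence


def threshold_labels(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Map probabilities to hard 0/1 labels, like `(y_pred > 0.5) * 1`."""
    return [1 if p > threshold else 0 for p in probs]


def f1_from_labels(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 for the positive class 1: 2*TP / (2*TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # No positives in either vector: define F1 as 0.0 instead of dividing by zero.
    return 2 * tp / denom if denom else 0.0


y_true = [1, 0, 1, 1]
y_prob = [0.9, 0.4, 0.3, 0.6]
labels = threshold_labels(y_prob)
print(labels)                          # [1, 0, 0, 1]
print(f1_from_labels(y_true, labels))  # 0.8
```

Because the comparison is strict, a probability of exactly 0.5 becomes label 0, matching the `(y_pred > 0.5) * 1` expression in the example.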

The LightAutoML framework has many ready-to-use parts and extensive customization options; to learn more, check out the Resources section.

Resources

Kaggle kernel examples of LightAutoML usage:

Google Colab tutorials and other examples:

Tutorial_1_basics.ipynb - get started with LightAutoML on tabular data.
Tutorial_2_WhiteBox_AutoWoE.ipynb - creating interpretable models.
Tutorial_3_sql_data_source.ipynb - shows how to use LightAutoML presets (both standalone and time-utilized variants) to solve ML tasks on tabular data from an SQL database instead of CSV.
Tutorial_4_NLP_Interpretation.ipynb - example of using the TabularNLPAutoML preset and LimeTextExplainer.
Tutorial_5_uplift.ipynb - shows how to use LightAutoML for an uplift modeling task.
Tutorial_6_custom_pipeline.ipynb - shows how to create your own pipeline from specified blocks: pipelines for feature generation and feature selection, ML algorithms, hyperparameter optimization, etc.
Tutorial_7_ICE_and_PDP_interpretation.ipynb - shows how to obtain local and global interpretation of model results using ICE and PDP approaches.
Tutorial_8_CV_preset.ipynb - example of using the TabularCVAutoML preset in a CV multiclass classification task.
Tutorial_9_neural_networks.ipynb - example of using the Tabular preset with neural networks.
Tutorial_10_relational_data_with_star_scheme.ipynb - example of using the Tabular preset on relational data with a star schema.

Note 1: in production you do not need the profiler (it increases run time and memory consumption), so please do not turn it on; it is off by default.

Note 2: to view the report after the run, comment out the last line of the demo, which deletes the report.

Courses, videos and papers

LightAutoML crash courses:
- (Russian) AutoML course for OpenDataScience community
Video guides:
- (Russian) LightAutoML webinar for Sberloga community (Alexander Ryzhkov, Dmitry Simakov)
- (Russian) LightAutoML hands-on tutorial in Kaggle Kernels (Alexander Ryzhkov)
- (English) Automated Machine Learning with LightAutoML: theory and practice (Alexander Ryzhkov)
- (English) LightAutoML framework general overview, benchmarks and advantages for business (Alexander Ryzhkov)
- (English) LightAutoML practical guide - ML pipeline presets overview (Dmitry Simakov)
Papers:
- Anton Vakhrushev, Alexander Ryzhkov, Dmitry Simakov, Rinchin Damdinov, Maxim Savchenko, Alexander Tuzhilin "LightAutoML: AutoML Solution for a Large Financial Services Ecosystem". arXiv:2109.01528, 2021.
Articles about LightAutoML:
- (English) LightAutoML vs Titanic: 80% accuracy in several lines of code (Medium)
- (English) Hands-On Python Guide to LightAutoML – An Automatic ML Model Creation Framework (Analytics India Magazine)

Contributing to LightAutoML

If you are interested in contributing to LightAutoML, please read the Contributing Guide to get started.

License

This project is licensed under the Apache License, Version 2.0. See the LICENSE file for more details.

For developers

Build your own custom pipeline:

import pandas as pd

from lightautoml.automl.base import AutoML
from lightautoml.ml_algo.boost_lgbm import BoostLGBM
from lightautoml.ml_algo.tuning.optuna import OptunaTuner
from lightautoml.pipelines.features.lgb_pipeline import LGBSimpleFeatures
from lightautoml.pipelines.ml.base import MLPipeline
from lightautoml.pipelines.selection.importance_based import (
    ImportanceCutoffSelector,
    ModelBasedImportanceEstimator,
)
from lightautoml.reader.base import PandasToPandasReader
from lightautoml.tasks import Task

# Example settings: adjust them to your hardware and data
N_THREADS = 4       # CPU threads for LightGBM
N_FOLDS = 5         # number of cross-validation folds
RANDOM_STATE = 42   # seed for reproducible fold splits

df_train = pd.read_csv('../input/titanic/train.csv')
df_test = pd.read_csv('../input/titanic/test.csv')

# define that the machine learning problem is binary classification
task = Task('binary')

# reader: prepares the data and splits it into N_FOLDS cross-validation folds
reader = PandasToPandasReader(task, cv=N_FOLDS, random_state=RANDOM_STATE)

# create a feature selector based on LightGBM feature importances
model0 = BoostLGBM(
    default_params={'learning_rate': 0.05, 'num_leaves': 64,
                    'seed': 42, 'num_threads': N_THREADS}
)
pipe0 = LGBSimpleFeatures()
mbie = ModelBasedImportanceEstimator()
selector = ImportanceCutoffSelector(pipe0, model0, mbie, cutoff=0)

# build first level pipeline for AutoML
pipe = LGBSimpleFeatures()
# stop tuning after 20 trials or after 30 seconds, whichever comes first
params_tuner1 = OptunaTuner(n_trials=20, timeout=30)
model1 = BoostLGBM(
    default_params={'learning_rate': 0.05, 'num_leaves': 128,
                    'seed': 1, 'num_threads': N_THREADS}
)
model2 = BoostLGBM(
    default_params={'learning_rate': 0.025, 'num_leaves': 64,
                    'seed': 2, 'num_threads': N_THREADS}
)
# model1 is paired with the Optuna tuner; model2 is added without one
pipeline_lvl1 = MLPipeline([
    (model1, params_tuner1),
    model2
], pre_selection=selector, features_pipeline=pipe, post_selection=None)

# build second level pipeline for AutoML
pipe1 = LGBSimpleFeatures()
model = BoostLGBM(
    default_params={'learning_rate': 0.05, 'num_leaves': 64,
                    'max_bin': 1024, 'seed': 3, 'num_threads': N_THREADS},
    freeze_defaults=True
)
pipeline_lvl2 = MLPipeline([model], pre_selection=None, features_pipeline=pipe1,
                           post_selection=None)

# build AutoML pipeline
# skip_conn=False: the second level receives only first-level predictions,
# not the original input features
automl = AutoML(reader, [
    [pipeline_lvl1],
    [pipeline_lvl2],
], skip_conn=False)

# train AutoML and get out-of-fold predictions for the training data
oof_pred = automl.fit_predict(df_train, roles={'target': 'Survived', 'drop': ['PassengerId']})
test_pred = automl.predict(df_test)

# threshold the test probabilities at 0.5 and write a submission file
pd.DataFrame({
    'PassengerId': df_test.PassengerId,
    'Survived': (test_pred.data[:, 0] > 0.5) * 1
}).to_csv('submit.csv', index=False)
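In this pipeline the reader splits the training data into `cv=N_FOLDS` folds, the first-level models produce out-of-fold (OOF) predictions, and with `skip_conn=False` the second-level model is trained on those predictions rather than on the original features. The toy sketch below illustrates only the out-of-fold idea in plain Python: each row is predicted by a "model" that never saw that row's fold. The round-robin fold assignment and the mean-of-target "model" are illustrative assumptions; they are not how LightAutoML builds folds or fits LightGBM.

```python
from typing import List, Sequence


def kfold_indices(n_rows: int, n_folds: int) -> List[int]:
    """Assign each row to a fold round-robin (row i -> fold i % n_folds)."""
    return [i % n_folds for i in range(n_rows)]


def oof_mean_predictions(y: Sequence[float], n_folds: int) -> List[float]:
    """Predict each row with the mean target of the *other* folds.

    Stand-in for fitting a real model per fold: every prediction comes from
    data that excludes the row's own fold, so it is an out-of-fold estimate.
    """
    if not 2 <= n_folds <= len(y):
        raise ValueError("need 2 <= n_folds <= number of rows")
    folds = kfold_indices(len(y), n_folds)
    oof = [0.0] * len(y)
    for k in range(n_folds):
        train = [y[i] for i in range(len(y)) if folds[i] != k]
        fold_mean = sum(train) / len(train)  # "fit" on the training folds
        for i in range(len(y)):
            if folds[i] == k:
                oof[i] = fold_mean  # "predict" the held-out fold
    return oof


y_demo = [1, 0, 1, 1, 0, 1]
print(oof_mean_predictions(y_demo, n_folds=3))  # [0.5, 1.0, 0.5, 0.5, 1.0, 0.5]
```

Training the next level on OOF predictions, rather than on predictions for rows the first-level model was fitted on, keeps the second level from learning overly optimistic inputs.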

Support and feature requests

Seek prompt advice in the Telegram group.

Open bug reports and feature requests in GitHub issues.


Release history

0.4.3.dev1 pre-release

Jan 15, 2026

0.4.2

Dec 4, 2025

0.4.2.dev1 pre-release

Nov 29, 2025

0.4.1

Mar 5, 2025

0.4.1.dev1 pre-release yanked

Dec 25, 2024

0.4.0 yanked

Dec 22, 2024

0.4.0.dev2 pre-release

Dec 19, 2024

0.4.0.dev1 pre-release

Aug 27, 2024

0.3.8.1

Jan 22, 2024

0.3.8

Jan 12, 2024

This version

0.3.8b1 pre-release

Jul 26, 2023

0.3.7.3

Nov 7, 2022

0.3.7.2

Nov 3, 2022

0.3.7.1

Sep 20, 2022

0.3.7

Sep 8, 2022

0.3.6

Jul 4, 2022

0.3.5

Jun 27, 2022

0.3.4

Apr 20, 2022

0.3.3

Feb 3, 2022

0.3.2

Dec 8, 2021

0.3.1

Oct 29, 2021

0.3.0

Oct 15, 2021

0.2.16 yanked

Jul 6, 2021

0.2.15 yanked

Jun 14, 2021

0.2.14 yanked

May 12, 2021

0.2.13

Apr 29, 2021

0.2.12

Apr 20, 2021

0.2.11

Apr 13, 2021

0.2.10

Feb 17, 2021

0.2.8

Jan 11, 2021

0.2.7

Dec 9, 2020

0.2.6

Dec 8, 2020

0.2.5

Dec 4, 2020

0.2.4

Dec 4, 2020

0.2.3

Dec 4, 2020

0.2.2

Dec 4, 2020

0.2.1

Dec 3, 2020

Download files


Source Distribution

lightautoml-0.3.8b1.tar.gz (285.9 kB)

Uploaded Jul 26, 2023 Source

Built Distribution


lightautoml-0.3.8b1-py3-none-any.whl (382.3 kB)

Uploaded Jul 26, 2023 Python 3

File details

Details for the file lightautoml-0.3.8b1.tar.gz.

File metadata

Download URL: lightautoml-0.3.8b1.tar.gz
Upload date: Jul 26, 2023
Size: 285.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.3.0 CPython/3.8.5 Linux/4.15.0-143-generic

File hashes

Hashes for lightautoml-0.3.8b1.tar.gz
Algorithm	Hash digest
SHA256	`269a69d5fab51940d4d5682e887306ad3527bd4a771f978e3eee728d0482129a`
MD5	`cccd72817d52efe5ca6c6fd16bc7cf42`
BLAKE2b-256	`f67e4e64d07044cc679f74cf3d24be11a6917c36aa83deaefe11860fbd93abf9`

See more details on using hashes here.

File details

Details for the file lightautoml-0.3.8b1-py3-none-any.whl.

File metadata

Download URL: lightautoml-0.3.8b1-py3-none-any.whl
Upload date: Jul 26, 2023
Size: 382.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.3.0 CPython/3.8.5 Linux/4.15.0-143-generic

File hashes

Hashes for lightautoml-0.3.8b1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`80f233094b5d3a8ff092b33645d852b5f4b4c70b112670ef8b6e8723d7bc3d40`
MD5	`af262f61e19fcb7e9e38e65bf5969efe`
BLAKE2b-256	`ff4de4e181da6cfa5326ca564b68f4b3878062f694aa4027099d002ba152139e`

See more details on using hashes here.
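The published digests let you check a downloaded archive before installing it. A minimal sketch using Python's `hashlib`, assuming the wheel has been saved locally under its original file name; the expected value is the SHA256 digest listed above for lightautoml-0.3.8b1-py3-none-any.whl, and `sha256_of` and `verify` are illustrative helper names. pip can also enforce this check itself in hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file).

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    """Compare against the expected digest, ignoring case and surrounding spaces."""
    return sha256_of(path) == expected_sha256.strip().lower()


wheel = Path("lightautoml-0.3.8b1-py3-none-any.whl")
expected = "80f233094b5d3a8ff092b33645d852b5f4b4c70b112670ef8b6e8723d7bc3d40"
if wheel.exists():
    print("OK" if verify(wheel, expected) else "MISMATCH: do not install")
else:
    print(f"{wheel} not found; download it first")
```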
