Fast and customizable framework for automatic ML model creation (AutoML)

Project description

SLAMA: LightAutoML on Spark

SLAMA is a version of the LightAutoML library modified to run in distributed mode on the Apache Spark framework.

It requires:

  1. Python 3.9
  2. PySpark 3.2+ (installed as a dependency)
  3. SynapseML library (downloaded automatically by Spark)
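
The version requirements above can be checked before installation with a short script. This is a minimal sketch, not part of SLAMA: the helper names `parse_version` and `meets_requirements` are hypothetical, and the check assumes that "3.2+" means any PySpark version at or above 3.2.

```python
import sys
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '3.2.1'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def meets_requirements(python_version: Tuple[int, int], pyspark_version: str) -> bool:
    """Check Python 3.9 and PySpark >= 3.2 (assumed reading of '3.2+')."""
    return python_version == (3, 9) and parse_version(pyspark_version) >= (3, 2)


if __name__ == "__main__":
    try:
        import pyspark  # only needed for the live check
    except ImportError:
        print("PySpark is not installed")
    else:
        ok = meets_requirements(tuple(sys.version_info[:2]), pyspark.__version__)
        print("environment OK" if ok else "environment does not match the stated requirements")
```

The pure function takes the versions as arguments, so it can be exercised without PySpark being installed.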

Currently, only the tabular preset is supported. For a demo of the Spark-based tabular AutoML preset, see examples/spark/tabular-preset-automl.py. For further information, see the docs in the root of the project, which contain a dedicated SLAMA section.

License

This project is licensed under the Apache License, Version 2.0. See the LICENSE file for more details.

Installation

  1. Install git and poetry.

  2. Clone the repo and install all dependencies:

# Load SLAMA source code
git clone https://github.com/sb-ai-lab/SLAMA.git

cd SLAMA/

# !!!Choose only one item!!!

# Create virtual environment inside your project directory
poetry config virtualenvs.in-project true

# For more information read poetry docs

# Install SLAMA
poetry install
  3. Install the SLAMA jars
  • Download the jar when starting the Spark session:
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("SLAMA") \
    .config("spark.jars.repositories", "https://oss.sonatype.org/content/repositories/releases") \
    .config("spark.jars.packages", "io.github.sb-ai-lab:spark-lightautoml_2.12:0.1") \
    .getOrCreate()
...
  • Or download the latest jar and add it locally:
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("SLAMA") \
    .config("spark.jars", "JAR_DIR/spark-lightautoml_2.12-0.1.jar") \
    .getOrCreate()
...
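
The two ways of supplying the jar can be wrapped in one helper that picks the Spark options for either case. This is a rough sketch under stated assumptions: the function names `slama_jar_options` and `build_session` are hypothetical and not part of SLAMA. For a local file it uses `spark.jars`, the standard Spark option for jar paths, since `spark.jars.packages` expects Maven coordinates.

```python
from typing import Dict, Optional

# Values taken from the snippets above.
SLAMA_REPOSITORY = "https://oss.sonatype.org/content/repositories/releases"
SLAMA_PACKAGE = "io.github.sb-ai-lab:spark-lightautoml_2.12:0.1"


def slama_jar_options(local_jar: Optional[str] = None) -> Dict[str, str]:
    """Return Spark options that make the SLAMA jar available.

    With no argument the jar is resolved from the Maven repository;
    with a path, the local jar file is used instead.
    """
    if local_jar is not None:
        return {"spark.jars": local_jar}
    return {
        "spark.jars.repositories": SLAMA_REPOSITORY,
        "spark.jars.packages": SLAMA_PACKAGE,
    }


def build_session(options: Dict[str, str], app_name: str = "SLAMA"):
    """Create a SparkSession with the given options (requires PySpark)."""
    from pyspark.sql import SparkSession  # imported lazily

    builder = SparkSession.builder.appName(app_name)
    for key, value in options.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

For example, `build_session(slama_jar_options())` would download the jar at session start, while `build_session(slama_jar_options("JAR_DIR/spark-lightautoml_2.12-0.1.jar"))` would use a local copy; `JAR_DIR` stays a placeholder for the directory the jar was saved to.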

Configuring the cluster

The documentation describes how to set up different types of clusters to run the code.
