
Lightning HPO

Project description

Research Studio App

The Research Studio App is a full-stack AI application built with the Lightning App framework. It lets you run experiments or sweeps with state-of-the-art hyper-parameter sampling algorithms, efficient experiment pruning strategies, and more.

Learn more here.
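The sampling algorithms and pruning strategies build on the Optuna ecosystem. As a rough, standalone illustration of those two ideas (plain Optuna here, not the Lightning HPO API), a sweep pairs a sampler that proposes hyper-parameters with a pruner that stops unpromising trials early:

import optuna

# Hypothetical standalone example: a TPE sampler proposes hyper-parameters,
# a median pruner stops trials whose intermediate results lag behind.
def objective(trial):
    lr = trial.suggest_float("lr", 1e-4, 1e-1, log=True)
    batch_size = trial.suggest_categorical("batch_size", [32, 64])
    score = 0.0
    for step in range(10):
        score += lr * batch_size / 100  # stand-in for a real training/validation loop
        trial.report(score, step)
        if trial.should_prune():
            raise optuna.TrialPruned()
    return score

study = optuna.create_study(
    direction="maximize",
    sampler=optuna.samplers.TPESampler(),
    pruner=optuna.pruners.MedianPruner(),
)
study.optimize(objective, n_trials=20)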


Installation

Create a new virtual environment with Python 3.8+.

python -m venv .venv
source .venv/bin/activate

Clone and install lightning-hpo.

git clone https://github.com/Lightning-AI/lightning-hpo && cd lightning-hpo

pip install -e . -r requirements.txt --find-links https://download.pytorch.org/whl/cpu/torch_stable.html

Make sure everything works fine.

python -m lightning run app app.py
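The app.py shipped with the repository is the real entry point. Purely as a reminder of what a minimal Lightning App looks like (this is not the repository's app.py), the framework wires a LightningFlow into a LightningApp:

from lightning.app import LightningApp, LightningFlow

class RootFlow(LightningFlow):
    def run(self):
        # A real app would orchestrate works, UIs, experiments and sweeps here.
        print("The Lightning App is running")

app = LightningApp(RootFlow())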

Check the documentation to learn more!


Run the Research Studio App locally

In your first terminal, run the Lightning App.

python -m lightning run app app.py

In a second terminal, connect to the Lightning App and download its CLI.

python -m lightning connect localhost -y
python -m lightning --help

Usage: lightning [OPTIONS] COMMAND [ARGS]...

Options:
  --help     Show this message and exit.

Lightning App Commands
  create drive       Create a Drive.
  delete drive       Delete a Drive.
  delete experiment  Delete an Experiment.
  delete sweep       Delete a Sweep.
  download artifacts Download an artifact.
  run experiment     Run an Experiment.
  run sweep          Run a Sweep.
  show artifacts     Show artifacts.
  show drives        Show Drives.
  show experiments   Show Experiments.
  show sweeps        Show all Sweeps or the Experiments from a given Sweep.
  stop experiment    Stop an Experiment.
  stop sweep         Stop a Sweep.

You are connected to the local Lightning App. Return to the primary CLI with `lightning disconnect`.

Run your first Sweep from the sweep_examples/scripts folder.

lightning run sweep train.py --model.lr "[0.001, 0.01, 0.1]" --data.batch "[32, 64]" --algorithm="grid_search" --requirements 'jsonargparse[signatures]>=4.15.2'
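The --model.lr and --data.batch arguments map onto the script's LightningCLI configuration (hence the jsonargparse requirement). A hypothetical train.py compatible with the command above could look roughly like this; the actual script in sweep_examples/scripts may differ:

import torch
from pytorch_lightning import LightningDataModule, LightningModule
from pytorch_lightning.cli import LightningCLI
from torch.utils.data import DataLoader, TensorDataset

class Model(LightningModule):
    def __init__(self, lr: float = 0.01):  # exposed as --model.lr
        super().__init__()
        self.save_hyperparameters()
        self.layer = torch.nn.Linear(32, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.mse_loss(self.layer(x), y)
        self.log("loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=self.hparams.lr)

class Data(LightningDataModule):
    def __init__(self, batch: int = 32):  # exposed as --data.batch
        super().__init__()
        self.batch = batch

    def train_dataloader(self):
        dataset = TensorDataset(torch.randn(256, 32), torch.randn(256, 1))
        return DataLoader(dataset, batch_size=self.batch)

if __name__ == "__main__":
    cli = LightningCLI(Model, Data, run=False)
    cli.trainer.fit(cli.model, cli.datamodule)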

Scale by running the Research Studio App in the Cloud

Below, we train a 1B+ parameter LLM with multi-node training.

python -m lightning run app app.py --cloud

Connect to the App once ready.

python -m lightning connect {APP_NAME} -y

Below is an example with a 1.6B parameter GPT2 transformer model trained with the Lightning Transformers library and DeepSpeed.

import pytorch_lightning as pl
from lightning_transformers.task.nlp.language_modeling import LanguageModelingDataModule, LanguageModelingTransformer
from transformers import AutoTokenizer

model_name = "gpt2-xl"

tokenizer = AutoTokenizer.from_pretrained(model_name)

# 1.6B parameter GPT2 model, sharded across GPUs with DeepSpeed
model = LanguageModelingTransformer(
    pretrained_model_name_or_path=model_name,
    tokenizer=tokenizer,
    deepspeed_sharding=True,
)

# WikiText-2 language modeling dataset
dm = LanguageModelingDataModule(
    batch_size=1,
    dataset_name="wikitext",
    dataset_config_name="wikitext-2-raw-v1",
    tokenizer=tokenizer,
)

trainer = pl.Trainer(
    accelerator="gpu",
    devices="auto",                 # use all available GPUs
    strategy="deepspeed_stage_3",   # DeepSpeed ZeRO Stage 3 sharding
    precision=16,                   # mixed precision
    max_epochs=1,
)

trainer.fit(model, dm)

Run your first multi-node training Experiment from the sweep_examples/scripts folder (2 nodes with 4 V100 GPUs each).

python -m lightning run experiment big_model.py --requirements deepspeed lightning-transformers==0.2.3 --num_nodes=2 --cloud_compute=gpu-fast-multi --disk_size=80
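For reference, the --num_nodes and GPU settings requested above correspond roughly to the following Trainer arguments when configuring multi-node training by hand (a sketch, not what the app generates):

import pytorch_lightning as pl

trainer = pl.Trainer(
    accelerator="gpu",
    devices=4,                      # 4 V100 GPUs per node
    num_nodes=2,                    # 2 nodes in total
    strategy="deepspeed_stage_3",
    precision=16,
)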

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

lightning_hpo-0.0.4.tar.gz (1.0 MB)

Uploaded Source

Built Distribution

lightning_hpo-0.0.4-py3-none-any.whl (1.1 MB)

Uploaded Python 3
