
Lightning HPO


Training Studio App

The Training Studio App is a full-stack AI application built with the Lightning framework. It lets you run experiments or sweeps with state-of-the-art hyperparameter sampling algorithms, efficient experiment pruning strategies, and more.

Learn more here.


Installation

Create a new virtual environment with Python 3.8+.

python -m venv .venv
source .venv/bin/activate

Clone and install lightning-hpo.

git clone https://github.com/Lightning-AI/lightning-hpo && cd lightning-hpo

pip install -e . -r requirements.txt --find-links https://download.pytorch.org/whl/cpu/torch_stable.html --pre

Make sure everything works fine.

python -m lightning run app app.py

Check the documentation to learn more!


Run the Training Studio App locally

In your first terminal, run the Lightning App.

lightning run app app.py

In a second terminal, connect to the Lightning App and download its CLI.

lightning connect localhost --yes
lightning --help

Usage: lightning [OPTIONS] COMMAND [ARGS]...

  --help     Show this message and exit.

Lightning App Commands
  create data        Create a Data association by providing a public S3 bucket and an optional mount point.
                     The contents of the bucket can be then mounted on experiments and sweeps and
                     accessed through the filesystem.
  delete data        Delete a data association. Note that this will not delete the data itself,
                     it will only make it unavailable to experiments and sweeps.
  delete experiment  Delete an experiment. Note that artifacts will still be available after the operation.
  delete sweep       Delete a sweep. Note that artifacts will still be available after the operation.
  download artifacts Download artifacts for experiments or sweeps.
  run experiment     Run an experiment by providing a script, the cloud compute type and optional
                     data entries to be made available at a given path.
  run sweep          Run a sweep by providing a script, the cloud compute type and optional
                     data entries to be made available at a given path. Hyperparameters can be
                     provided as lists (`model.lr="[0.01, 0.1]"`) or using distributions
                     (`model.lr="uniform(0.01, 0.1)"`, `model.lr="log_uniform(0.01, 0.1)"`).
                     Hydra multirun override syntax is also supported.
  show artifacts     Show artifacts for experiments or sweeps, in flat or tree layout.
  show data          List all data associations.
  show experiments   Show experiments and their statuses.
  show logs          Show logs of an experiment or a sweep. Optionally follow logs as they stream.
  show sweeps        Show all sweeps and their statuses, or the experiments for a given sweep.
  stop experiment    Stop an experiment. Note that currently experiments cannot be resumed.
  stop sweep         Stop all experiments in a sweep. Note that currently sweeps cannot be resumed.

You are connected to the local Lightning App. Return to the primary CLI with `lightning disconnect`.
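
With the CLI connected, the `show` commands list the current state of the App. As a quick sanity check they can be run right away (until you launch an experiment or a sweep they should simply print empty tables):

# List sweeps, experiments, and data associations known to the App
lightning show sweeps
lightning show experiments
lightning show data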

Run your first Sweep from the sweep_examples/scripts folder.

lightning run sweep train.py --model.lr "[0.001, 0.01, 0.1]" --data.batch "[32, 64]" --algorithm="grid_search" --requirements 'jsonargparse[signatures]>=4.15.2'
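
As the `run sweep` help above mentions, hyperparameter values can also be drawn from distributions rather than given as explicit lists. A sketch of that syntax with the same script (which sampling algorithm is used and how many experiments are drawn depend on additional options; check `lightning run sweep --help`):

lightning run sweep train.py --model.lr "log_uniform(0.001, 0.1)" --data.batch "[32, 64]" --requirements 'jsonargparse[signatures]>=4.15.2'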

Scale by running the Training Studio App in the Cloud

Below, we train a 1B+ parameter LLM across multiple nodes.

lightning run app app.py --cloud

Connect to the App once ready.

lightning connect {APP_NAME} --yes

Run your first multi-node training experiment from the sweep_examples/scripts folder (2 nodes with 4 V100 GPUs each).

lightning run experiment big_model.py --requirements deepspeed lightning-transformers==0.2.5 --num_nodes=2 --cloud_compute=gpu-fast-multi --disk_size=80
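
While the experiment runs, the connected CLI keeps working against the cloud App, so progress can be followed from your laptop. This is a sketch only: the name below is a placeholder for whatever `show experiments` reports, and the exact argument form for `show logs` (and for `stop experiment` or `download artifacts`) may vary by version, so check each command's `--help`.

lightning show experiments
# "my-experiment" is a placeholder: use a name reported by `show experiments`
lightning show logs my-experiment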

