Distributed Hyperparameter Optimization on SageMaker
Syne Tune: Large-Scale and Reproducible Hyperparameter Optimization
Syne Tune provides state-of-the-art algorithms for hyperparameter optimization (HPO) with the following key features:
- Lightweight and platform-agnostic: Syne Tune is designed to work with different execution backends, so you are not locked into a particular distributed system architecture. Syne Tune runs with minimal dependencies.
- Wide coverage of different HPO methods: Syne Tune supports more than 20 different optimization methods across multi-fidelity HPO, constrained HPO, multi-objective HPO, transfer learning, cost-aware HPO, and population-based training.
- Simple, modular design: Rather than wrapping other HPO frameworks, Syne Tune provides simple APIs and scheduler templates, which can easily be extended to your specific needs. Studying the code will allow you to understand what the different algorithms are doing, and how they differ from each other.
- Industry-strength Bayesian optimization: Syne Tune has comprehensive support for Gaussian Process-based Bayesian optimization. The same code powers modalities such as multi-fidelity HPO, constrained HPO, and cost-aware HPO, and has been tried and tested in production for several years.
- Support for distributed workloads: Syne Tune lets you move fast, thanks to the parallel compute resources AWS SageMaker offers. Syne Tune allows ML/AI practitioners to easily set up and run studies with many experiments running in parallel. Run on different compute environments (locally, AWS, simulation) by changing just one line of code.
- Out-of-the-box tabulated benchmarks: Tabulated benchmarks let you simulate results in seconds while preserving the real dynamics of asynchronous or synchronous HPO with any number of workers.
Syne Tune is developed in collaboration with the team behind the Automatic Model Tuning service.
Installing
To install Syne Tune from PyPI, run:
pip install 'syne-tune[basic]'
or to install the latest version from source:
git clone https://github.com/awslabs/syne-tune.git
cd syne-tune
python3 -m venv st_venv
. st_venv/bin/activate
pip install --upgrade pip
pip install -e '.[basic]'
This installs everything in a virtual environment st_venv. Remember to activate this environment before working with Syne Tune. We also recommend building the virtual environment from scratch now and then, in particular when you pull a new release, as dependencies may have changed.
See our change log to see what changed in the latest version.
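To confirm that the virtual environment is active and that the package is visible from it, a small standard-library check such as the following can help. This is only a sketch: the helper names in_virtualenv and installed_version are made up for illustration, and the distribution name syne-tune is taken from the pip command above.

```python
import sys
from importlib import metadata
from typing import Optional


def in_virtualenv() -> bool:
    """Return True if the interpreter runs inside a venv-style virtual environment."""
    return sys.prefix != sys.base_prefix


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("syne-tune version:", installed_version("syne-tune"))
```

If the second line prints None, the environment is most likely not activated or the install step did not complete.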
Getting started
To enable tuning, your training script must report metrics so that they can be communicated to Syne Tune. This is done by calling report(epoch=epoch, loss=loss), where report is a Reporter instance, as shown in the example below:
# train_height_simple.py
import logging
import time

from syne_tune import Reporter
from argparse import ArgumentParser

if __name__ == '__main__':
    root = logging.getLogger()
    root.setLevel(logging.INFO)
    parser = ArgumentParser()
    parser.add_argument('--epochs', type=int)
    parser.add_argument('--width', type=float)
    parser.add_argument('--height', type=float)
    args, _ = parser.parse_known_args()
    report = Reporter()
    for step in range(args.epochs):
        time.sleep(0.1)
        dummy_score = 1.0 / (0.1 + args.width * step / 100) + args.height * 0.1
        # Feed the score back to Syne Tune.
        report(epoch=step + 1, mean_loss=dummy_score)
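To see what the tuner is working with, the reported metric can be evaluated outside of Syne Tune. The sketch below copies the formula from the training script into a plain function; the function names are illustrative and not part of the library. It assumes the metric is to be minimized, in which case a larger width and a smaller height give a better final value.

```python
def dummy_score(width: float, height: float, step: int) -> float:
    """Same formula as in train_height_simple.py, evaluated at a single step."""
    return 1.0 / (0.1 + width * step / 100) + height * 0.1


def final_score(width: float, height: float, epochs: int) -> float:
    """Value reported at the last epoch, i.e. for step = epochs - 1."""
    return dummy_score(width, height, epochs - 1)


if __name__ == "__main__":
    # Corners of the search space used in the launcher script.
    for width, height in [(1, 1), (20, 1), (1, 20), (20, 20)]:
        print(width, height, round(final_score(width, height, epochs=100), 3))
```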
Once you have a training script reporting a metric, you can launch tuning as follows:
# launch_height_simple.py
from syne_tune import Tuner, StoppingCriterion
from syne_tune.backend import LocalBackend
from syne_tune.config_space import randint
from syne_tune.optimizer.baselines import ASHA

# hyperparameter search space to consider
config_space = {
    'width': randint(1, 20),
    'height': randint(1, 20),
    'epochs': 100,
}

tuner = Tuner(
    trial_backend=LocalBackend(entry_point='train_height_simple.py'),
    scheduler=ASHA(
        config_space,
        metric='mean_loss',
        resource_attr='epoch',
        max_resource_attr="epochs",
        search_options={'debug_log': False},
    ),
    stop_criterion=StoppingCriterion(max_wallclock_time=30),
    n_workers=4,  # how many trials are evaluated in parallel
)

tuner.run()
The above example runs ASHA with 4 asynchronous workers on a local machine.
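ASHA is a multi-fidelity method based on successive halving: trials are compared at a few resource levels (rungs), and only the better ones are allowed to continue. The following is a simplified, library-independent sketch of that idea, not the scheduler's actual code. The function names, the minimum resource of 1 and the reduction factor of 3 are assumptions made for illustration, and the metric is assumed to be minimized.

```python
from typing import List


def rung_levels(min_resource: int, max_resource: int, reduction_factor: int) -> List[int]:
    """Resource levels (e.g. epochs) at which trials are compared."""
    levels = []
    r = min_resource
    while r < max_resource:
        levels.append(r)
        r *= reduction_factor
    return levels


def should_continue(rung_results: List[float], new_result: float, reduction_factor: int) -> bool:
    """Stopping-type rule for a minimized metric.

    A trial that reaches a rung continues only if its result is among the best
    1 / reduction_factor fraction of all results recorded at that rung so far,
    counting the new result itself.
    """
    results = sorted(rung_results + [new_result])
    n_keep = max(1, len(results) // reduction_factor)
    threshold = results[n_keep - 1]
    return new_result <= threshold


if __name__ == "__main__":
    print(rung_levels(min_resource=1, max_resource=100, reduction_factor=3))
    print(should_continue([0.5, 0.4, 0.9], new_result=0.3, reduction_factor=3))
    print(should_continue([0.5, 0.4, 0.9], new_result=0.6, reduction_factor=3))
```

Because the decision at a rung only uses results recorded so far, no worker has to wait for others, which is what makes the asynchronous variant suitable for running several trials in parallel.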
Experimentation with Syne Tune
If you plan to use advanced features of Syne Tune, such as different execution backends or running experiments remotely, writing launcher scripts like examples/launch_height_simple.py can become tedious. Syne Tune provides an advanced experimentation framework, which you can learn about in this tutorial or also in this one.
Supported HPO methods
The following hyperparameter optimization (HPO) methods are available in Syne Tune:
Method | Reference | Searcher | Asynchronous? | Multi-fidelity? | Transfer? |
---|---|---|---|---|---|
Grid Search | | deterministic | yes | no | no |
Random Search | Bergstra, et al. (2011) | random | yes | no | no |
Bayesian Optimization | Snoek, et al. (2012) | model-based | yes | no | no |
BORE | Tiao, et al. (2021) | model-based | yes | no | no |
CQR | Salinas, et al. (2023) | model-based | yes | no | no |
MedianStoppingRule | Golovin, et al. (2017) | any | yes | yes | no |
SyncHyperband | Li, et al. (2018) | random | no | yes | no |
SyncBOHB | Falkner, et al. (2018) | model-based | no | yes | no |
SyncMOBSTER | Klein, et al. (2020) | model-based | no | yes | no |
ASHA | Li, et al. (2019) | random | yes | yes | no |
BOHB | Falkner, et al. (2018) | model-based | yes | yes | no |
MOBSTER | Klein, et al. (2020) | model-based | yes | yes | no |
DEHB | Awad, et al. (2021) | evolutionary | no | yes | no |
HyperTune | Li, et al. (2022) | model-based | yes | yes | no |
DyHPO* | Wistuba, et al. (2022) | model-based | yes | yes | no |
ASHABORE | Tiao, et al. (2021) | model-based | yes | yes | no |
ASHACQR | Salinas, et al. (2023) | model-based | yes | yes | no |
PASHA | Bohdal, et al. (2022) | random or model-based | yes | yes | no |
REA | Real, et al. (2019) | evolutionary | yes | no | no |
KDE | Falkner, et al. (2018) | model-based | yes | no | no |
PBT | Jaderberg, et al. (2017) | evolutionary | no | yes | no |
ZeroShotTransfer | Wistuba, et al. (2015) | deterministic | yes | no | yes |
ASHA-CTS | Salinas, et al. (2021) | random | yes | yes | yes |
RUSH | Zappella, et al. (2021) | random | yes | yes | yes |
BoundingBox | Perrone, et al. (2019) | any | yes | yes | yes |
*: We implement the model-based scheduling logic of DyHPO, but use the same Gaussian process surrogate models as MOBSTER and HyperTune. The original source code for the paper is here.
The searchers fall into four broad categories: deterministic, random, evolutionary, and model-based. The random searchers sample candidate hyperparameter configurations uniformly at random, while the model-based searchers sample them non-uniformly at random, according to a model (e.g., Gaussian process, density ratio estimator, etc.) and an acquisition function. The evolutionary searchers make use of an evolutionary algorithm.
Syne Tune also supports BoTorch searchers.
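The difference between random and model-based searchers can be illustrated with a toy sketch in plain Python. Nothing here uses Syne Tune: SPACE, sample_random, suggest_model_based and toy_surrogate are hypothetical names, and the fixed scoring function stands in for a surrogate model that a real searcher would fit to observed results and combine with an acquisition function.

```python
import random
from typing import Callable, Dict, Tuple

# Hypothetical search space: inclusive integer ranges per hyperparameter.
SPACE: Dict[str, Tuple[int, int]] = {"width": (1, 20), "height": (1, 20)}


def sample_random(space: Dict[str, Tuple[int, int]], rng: random.Random) -> Dict[str, int]:
    """Random searcher: draw every hyperparameter uniformly from its range."""
    return {name: rng.randint(lo, hi) for name, (lo, hi) in space.items()}


def suggest_model_based(
    space: Dict[str, Tuple[int, int]],
    rng: random.Random,
    surrogate: Callable[[Dict[str, int]], float],
    num_candidates: int = 50,
) -> Dict[str, int]:
    """Toy model-based step: draw random candidates and return the one
    that the surrogate predicts to be best (lowest value)."""
    candidates = [sample_random(space, rng) for _ in range(num_candidates)]
    return min(candidates, key=surrogate)


def toy_surrogate(config: Dict[str, int]) -> float:
    """Stand-in for a fitted model; a real searcher learns this from data."""
    return 1.0 / (0.1 + config["width"]) + 0.1 * config["height"]


if __name__ == "__main__":
    rng = random.Random(0)
    print("random:", sample_random(SPACE, rng))
    print("model-based (toy):", suggest_model_based(SPACE, rng, toy_surrogate))
```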
Supported multi-objective optimization methods
Method | Reference | Searcher | Asynchronous? | Multi-fidelity? | Transfer? |
---|---|---|---|---|---|
Constrained Bayesian Optimization | Gardner, et al. (2014) | model-based | yes | no | no |
MOASHA | Schmucker, et al. (2021) | random | yes | yes | no |
NSGA-2 | Deb, et al. (2002) | evolutionary | no | no | no |
Multi Objective Multi Surrogate (MSMOS) | Guerrero-Viu, et al. (2021) | model-based | no | no | no |
MSMOS with random scalarization | Paria, et al. (2018) | model-based | no | no | no |
The HPO methods listed under Supported HPO methods can also be used in a multi-objective setting, through scalarization or non-dominated sorting. See multiobjective_priority.py for details.
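The two ideas named here, scalarization and non-dominated sorting, can be sketched in a few lines of plain Python. This is an illustration only and not the code in multiobjective_priority.py; the function names are made up, and all objectives are assumed to be minimized.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: Sequence[Point]) -> List[Point]:
    """Points that are not dominated by any other point (first non-dominated layer)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def scalarize(point: Point, weights: Sequence[float]) -> float:
    """Linear scalarization: collapse several objectives into one weighted sum."""
    return sum(w * x for w, x in zip(weights, point))


if __name__ == "__main__":
    # Hypothetical (error, latency) pairs for four trials.
    results = [(1.0, 5.0), (2.0, 2.0), (3.0, 3.0), (5.0, 1.0)]
    print("Pareto front:", pareto_front(results))
    print("Best by equal weights:", min(results, key=lambda p: scalarize(p, (0.5, 0.5))))
```

Scalarization turns the problem into a single-objective one, so any single-objective scheduler can rank trials by the weighted sum, while non-dominated sorting ranks trials by the layer of the Pareto front they fall into.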
Examples
You will find many examples in the examples/ folder illustrating different functionalities provided by Syne Tune. For example:
- launch_height_baselines.py: launches HPO locally, tuning a simple script train_height_example.py for several baselines
- launch_height_moasha.py: shows how to tune a script reporting multiple objectives with multi-objective Asynchronous Hyperband (MOASHA)
- launch_height_standalone_scheduler.py: launches HPO locally with a custom scheduler that cuts any trial that is not in the top 80%
- launch_height_sagemaker_remotely.py: launches the HPO loop on SageMaker rather than on a local machine; trials can be executed either on the remote machine or distributed again as separate SageMaker training jobs. See launch_height_sagemaker_remote_launcher.py for remote launching with the help of RemoteTuner, which is also discussed in one of the FAQs.
- launch_height_sagemaker.py: launches HPO on SageMaker to tune a SageMaker Pytorch estimator
- launch_bayesopt_constrained.py: launches Bayesian constrained hyperparameter optimization
- launch_height_sagemaker_custom_image.py: launches HPO on SageMaker to tune an entry point with a custom docker image
- launch_plot_results.py: shows how to plot results of an HPO experiment
- launch_tensorboard_example.py: shows how results can be visualized on the fly with TensorBoard
- launch_nasbench201_simulated.py: demonstrates simulation of experiments on a tabulated benchmark
- launch_fashionmnist.py: launches HPO locally tuning a multi-layer perceptron on Fashion MNIST. This employs an easy-to-use benchmark convention
- launch_huggingface_classification.py: launches HPO on SageMaker to tune a SageMaker Hugging Face estimator for sentiment classification
- launch_tuning_gluonts.py: launches HPO locally to tune a gluon-ts time series forecasting algorithm
- launch_rl_tuning.py: launches HPO locally to tune an RL algorithm on the cartpole environment
- launch_height_ray.py: launches HPO locally with Ray Tune scheduler
Examples for Experimentation and Benchmarking
You will find many examples for experimentation and benchmarking in benchmarking/examples/ and in benchmarking/nursery/.
FAQ and Tutorials
You can check our FAQ to learn more about Syne Tune functionalities.
- Why should I use Syne Tune?
- What are the different installation options supported?
- How can I run on AWS and SageMaker?
- What are the metrics reported by default when calling the Reporter?
- How can I utilize multiple GPUs?
- What is the default mode when performing optimization?
- How are trials evaluated on a local machine?
- Where can I find the output of the tuning?
- How can I change the default output folder where tuning results are stored?
- What does the output of the tuning contain?
- How can I enable trial checkpointing?
- How can I retrieve the best checkpoint obtained after tuning?
- Which schedulers make use of checkpointing?
- Is the tuner checkpointed?
- Where can I find the output of my trials?
- How can I plot the results of a tuning?
- How can I specify additional tuning metadata?
- How do I append additional information to the results which are stored?
- I don’t want to wait, how can I launch the tuning on a remote machine?
- How can I run many experiments in parallel?
- How can I access results after tuning remotely?
- How can I specify dependencies to remote launcher or when using the SageMaker backend?
- How can I benchmark different methods?
- What different schedulers do you support? What are the main differences between them?
- How do I define the configuration space?
- How do I set arguments of multi-fidelity schedulers?
- How can I visualize the progress of my tuning experiment with Tensorboard?
- How can I add a new scheduler?
- How can I add a new tabular or surrogate benchmark?
- How can I reduce delays in starting trials with the SageMaker backend?
- How can I pass lists or dictionaries to the training script?
- How can I write extra results for an experiment?
Do you want to know more? Here are a number of tutorials.
- Basics of Syne Tune
- Choosing a Configuration Space
- Using the Built-in Schedulers
- Multi-Fidelity Hyperparameter Optimization
- Benchmarking in Syne Tune
- Visualization of Results
- Rapid Experimentation with Syne Tune
- How to Contribute a New Scheduler
- PASHA: Efficient HPO and NAS with Progressive Resource Allocation
- Using Syne Tune for Transfer Learning
- Distributed Hyperparameter Tuning: Finding the Right Model can be Fast and Fun
Blog Posts
- Run distributed hyperparameter and neural architecture tuning jobs with Syne Tune
- Hyperparameter optimization for fine-tuning pre-trained transformer models from Hugging Face (notebook)
- Learn Amazon Simple Storage Service transfer configuration with Syne Tune (code)
Security
See CONTRIBUTING for more information.
Citing Syne Tune
If you use Syne Tune in a scientific publication, please cite the following paper:
"Syne Tune: A Library for Large Scale Hyperparameter Tuning and Reproducible Research" First Conference on Automated Machine Learning, 2022.
@inproceedings{
salinas2022syne,
title={Syne Tune: A Library for Large Scale Hyperparameter Tuning and Reproducible Research},
author={David Salinas and Matthias Seeger and Aaron Klein and Valerio Perrone and Martin Wistuba and Cedric Archambeau},
booktitle={International Conference on Automated Machine Learning, AutoML 2022},
year={2022},
url={https://proceedings.mlr.press/v188/salinas22a.html}
}
License
This project is licensed under the Apache-2.0 License.