DEHB: Evolutionary Hyperband for Scalable, Robust and Efficient Hyperparameter Optimization
Installation
# from pypi
pip install dehb
# to run examples, install from github
git clone https://github.com/automl/DEHB.git
pip install -e DEHB # -e stands for editable, lets you modify the code and rerun things
Tutorials/Example notebooks
- 00 - A generic template to use DEHB for multi-fidelity Hyperparameter Optimization
- 01.1 - Using DEHB to optimize 4 hyperparameters of Scikit-learn's Random Forest on a classification dataset
- 01.2 - Using DEHB to optimize 4 hyperparameters of Scikit-learn's Random Forest on a classification dataset using the Ask & Tell interface
- 02 - Optimizing Scikit-learn's Random Forest without using ConfigSpace to represent the hyperparameter space
- 03 - Hyperparameter Optimization for MNIST in PyTorch
To run the PyTorch example (note the additional requirements):
python examples/03_pytorch_mnist_hpo.py \
--min_fidelity 1 \
--max_fidelity 3 \
--runtime 60 \
--verbose
Ask & Tell interface
DEHB allows users to either utilize the Ask & Tell interface for manual task distribution or leverage the built-in `run()` functionality to set up a Dask cluster autonomously.
The Ask & Tell functionality can be utilized as follows:
from dehb import DEHB

optimizer = DEHB(
    f=your_target_function,  # specifying the target function here is optional, but it is needed to call 'run' later
    cs=config_space,
    dimensions=dimensions,
    min_fidelity=min_fidelity,
    max_fidelity=max_fidelity,
)
# Ask for next configuration to run
job_info = optimizer.ask()
# Run the configuration for the given fidelity. Here you can freely distribute the computation to any worker you'd like.
result = your_target_function(config=job_info["config"], fidelity=job_info["fidelity"])
# Once you have the result, feed it back to the optimizer
optimizer.tell(job_info, result)
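For illustration, here is a minimal, self-contained sketch of the full Ask & Tell loop on a toy one-dimensional objective. The quadratic target function, the fidelity range, and the `{"fitness": ..., "cost": ...}` return format are assumptions modeled on the DEHB examples; adapt them to your problem.

import time

import ConfigSpace as CS
from dehb import DEHB

def target_function(config, fidelity):
    start = time.time()
    # Toy objective: minimize (x - 0.5)^2. A real target function would also
    # use `fidelity`, e.g., as the number of training epochs.
    fitness = (config["x"] - 0.5) ** 2
    return {"fitness": fitness, "cost": time.time() - start}

config_space = CS.ConfigurationSpace()
config_space.add_hyperparameter(CS.UniformFloatHyperparameter("x", lower=0, upper=1))

optimizer = DEHB(
    f=target_function,
    cs=config_space,
    dimensions=1,
    min_fidelity=1,
    max_fidelity=10,
    n_workers=1,
)

# Run 20 ask/tell iterations sequentially
for _ in range(20):
    job_info = optimizer.ask()
    result = target_function(config=job_info["config"], fidelity=job_info["fidelity"])
    optimizer.tell(job_info, result)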
Running DEHB in a parallel setting
DEHB has been designed to interface with a Dask client. It can either create a Dask client during instantiation (and close/kill it during garbage collection), or a pre-configured client can be passed as an argument during instantiation.
- Setting `n_workers` during instantiation:
  - If set to `1` (default), the entire process is a sequential run without invoking Dask.
  - If set to `>1`, a Dask client is initialized with as many workers as `n_workers`.
  - This parameter is ignored if `client` is not None.
- Setting `client` during instantiation:
  - When `None` (default), a Dask client is created using the `n_workers` specified.
  - Else, any custom-configured Dask client can be created and passed as the `client` argument to DEHB.
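For example, a pre-configured client can be handed over as follows. This is a minimal sketch; the local `Client` is for illustration only, and the remaining placeholders match the snippets above.

from dask.distributed import Client
from dehb import DEHB

client = Client(n_workers=4)  # illustrative: a local cluster with 4 workers
optimizer = DEHB(
    f=your_target_function,   # placeholders as in the earlier snippets
    cs=config_space,
    dimensions=dimensions,
    min_fidelity=min_fidelity,
    max_fidelity=max_fidelity,
    client=client,            # n_workers is ignored since client is not None
)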
Using GPUs in a parallel run
Certain target function evaluations (especially for deep learning) require computations to be carried out on GPUs. GPU devices are often ordered by device ID, and if left unconfigured, all spawned worker processes access the devices in the same order and can either run out of memory or fail to exhibit parallelism.
For `n_workers > 1`, when running on a single node (or locally), the `single_node_with_gpus` flag can be passed to the `run()` call to DEHB. Setting it to `False` (default) has no effect on the default setup of the machine. Setting it to `True` reorders the GPU device IDs dynamically by setting the environment variable `CUDA_VISIBLE_DEVICES` for each worker process executing a target function evaluation. The reordering gives first priority to the device with the fewest active jobs assigned to it by that DEHB run.
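For instance, such a run could be launched as sketched below. The `single_node_with_gpus` flag is documented above, but the budget argument shown (`total_cost`, a wallclock budget in seconds) is an assumption based on the examples; check the signature of `run()` in your installed version.

# Sketch: parallel run on one multi-GPU node; `total_cost` is an assumed
# wallclock budget argument, `single_node_with_gpus` is documented above.
optimizer.run(total_cost=60, single_node_with_gpus=True)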
To run the PyTorch MNIST example on a single node using 2 workers:
python examples/03_pytorch_mnist_hpo.py \
--min_fidelity 1 \
--max_fidelity 3 \
--runtime 60 \
--n_workers 2 \
--single_node_with_gpus \
--verbose
Multi-node runs
Multi-node parallelism is often contingent on the cluster setup it is deployed on. Dask provides useful frameworks to interface with various cluster designs. As long as the `client` passed to DEHB during instantiation is of type `dask.distributed.Client`, DEHB can interact with this client and distribute its optimization process in a parallel manner.
For instance, the Dask CLI can be used to create a `dask-scheduler`, which can dump its connection details to a file on a cluster node accessible to all processes. Multiple `dask-worker` processes can then be created to interface with the `dask-scheduler` by connecting using the details read from the dumped file. Each `dask-worker` can be started on any remote machine and configured as required, including mapping to specific GPU devices.
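From the DEHB side, attaching to such a setup can then look like the following sketch (the scheduler-file path is illustrative):

from dask.distributed import Client

# Connect to the running dask-scheduler via its dumped connection details
client = Client(scheduler_file="dask_dump/scheduler.json")
# Pass `client` to DEHB during instantiation, as shown in the previous section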
Some helper scripts can be found here; they can be used as a reference to run DEHB in a multi-node manner on clusters managed by SLURM (not expected to work off-the-shelf).
To run the PyTorch MNIST example on a multi-node setup using 4 workers:
bash utils/run_dask_setup.sh \
-f dask_dump/scheduler.json \ # This is how the workers will be discovered by DEHB
-e env_name \
-n 4
# Make sure to sleep to allow the workers to set up properly
sleep 5
python examples/03_pytorch_mnist_hpo.py \
--min_fidelity 1 \
--max_fidelity 3 \
--runtime 60 \
--scheduler_file dask_dump/scheduler.json \
--verbose
DEHB Hyperparameters
We recommend the default settings. The defaults were chosen based on ablation studies over a collection of diverse problems and were found to be generally useful across all cases tested. However, the parameters remain available for tuning to a specific problem; a sketch of overriding them follows the lists below.
The Hyperband components:
- min_fidelity: Needs to be specified for every DEHB instantiation and is used in determining the fidelity spacing for the problem at hand.
- max_fidelity: Needs to be specified for every DEHB instantiation. Represents the full-fidelity evaluation or the actual black-box setting.
- eta: (default=3) Sets the aggressiveness of Hyperband's early stopping by retaining 1/eta of the configurations every round
The DE components:
- strategy: (default=`rand1_bin`) Chooses the mutation and crossover strategies for DE. `rand1` represents the mutation strategy while `bin` represents the binomial crossover strategy.
  Other mutation strategies include: {`rand2`, `rand2dir`, `best`, `best2`, `currenttobest1`, `randtobest1`}
  Other crossover strategies include: {`exp`}
  Mutation and crossover strategies can be combined with a `_` separator, e.g., `rand2dir_exp`.
- mutation_factor: (default=0.5) A fraction within [0, 1] weighing the difference operation in DE
- crossover_prob: (default=0.5) A probability within [0, 1] weighing the traits from a parent or the mutant
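As a sketch of such tuning (parameter names follow the lists above; the placeholders match the earlier snippets):

optimizer = DEHB(
    f=your_target_function,     # placeholders as in the earlier snippets
    cs=config_space,
    dimensions=dimensions,
    min_fidelity=1,
    max_fidelity=27,
    eta=3,                      # Hyperband early-stopping aggressiveness
    strategy="rand2dir_exp",    # mutation `rand2dir` combined with crossover `exp`
    mutation_factor=0.5,        # weight of the DE difference operation
    crossover_prob=0.5,         # probability of inheriting traits from the mutant
)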
To cite the paper or code
@inproceedings{awad-ijcai21,
author = {N. Awad and N. Mallik and F. Hutter},
title = {{DEHB}: Evolutionary Hyperband for Scalable, Robust and Efficient Hyperparameter Optimization},
pages = {2147--2153},
booktitle = {Proceedings of the Thirtieth International Joint Conference on
Artificial Intelligence, {IJCAI-21}},
publisher = {ijcai.org},
editor = {Z. Zhou},
year = {2021}
}