
A workflow for hyperparameter optimization using ATLAS grid resources


Hyperparameter Optimization on the Grid

A workflow for running hyperparameter optimization (HPO) for machine learning on the grid.

Table of Contents

  1. Basic Workflow
  2. Getting the code
  3. Setup
  4. Managing Configuration Files
  5. Running Hyperparameter Optimization Jobs
  6. Monitoring Job Status
  7. Visualizing Hyperparameter Optimization Results
  8. Command line options

Basic Workflow

The hyperparameter optimization workflow described below can be executed entirely with the hpogrid tool provided by this repository. Sample usage and instructions for the hpogrid tool are given in the next few sections.

The workflow can be divided into the following steps:

Step 1: Prepare the configuration files for a hyperparameter optimization task which will be submitted to the ATLAS grid site. A total of four configuration files are required. They are:

  1. HPO Configuration: Configurations that define how the hyperparameter optimization is performed. This may include the hyperparameter optimization algorithm, the scheduling method for choosing the next hyperparameter points, the number of hyperparameter points to be evaluated, and so on.
  2. Search Space Configuration: Configurations that define the hyperparameter search space. This includes the sampling method for a particular hyperparameter (such as uniform, normal, or log-uniform sampling) and its range of allowed values.
  3. Model Configuration: Configurations that contain information about the training model called by the hyperparameter optimization algorithm. This should include
    • The name of the training script which contains the class/function defining the training model
    • The name of the class/function that defines the training model
    • The parameters that should be passed to the training model. For details please refer to the section (Adaptation of Training Script)
  4. Grid Configuration: Configurations that define settings for grid job submission. This may include the container inside which the scripts are run, the names of the input and output datasets, and the name of the grid site where the hyperparameter optimization jobs are run.

Step 2: Upload the input dataset via rucio; it will be retrieved by the grid site when the hyperparameter optimization task is executed.
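
For illustration only, an input dataset might be uploaded with a rucio command along these lines (the RSE and dataset names below are placeholders, not values prescribed by hpogrid):

rucio upload --rse MY_SITE_SCRATCHDISK user.jdoe.my_training_data ./training_data/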

Step 3: Adapt the training script(s) to conform with the format required by the hyperparameter optimization library (Ray Tune).

Step 4: Submit the hyperparameter optimization task and monitor its progress.

Step 5: Retrieve the hyperparameter optimization results after completion. The results can be exported in various formats supported by the hpogrid tool for visualization.

Getting the code

To get the code, use the following command:

git clone ssh://git@gitlab.cern.ch:7999/aml/hyperparameter-optimization/alkaid-qt/hpogrid/hpogrid.git

Setup

To set up the environment, source the setup script from the root path of the project:

source setupenv.sh

Managing Configuration Files

In general, the command for managing configuration files takes the form:

hpogrid <config_type> <action> <config_name> [<options>]

The <config_type> argument specifies the type of configuration to be handled. The available types are

  • hpo_config : Configuration for hyperparameter optimization
  • grid_config : Configuration for grid job submission
  • model_config : Configuration for the machine learning model (which the hyperparameters are to be optimized)
  • search_space : Configuration for the hyperparameter search space

The <action> argument specifies the action to be performed. The available actions are

  • create : Create a new configuration
  • recreate : Recreate an existing configuration (the old configuration will be overwritten)
  • update : Update an existing configuration (the old configuration except those to be updated will be kept)
  • remove : Remove an existing configuration
  • list : List the names of existing configurations (the <config_name> argument is omitted)
  • show : Display the content of an existing configuration

The <config_name> argument specifies the name given to a configuration file.
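
For example, existing configurations can be listed and a specific one displayed as follows (my_hpo_config is a hypothetical placeholder name):

hpogrid hpo_config list
hpogrid hpo_config show my_hpo_config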

The [<options>] arguments specify the configuration settings for the corresponding configuration type. The available options are explained below.

HPO Configuration

| Option | Description | Default | Choices |
| --- | --- | --- | --- |
| algorithm | Algorithm for hyperparameter optimization | 'random' | 'hyperopt', 'skopt', 'bohb', 'ax', 'tune', 'random', 'bayesian' |
| metric | Evaluation metric to be optimized | 'accuracy' | - |
| mode | Optimization mode (either 'min' or 'max') | 'max' | 'max', 'min' |
| scheduler | Trial scheduling method for hyperparameter optimization | 'asynchyperband' | 'asynchyperband', 'bohbhyperband', 'pbt' |
| trials | Number of trials (search points) | 100 | - |
| log_dir | Logging directory | "./log" | - |
| verbose | Enable verbose output | - | - |
| stop | Stopping criteria | '{"training_iteration": 1}' | - |
| scheduler_param | Extra parameters for the trial scheduler | '{"max_concurrent": 4}' | - |
| algorithm_param | Extra parameters for the hyperparameter optimization algorithm | {} | - |
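
As an illustrative sketch (assuming each option above maps to a command-line flag of the same name; the configuration name and option values are placeholders), an HPO configuration could be created with:

hpogrid hpo_config create my_hpo_config --algorithm hyperopt --metric loss --mode min --trials 20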

Grid Configuration

| Option | Description | Default |
| --- | --- | --- |
| site | Grid site to which the jobs are submitted | ANALY_MANC_GPU_TEST |
| container | Docker or Singularity container inside which the jobs are run | /cvmfs/unpacked.cern.ch/gitlab-registry.cern.ch/aml/hyperparameter-optimization/alkaid-qt/hpogrid:latest |
| retry | Enable retrying of failed jobs | - |
| inDS | Name of the input dataset | - |
| outDS | Name of the output dataset | user.${{RUCIO_ACCOUNT}}.hpogrid.{HPO_PROJECT_NAME}.out.$(date +%Y%m%d%H%M%S) |
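
A sketch of creating a grid configuration, again assuming the flags mirror the option names above and using placeholder site and dataset names:

hpogrid grid_config create my_grid_config --site ANALY_MANC_GPU_TEST --inDS user.jdoe.my_training_data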

Search Space Configuration

This defines the search space for hyperparameter optimization.

The format for defining a search space on the command line is a JSON-decodable string:

'{"NAME_OF_HYPERPARAMETER":{"method":"SAMPLING_METHOD","dimension":{"DIMENSION":"VALUE"}},
"NAME_OF_HYPERPARAMETER":{"method":"SAMPLING_METHOD","dimension":{"DIMENSION":"VALUE"}}, ...}'

Supported sampling methods for a hyperparameter:

| Method | Description | Dimension |
| --- | --- | --- |
| categorical | Returns one of the values in categories, which should be a list. If grid_search is set to 1, each value must be sampled once. | categories, grid_search |
| uniform | Returns a value uniformly between low and high | low, high |
| uniformint | Returns an integer value uniformly between low and high | low, high |
| quniform | Returns a value like round(uniform(low, high) / q) * q | low, high, q |
| loguniform | Returns a value drawn according to exp(uniform(low, high)) so that the logarithm of the return value is uniformly distributed | low, high, base |
| qloguniform | Returns a value like round(exp(uniform(low, high)) / q) * q | low, high, base, q |
| normal | Returns a real value that is normally distributed with mean mu and standard deviation sigma | mu, sigma |
| qnormal | Returns a value like round(normal(mu, sigma) / q) * q | mu, sigma, q |
| lognormal | Returns a value drawn according to exp(normal(mu, sigma)) so that the logarithm of the return value is normally distributed | mu, sigma, base |
| qlognormal | Returns a value like round(exp(normal(mu, sigma)) / q) * q | mu, sigma, base, q |

Examples:

hpogrid search_space create my_search_space '{ "lr":{"method":"loguniform","dimension":{"low":1e-5,"high":1e-2, "base":10}},\
"batchsize":{"method":"categorical","dimension":{"categories":[32,64,128,256,512,1024]}},\
"num_layers":{"method":"uniformint","dimension":{"low":3,"high":10}},\
"momentum":{"method":"uniform","dimension":{"low":0.5,"high":1.0}} }'

Model Configuration

This defines the parameters and settings for the machine learning model whose hyperparameters are to be optimized.

| Option | Description |
| --- | --- |
| script | Name of the training script containing the function or class that defines the training model; it will be called to perform the training |
| model | Name of the function or class that defines the training model |
| param | Extra parameters to be passed to the training model |
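
An illustrative sketch of creating a model configuration (the script, model, and parameter values are placeholders, and the flag names and the JSON form of param are assumptions based on the options above):

hpogrid model_config create my_model_config --script train.py --model train_model --param '{"epochs": 10}'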

Running Hyperparameter Optimization Jobs

Step 1: Create a custom project with the configuration files:

hpogrid project create PROJECT_NAME [--options]

| Option | Description |
| --- | --- |
| scripts_path | Path to where the training scripts (or the directory containing the training scripts) are located |
| hpo_config | HPO configuration to use for this project |
| grid_config | Grid configuration to use for this project |
| model_config | Model configuration to use for this project |
| search_space | Search space configuration to use for this project |
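
For example, a project could be assembled from previously created configurations as follows (all names are placeholders, and the flag names are assumed to match the options above):

hpogrid project create my_project --scripts_path ./scripts --hpo_config my_hpo_config --grid_config my_grid_config --model_config my_model_config --search_space my_search_space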

Step 2: Run the project:

  • To run locally:
hpogrid local_run PROJECT_NAME
  • To run on the grid:
hpogrid run PROJECT_NAME [--options]

| Option | Description | Default |
| --- | --- | --- |
| n_jobs | Number of grid jobs to be submitted (useful for random search, i.e. to run a single search point per job) | 1 |
| site | Grid site to which the jobs are submitted (this overrides the site setting in the grid configuration) | - |
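
For example (the project name is a placeholder, and the flag names are assumed to match the options above):

hpogrid run my_project --n_jobs 5 --site ANALY_MANC_GPU_TEST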

Monitoring Job Status

Visualizing Hyperparameter Optimization Results

Command Line Options
