
Declair :cake:


Note: This project is under heavy development. The configuration API may change at any moment.

Declair is a framework for declaratively defining hyperparameter optimization experiments. It uses Sacred for storing experiment results and supports hyperopt for optimization.

It came about from attempts to recreate DeepSolaris results in PyTorch instead of Keras. However, it grew to be a more extensive and general framework than originally planned.

Getting started

Installation

Install the required Python packages in your favourite virtual environment

pip install -r requirements.txt

Running the tests

Go into the root of the repository (i.e. where this README.md is) and run

python -m pytest

Running examples

There is an example project with experiment configs in examples.

The default configuration for experiments assumes a MongoDB database on localhost, port 27017, with user mongo_user and password mongo_password. This is the default configuration from a Docker example for Sacred. There is an alternative configuration available without MongoDB; however, its usefulness is limited, as it only saves outputs to a directory without being attached to a dashboard.

You can run the experiments with the above MongoDB configuration by

python -m examples.execute_sample <config_example_name>

or without the MongoDB configuration as

python -m examples.execute_sample <config_example_name> declair_env_without_mongodb.yaml

where <config_example_name> is the filename of one of the .json configurations in examples/sample_configs.

Usage

At its core, this package allows you to reproducibly define and execute model hyperparameter searches as well as single runs of parametrized experiments, all while keeping the results of each execution safely stored using Sacred. The experiments can be arbitrary insofar as they can be narrowed down to a single Python function that, given some parameters, runs the experiment.

This package will gladly examine possible parameter combinations to an arbitrary level of parameter nesting, where parameters can also be Python functions or classes from modules. In addition, by using Hyperopt, it is possible to define continuous numeric parameters where necessary.

There are two distinct modes of experiments:

  • Runs, which are single executions of the experiment
  • Searches, which define a parameter search space and generate multiple runs in order to execute them

Experiments are defined in .json files. These .json files can contain special object definitions, listed in the Special JSON objects section.

Run definition

Basic structure of runs:

{
    "mode": "run",
    "execute_function": {"__type__": "some.module.function"},
    "experiment_name": "name for reference in Sacred",
    "params": {
        "some_arg": "some value",
        "some_function_arg": {"__type__": "some.module.another_function"},
        [...]
    }
}

Note that all entries of the form {"__type__": "some.module.function"} will be loaded into memory as types, as if they were imported with import some.module.function. It is also possible to insert variables taken from the environment configuration, but more on that later.

Upon execution, some.module.function will be executed as a captured function in a Sacred experiment, with params as the configuration. In other words, params will be provided as a dictionary in the first argument, and the function should also accept _run in order to store results in Sacred. The function some.module.function is responsible for tracking results in Sacred correctly. There are helper functions for doing so, but only for specific frameworks (i.e. PyTorch Ignite). Otherwise, see the Sacred docs on how to use the Run object to store results. The methods add_artifact and log_scalar are of particular interest, as they allow you to store output files and to keep track of metrics over the course of the run.

Furthermore, if you'd like to use Hyperopt for parameter search, be sure to have the function return an output dictionary with an entry holding the value to optimize.
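
Putting this together, a minimal execute function might look as follows. This is an illustrative sketch: the parameter name lr and the stand-in training loop are made up, not part of Declair's API; log_scalar and the returned output dictionary are used as described above.

# Sketch of an execute function. Declair passes params as the first
# argument; Sacred injects _run. The 'training loop' below is a stand-in.
def execute(params, _run):
    lr = params['lr']  # a parameter from the experiment definition
    loss = float('inf')
    for epoch in range(10):
        loss = lr / (epoch + 1)  # stand-in for a real training step
        _run.log_scalar('loss', loss, epoch)  # track the metric in Sacred
    return {'loss': loss}  # hyperopt reads its optimization target from here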

For an example run, see examples/sample_configs/run.json with the associated execution function in examples/sample_project/execute.py.

Search definition

The basic structure of grid and hyperopt search experiments is similar to that of run experiments, with one additional trick: in the params entry, any list defines multiple disjoint possible configurations, with each item of the list defining one possible configuration.

As a very simple example, if a dictionary like

{'optimizer': [
    {
        'function': 'Adagrad',
        'lr': [4e-05, 4e-06]
    },
    {
        'function': 'RMSprop',
        'lr': [1e-2, 1e-4],
        'momentum': [0, 1e-2]
    }
]}

finds its way into the search definition, it will be unrolled into the following possible values:

[{'optimizer': {'function': 'Adagrad', 'lr': 4e-05}},
 {'optimizer': {'function': 'Adagrad', 'lr': 4e-06}},
 {'optimizer': {'function': 'RMSprop', 'lr': 0.01, 'momentum': 0}},
 {'optimizer': {'function': 'RMSprop', 'lr': 0.01, 'momentum': 0.01}},
 {'optimizer': {'function': 'RMSprop', 'lr': 0.0001, 'momentum': 0}},
 {'optimizer': {'function': 'RMSprop', 'lr': 0.0001, 'momentum': 0.01}}]

This works for any arbitrary level of nesting of dictionaries and lists. However, note that lists nested directly inside lists are ambiguous. So, in case you wish to use an actual sequence for a parameter, you can do so by using {"__tuple__": [...]} in place of [...]. Entries of the sequence also support unrolling. For example, a list of options

['cat', {'__tuple__': [[0, 1], 2, 3]}]

is unrolled into

['cat', (0, 2, 3), (1, 2, 3)]

Furthermore, Hyperopt search space definitions also support special entries of the form {"__hp__": "<hyperopt-param>", "args": (...)} or {"__hp__": "<hyperopt-param>", "kwargs": {...}}, where <hyperopt-param> is one of the hyperopt parameter expressions (e.g. uniform, choice, loguniform), written without the hp. prefix, with the corresponding arguments or keyword arguments. The data type (float or int) of the parameter will be inferred; however, if you need to be sure, you can add "dtype": "int" or "dtype": "float" to the entry.
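
For instance, a learning rate drawn uniformly from the interval [1e-5, 1e-3] could be written as follows (an illustrative sketch, assuming hyperopt's uniform expression with its low and high arguments):

{"lr": {"__hp__": "uniform", "args": [1e-5, 1e-3]}}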

So, with all that in mind, the basic structure of a search experiment is:

{
    "mode": "search",
    "type": <"grid" or "hyperopt">,
    "execute_function": {"__type__": "some.module.function"},
    "experiment_name": "name for reference in Sacred",
    "search_params": {
        ...
    },
    "params": {
        "some_arg": ["some value", "some other value"],
        "some_function_arg": {"__type__": "some.module.another_function"},
        ...
    },
    "static_params": {
        "some_fixed_list_parameter": [
            "necessary entry",
            "another necessary entry",
            ...
        ]
    }
}

Here the two new elements compared to a run definition are search_params, which are described per search type below, and static_params. static_params are inserted into params, but without expanding lists into disjoint possibilities. As such, all lists inside static_params are equivalent to the {"__tuple__": [...]} form of sequences in params, without any other difference. It's just a small ergonomic helper, as in the snippet after this paragraph.
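
For example (an illustrative snippet with made-up key names), the following definition generates two runs, one with batch_size 32 and one with batch_size 64, and each run receives the whole class_names list unchanged:

"params": {
    "batch_size": [32, 64]
},
"static_params": {
    "class_names": ["cat", "dog"]
}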

For grid search experiments, search_params are:

"search_params": {
    "runs_per_configuration": <number of times each configuration is executed>
}

For a sample grid search, see examples/sample_configs/search_grid.json.

For Hyperopt search experiments, search_params are:

"search_params": {
    "fmin_kwargs": {
        <hyperopt.fmin keyword arguments>
    },
    "optimize": {
        "type": <"max" or "min">,
        "target": "target variable"
    }
}

fmin_kwargs defines keyword arguments passed to hyperopt.fmin.

optimize defines which variable we wish to optimize. target defines which entry of the some.module.function output dictionary to optimize for. type defines whether it's a max or min optimization problem.
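
Filled in, this might look as follows (an illustrative sketch: max_evals is a standard hyperopt.fmin keyword argument, and "loss" stands for whatever entry your execute function returns):

"search_params": {
    "fmin_kwargs": {
        "max_evals": 20
    },
    "optimize": {
        "type": "min",
        "target": "loss"
    }
}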

For a sample Hyperopt search, see examples/sample_configs/search_hyperopt.json.

Environment configuration

It is possible to insert more information into runs from the environment, such as data directory paths. Environment configuration is stored in YAML files.

For an experiment definition path/definition.json, the environment configuration files are read in this order (each later file can overwrite settings of the files before it):

  • declair_env.yaml in the root of the git repository containing path/definition.json, if it is in a git repository.
  • path/declair_env.yaml
  • path/definition.json.declair_env.yaml

To insert an entry into the experiment definition, write {"__env__": [... key ...]} where the key is a list of nested entries to follow.

For example, if you have a YAML

dataset:
    path: "/storage/data/sample_dataset"

and wish to use path for an experiment, write {"__env__": ["dataset", "path"]} in the experiment definition in the place where you wish to use it. You can also add an entry default: <some value> to this dictionary to define a default value in case this information is not contained in the environment YAML.
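
For instance, combining the lookup with a default (the fallback path here is made up for illustration):

"data_path": {"__env__": ["dataset", "path"], "default": "/tmp/sample_dataset"}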

Configuring where to store results

Furthermore, the environment config YAML files define where to store experiment results. Sacred observers are used for storage; as of now, only the Mongo Observer and the File Storage Observer are supported.

Observers are defined in the YAML as follows:

observers:
    file:
        path: "out/test_run"
    mongo:
        url: "mongodb://mongo_user:mongo_password@localhost:27017"

Here file and path define the directory where the File Storage Observer stores its outputs. mongo and url define the URL of the database for the Mongo Observer. If either is not provided, that observer is not used. If an empty config is given, no observers are used and thus no results are stored.

Executing experiments

To execute an experiment from a definition file, use

from declair import execute_file
execute_file(file_path)

If you'd like to do a dry-run, without using any Sacred observers (or any other environment configuration), use

from declair import execute_file, Environment
env = Environment() # empty environment config
execute_file(file_path, env)

Special JSON objects

Type dictionary

{"__type__": <type string>}

imported into memory as if import <type string> was used in code.

Call dictionary

{
    "__call__": <function or type string>,
    "__with_args__": <args list> (optional),
    "__with_kwargs__": <kwargs dict> (optional)
}

called as if the function under __call__ was called with __with_args__ as positional arguments and __with_kwargs__ as keyword arguments.
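
For instance, the following sketch (using the standard library's random.randint as the callable) would be replaced by a random integer between 0 and 10:

{
    "__call__": "random.randint",
    "__with_args__": [0, 10]
}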

Environment dictionary

{"__env__": <key list>}

taken from the environment, such that each consecutive entry in the key list is taken from a progressively deeper nested configuration.

Tuple dictionaries

{"__tuple__": <sequence>}

are turned into a tuple during search, without being split up into disjoint possibilities.

Hyperopt parameter dictionaries

{"__hp__": <hyperopt-param>, "args": [...]}

or

{"__hp__": <hyperopt-param>, "kwargs": {...}}

where <hyperopt-param> is one of the hyperopt parameter expressions (without the hp. prefix). It is used in the hyperopt search space definition as a parameter variable.

Credits

This project has been heavily inspired by cbds_common.
