
Run reproducible experiments from a yaml configuration file

Project description

Expyrun

Running reproducible experiments from a yaml configuration file.

Expyrun is a command-line script that launches your code from a yaml configuration file and records in the output directory everything needed to reproduce the run.

The configuration file is a yaml file with a few specifics:

  • Lists of objects are not supported yet.
  • Environment variables are parsed and resolved ($MY_VAR or ${MY_VAR}).
  • The config can reference itself, for instance to make the name of the experiment depend on the values of other keys. See the examples.

Install

Pip

$ pip install expyrun

Conda

Not yet available

Getting started

Expyrun is a command-line tool. You can use it directly once installed:

$ expyrun -h
$ expyrun path/to/my/experiment/configuration.yml

You only have to adapt your code so that it can be run from a main function expecting a name and a dict configuration for the run. Note that you should probably look at dacite and dataclasses to build a nicely typed configuration in your code, but this is out of the scope of Expyrun.
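
For instance, a minimal main function could look like the sketch below. The configuration keys used here are only placeholders matching the example further down; Expyrun does not impose them.

def main(name: str, config: dict) -> None:
    # Expyrun calls this function with the experiment name and the
    # configuration loaded from the yaml file, as a plain dict.
    seed = config["seed"]          # placeholder key
    lr = config["training"]["lr"]  # placeholder nested key
    print(f"Running {name} with seed={seed} and lr={lr}")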

Create configuration files

Configuration file format

There are three special sections reserved for expyrun in the yaml files:

  • __default__: Inherit keys and values from one or several other configurations (can be a string or a list of strings). Each path can be absolute (/path/to/default.yml), relative to the current directory (path/to/default.yml) or relative to the current yaml config (./path/to/default.yml). If not set, it is considered empty. This allows you to share common default configurations between your experiments.

  • __new_key_policy__: How to handle new keys in a configuration that inherits from others. Accepted values: "raise", "warn", "pass". Default: "warn". A new key is a key that is present in the current configuration but absent from all of its parents (which is probably unintended).

  • __run__: The most important section. It defines the metadata for running your experiment. It has four different sub-sections:

    • __main__: Main function to run (Mandatory). Expected signature: Callable[[str, dict], None]. This function will be called with the experiment name and the experiment configuration. A valid main function string is given as package.subpackage.module:function. Expyrun will search the package inside the current working directory.
    • __name__: Name of the experiment (Mandatory). Used to compute the actual output directory; it is also given to the main function.
    • __output_dir__: Base path where outputs are stored (Mandatory). The outputs will be stored in {output_dir}/{name}/exp.{i}, or {output_dir}/DEBUG/{name}/exp.{i} in debug mode (for the i-th experiment with the same name).
    • __code__: Optional path to the code. Expyrun searches for the code package in the current working directory by default; this option lets you change that behavior.

An example

Let's assume the following architecture:

  • my_project/
    • data/
    • my_code/
      • __init__.py
      • utils.py
      • data.py
      • experiments/
        • __init__.py
        • first_method.py
        • second_method.py
    • .git/
    • .gitignore
    • README.md

Different experiments can be launched from the experiments package (one file per experiment), and some code is shared between experiments, for instance the code handling the data.

A simple way to create the configuration files would be to create a new configs directory roughly following the architecture of the code:

  • my_project/
    • configs/
      • data.yml
      • experiments/
        • first_method.yml
        • second_method.yml

# data.yml

data:
    location: $DATA_FOLDER
    train_size: 0.7

# first_method.yml

__default__: ../data.yml

__run__:
    __main__: my_code.experiments.first_method:main
    __output_dir__: $OUTPUT_DIR
    __name__: first_method/{model.name}-{training.lr}

seed: 666

model:
    name: MyModel

training:
    lr: 0.001
    batch_size: 10

# second_method.yml

__default__: ./first_method.yml

__run__:
    __main__: my_code.experiments.second_method:main

seed: 777

model:
    name: MyModelBis

training:
    lr: 0.1
    size: [10, 10]

Then, from a terminal in the my_project directory, you can launch experiments with:

$ expyrun configs/experiments/first_method.yml [--debug]
# Change hyperparameters from the command line:
$ expyrun configs/experiments/second_method.yml --training.size 15,15

Have a look at the example folder, which implements another simple example.
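
As a rough sketch of the typed-configuration approach mentioned in Getting started, the main function of first_method.py could convert the dict it receives using dataclasses and dacite. The class and field names below simply mirror first_method.yml and are not something Expyrun requires:

from dataclasses import dataclass

import dacite


@dataclass
class ModelConfig:
    name: str


@dataclass
class TrainingConfig:
    lr: float
    batch_size: int


@dataclass
class ExperimentConfig:
    seed: int
    model: ModelConfig
    training: TrainingConfig


def main(name: str, config: dict) -> None:
    # Convert the plain dict given by Expyrun into a typed configuration.
    cfg = dacite.from_dict(data_class=ExperimentConfig, data=config)
    print(f"Training {cfg.model.name} (lr={cfg.training.lr}) for run {name}")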

Build and Deploy

$ pip install build twine
$ python -m build
$ python -m twine upload dist/*

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

expyrun-0.0.1.tar.gz (11.6 kB)


Built Distribution

expyrun-0.0.1-py3-none-any.whl (10.7 kB)


File details

Details for the file expyrun-0.0.1.tar.gz.

File metadata

  • Download URL: expyrun-0.0.1.tar.gz
  • Upload date:
  • Size: 11.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.8.13

File hashes

Hashes for expyrun-0.0.1.tar.gz

  • SHA256: 50d5f52093f5973b651e80a7d69da591197e29a9f895750ef676c73e291b84b6
  • MD5: 99d8e21125ede8e55816d7bf7d2e7d5e
  • BLAKE2b-256: b0fcd3642774880456dbdc24ba47a8453fb786bcb97bf6a6325547cb3c054016

See more details on using hashes here.

File details

Details for the file expyrun-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: expyrun-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 10.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.8.13

File hashes

Hashes for expyrun-0.0.1-py3-none-any.whl

  • SHA256: c331231f732569a715efe0823474ca615fba508df4d127974ba43db4e23fb751
  • MD5: 21e4640a61d37c9c7a2620156ec724e0
  • BLAKE2b-256: adef5809a26390011a755ba2924b93ae22d9770e8f7f5cfc15e00525bc4706f4

See more details on using hashes here.
