Expyrun

Run fully reproducible experiments from YAML configuration files.

Expyrun is a command-line tool that launches your code from a YAML configuration file and automatically stores everything required to reproduce the run in a dedicated output directory.

It helps you:

  • Centralize experiment configuration
  • Track code and dependency versions
  • Reproduce experiments exactly
  • Organize outputs cleanly

⚠️ Project Status

This library was originally developed to fit my own needs as a researcher.
Its design and implementation are therefore somewhat opinionated and tailored toward research workflows.

Expyrun is currently in beta.

Contributions are very welcome!
Do not hesitate to open an issue if you encounter a bug, have a suggestion, or would like to discuss improvements.


✨ Features

  • YAML-based configuration
  • Configuration inheritance
  • Environment variable resolution (${MY_VAR})
  • Self-referencing config values (e.g., experiment names based on hyperparameters)
  • Automatic experiment directory creation
  • Frozen requirements.txt snapshot
  • Source code snapshot
  • Automatic stdout/stderr logging
  • Command-line hyperparameter overrides

[!WARNING] Current limitation: lists of objects are not yet supported in the configuration file.


🚀 Installation

Install with pip

pip install expyrun

Install from source

git clone https://github.com/raphaelreme/expyrun.git
cd expyrun
pip install .

Getting Started

Expyrun is a command-line tool. Once installed:

expyrun -h  # Display Expyrun help
expyrun path/to/config.yml  # Run the experiments described by the YAML configuration
expyrun path/to/config.yml --debug  # Run in a debug-specific folder, using the original code instead of a copy

1️⃣ Create an entry point

Your code must expose a function with the following signature:

def entry_point(name: str, config: dict) -> None:
    ...
  • name: the experiment name
  • config: the parsed configuration dictionary

Expyrun will import and execute this function.
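
As an illustration, a minimal entry point might look like the sketch below. Only the `(name, config)` signature comes from Expyrun; the `seed` key, the `metrics.json` file name, and the fake "score" are assumptions chosen for the example.

```python
# Hypothetical module, e.g. package/module.py
import json
import random
from pathlib import Path


def entry_point(name: str, config: dict) -> None:
    """Run one experiment; Expyrun calls this with the name and parsed config."""
    # `seed` is an illustrative user-defined key, not something Expyrun requires.
    seed = config.get("seed", 0)
    rng = random.Random(seed)

    # Stand-in for real work: draw a reproducible pseudo-random "score".
    score = rng.random()

    # Expyrun sets the working directory to the experiment folder,
    # so a relative path lands next to config.yml and outputs.log.
    metrics = {"name": name, "seed": seed, "score": score}
    Path("metrics.json").write_text(json.dumps(metrics))
```

Seeding a local `random.Random` rather than the global generator keeps the example free of hidden state, which fits the reproducibility goal.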


2️⃣ Minimal configuration file

__run__:
  __main__: package.module:entry_point
  __output_dir__: /path/to/output_dir
  __name__: my_experiment

# Additional configuration passed to your function
# seed: 666
# data: /path/to/data
# device: cuda

__run__ section fields

| Key | Required | Description |
|---|---|---|
| __main__ | ✅ | Entry point in the form package.module:function |
| __output_dir__ | ✅ | Base directory where experiments are stored |
| __name__ | ✅ | Experiment name (used to build output path) |
| __code__ | ❌ | Optional path to the source code |

By default, Expyrun searches for your package in the current working directory.
You can override this using __code__.

[!NOTE] As of now, Expyrun only duplicates the package of the __main__ entry point, which it searches for inside the __code__ folder. Consequently, all of your code should be contained in a single package (which may consist of multiple subpackages).
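
To make the `package.module:function` format concrete, here is a rough sketch of how such a string can be resolved with the standard library. It is not Expyrun's implementation; `load_entry_point` is a hypothetical helper, and the example resolves a standard-library function rather than a real experiment.

```python
import importlib
from typing import Callable


def load_entry_point(spec: str) -> Callable:
    """Resolve a 'package.module:function' string to the function object."""
    module_name, sep, function_name = spec.partition(":")
    if not sep or not module_name or not function_name:
        raise ValueError(f"Expected 'package.module:function', got {spec!r}")
    # The module must be importable, i.e. its package must be on sys.path.
    module = importlib.import_module(module_name)
    return getattr(module, function_name)


# Example with a standard-library function instead of a real experiment:
join = load_entry_point("os.path:join")
print(join("outputs", "exp.0"))
```

Because the module is imported by name, the package has to be importable from wherever the code is looked up, which is consistent with the single-package constraint in the note above.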


📦 What Expyrun Generates

For each run, Expyrun creates:

{output_dir}/{name}/exp.{i}/        # Without --debug (default)
{output_dir}/DEBUG/{name}/exp.{i}/  # With --debug

Inside:

  • config.yml: parsed configuration
  • raw_config.yml: original configuration
  • frozen_requirements.txt: environment snapshot
  • outputs.log: stdout/stderr log
  • A copy of your source code package
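
The `exp.{i}` numbering in the layout above can be sketched as follows. This is an assumption about the behavior (the first unused index, starting at 0), not Expyrun's actual code, and `next_experiment_dir` is a hypothetical helper.

```python
from pathlib import Path


def next_experiment_dir(output_dir: str, name: str, debug: bool = False) -> Path:
    """Return the first exp.{i} folder that does not exist yet (hypothetical helper)."""
    base = Path(output_dir)
    if debug:
        # Debug runs are kept apart under a DEBUG/ subfolder.
        base = base / "DEBUG"
    base = base / name

    # Assumption: indices start at 0 and the first free one is used.
    i = 0
    while (base / f"exp.{i}").exists():
        i += 1
    return base / f"exp.{i}"
```

A name containing a slash, such as `training/666`, simply becomes a nested folder under the output directory.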

From inside your entry function, the working directory is automatically set to the experiment folder.
You can safely write outputs (models, logs, metrics, etc.) directly to the current directory.

[!NOTE] Expyrun does not copy external dependencies such as datasets (usually too heavy). You are responsible for keeping data paths valid when reproducing experiments.


🧩 Configuration File Format

Expyrun reserves three special sections in YAML files.

__default__

Inherit configuration from other YAML files.

__default__: path/to/base.yml

Or:

__default__:
  - base.yml
  - other.yml

Paths may be:

  • Absolute: /path/to/file.yml
  • Relative to CWD: path/to/file.yml
  • Relative to the config file: ./path/to/file.yml
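
The three path forms listed above could be told apart roughly as in this sketch. `resolve_default_path` is a hypothetical helper, and treating a leading `../` like `./` is an assumption inferred from the example configs later in this README, not a documented rule.

```python
import os
from pathlib import Path


def resolve_default_path(path: str, config_file: str) -> Path:
    """Locate a parent config referenced from `config_file` (hypothetical helper)."""
    candidate = Path(path)
    if candidate.is_absolute():
        return candidate
    # Assumption: a leading "./" or "../" means "relative to the config file".
    if path.startswith(("./", "../")):
        joined = Path(config_file).parent / candidate
    else:
        # Anything else is taken relative to the current working directory.
        joined = Path.cwd() / candidate
    # normpath collapses "." and ".." segments without touching the filesystem.
    return Path(os.path.normpath(joined))
```

For instance, `../data.yml` referenced from `configs/experiments/train.yml` would resolve to `configs/data.yml` under this assumption.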

This allows you to build modular experiment configurations.
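
Inheritance of this kind is commonly implemented as a recursive dictionary merge. The sketch below assumes that a child value overrides the parent value and that nested sections are merged key by key; Expyrun's exact rules may differ, and `merge_configs` is a hypothetical helper.

```python
import copy


def merge_configs(parent: dict, child: dict) -> dict:
    """Return a new dict where `child` values override `parent` values, recursively."""
    merged = copy.deepcopy(parent)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are sections: merge them key by key.
            merged[key] = merge_configs(merged[key], value)
        else:
            # Scalars and lists are replaced wholesale by the child value.
            merged[key] = copy.deepcopy(value)
    return merged


# Illustrative values only.
base = {"seed": 666, "ResNet": {"layers": 50, "lr": 0.001}}
override = {"ResNet": {"lr": 0.005}}
print(merge_configs(base, override))
# {'seed': 666, 'ResNet': {'layers': 50, 'lr': 0.005}}
```

Working on deep copies keeps the parent configuration untouched, so the same base file can be merged into several children.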


__new_key_policy__

Defines how new keys are handled when inheriting.

Options:

  • "raise": Error
  • "warn": Warning (Default)
  • "pass": Silently accept

A new key is one that is not defined in any parent config.

[!NOTE] This does not apply to a base configuration (with no parent).
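
The three policies can be pictured with the sketch below. It is a simplified illustration, not Expyrun's code: `check_new_keys` is a hypothetical helper, and checking nested sections recursively is an assumption.

```python
import warnings


def check_new_keys(parent: dict, child: dict, policy: str = "warn", prefix: str = "") -> None:
    """Apply a new-key policy to keys of `child` that are missing from `parent`."""
    if policy not in ("raise", "warn", "pass"):
        raise ValueError(f"Unknown policy: {policy!r}")
    for key, value in child.items():
        full_key = f"{prefix}{key}"
        if key not in parent:
            if policy == "raise":
                raise KeyError(f"New key not defined in any parent: {full_key}")
            if policy == "warn":
                warnings.warn(f"New key not defined in any parent: {full_key}")
            # policy == "pass": accept the key silently.
        elif isinstance(value, dict) and isinstance(parent[key], dict):
            # Assumption: nested sections are checked with the same policy.
            check_new_keys(parent[key], value, policy, prefix=f"{full_key}.")
```

A strict policy mainly protects against typos: a misspelled key in a child config would otherwise be added next to the real one instead of overriding it.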


__run__

Defines how the experiment should be executed.

__run__:
  __main__: package.module:function
  __name__: experiment_name
  __output_dir__: /base/output/path
  __code__: optional/path/to/code

User-defined configuration

Any parameters that your experiment needs to run. For example:

seed: 666

training:
  lr: 0.0001
  epochs: 50

datasets:
  - Cifar10
  - Cifar100
  - ImageNet

🧪 Concrete Example

[!TIP] See the example/ directory in the repository for a minimal working example.

Project structure

my_project/
├── data/
├── src/
│   ├── __init__.py
│   ├── utils.py
│   ├── data.py
│   ├── methods.py
│   └── experiments/
│       ├── __init__.py
│       ├── train.py
│       └── eval.py
└── configs/
    ├── data.yml
    ├── methods.yml
    └── experiments/
        ├── common.yml
        ├── train.yml
        └── eval.yml

data.yml

data:
  location: $DATA_FOLDER
  train_size: 0.7

methods.yml

ResNet:
  layers: 50
  epochs: 200
  lr: 0.001

ViT:
  epochs: 30
  lr: 0.0005
  patch_size: 16

common.yml

seed: 666
device: cuda

train.yml

__default__:
  - ../data.yml
  - ../methods.yml
  - ./common.yml

__run__:
  __main__: src.experiments.train:main
  __output_dir__: $OUTPUT_DIR
  __name__: training/{seed}  # Name can depend on the seed

eval.yml

__new_key_policy__: pass  # Allow new keys

__default__: ./train.yml  # Inherit from train and therefore from common, data and methods

__run__:
  __main__: src.experiments.eval:main
  __name__: evaluation/{seed}

training_exp: 0  # Id of the training exp to reload
training_folder: $OUTPUT_DIR/training/{seed}/exp.{training_exp}/

▶ Running Experiments

From the root of my_project:

# Set up the required env variables (could be inside ~/.bashrc)
export OUTPUT_DIR=/path/to/output
export DATA_FOLDER=/path/to/data

# Then run expyrun
expyrun configs/experiments/train.yml

With debug mode:

expyrun configs/experiments/train.yml --debug

Override parameters from the CLI:

expyrun configs/experiments/eval.yml --training_exp 3
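
Conceptually, each `--key value` pair replaces one entry of the configuration, and a dotted name such as `--ResNet.lr` (used further down in this README) targets a nested section. The sketch below illustrates that idea only; `apply_overrides` and `parse_value` are hypothetical helpers, and the way values are typed (via `ast.literal_eval`) is an assumption rather than Expyrun's documented behavior.

```python
import ast
from typing import Any, List


def parse_value(text: str) -> Any:
    """Interpret '3' as int, '0.005' as float, and fall back to the raw string."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text


def apply_overrides(config: dict, args: List[str]) -> dict:
    """Apply ['--a.b', 'value', ...] pairs to a nested dict, in place."""
    if len(args) % 2 != 0:
        raise ValueError("Expected '--key value' pairs")
    for flag, raw in zip(args[0::2], args[1::2]):
        if not flag.startswith("--"):
            raise ValueError(f"Expected an option starting with '--', got {flag!r}")
        *parents, leaf = flag[2:].split(".")
        node = config
        for part in parents:
            node = node[part]  # KeyError if the section does not exist
        node[leaf] = parse_value(raw)
    return config


config = {"seed": 666, "ResNet": {"lr": 0.001}}
apply_overrides(config, ["--ResNet.lr", "0.005", "--seed", "111"])
print(config)  # {'seed': 111, 'ResNet': {'lr': 0.005}}
```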

📂 Output Structure Example

After running, you typically get:

$OUTPUT_DIR/
├── training/
│   └── 666/
│       └── exp.0/
│           ├── config.yml
│           ├── raw_config.yml
│           ├── frozen_requirements.txt
│           ├── outputs.log
│           ├── src/
│           └── checkpoints/
│               ├── ViT.ckpt
│               └── ResNet.ckpt
└── evaluation/
    └── 666/
        └── exp.0/
            └── ...

Reproducing Experiments

Exact reproduction

# Reproduces this previous experiment in the next available exp.{i} folder
expyrun $OUTPUT_DIR/training/666/exp.0/config.yml

Modify hyperparameters

expyrun $OUTPUT_DIR/training/666/exp.0/raw_config.yml --ResNet.lr 0.005 --seed 111

  • config.yml → parsed, fixed configuration
  • raw_config.yml → original configuration, recommended when modifying parameters: if you change a hyperparameter that affects the experiment name (e.g. seed), the output directory adapts automatically.
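
The difference between the two files can be illustrated with plain dictionaries. The sketch assumes that `config.yml` stores the already-resolved name while `raw_config.yml` keeps the `{seed}` template; `experiment_name` is a hypothetical helper that stands in for Expyrun's own resolution.

```python
raw_config = {"seed": 666, "__run__": {"__name__": "training/{seed}"}}
parsed_config = {"seed": 666, "__run__": {"__name__": "training/666"}}


def experiment_name(config: dict) -> str:
    """Fill `{seed}` in the name template (simplified self-reference resolution)."""
    return config["__run__"]["__name__"].format(seed=config["seed"])


# Override the seed in both variants, as `--seed 111` would.
raw_config["seed"] = 111
parsed_config["seed"] = 111

print(experiment_name(raw_config))     # training/111
print(experiment_name(parsed_config))  # training/666
```

With the parsed file, the run would use the new seed but still be stored under the old name, which is why the raw file is the better starting point for modified runs.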

Parsing Variables

Expyrun resolves environment variables inside YAML, as well as self references:

data_path: $DATA_FOLDER
dataset: ${DATASET}_raw
seed: 555
output_path: $OUTPUT_FOLDER/{seed}
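
A simplified picture of this resolution, using only the standard library, is sketched below. It is not Expyrun's parser: `resolve` and `lookup` are hypothetical helpers, support for dotted keys such as `{section.key}` is an assumption, and circular references are not handled.

```python
import os
import re

# Matches {key} or {section.key}, but not the braces of ${ENV_VAR}.
_SELF_REFERENCE = re.compile(r"(?<!\$)\{([\w.]+)\}")


def lookup(config: dict, dotted_key: str):
    """Fetch config['a']['b'] from the dotted key 'a.b'."""
    value = config
    for part in dotted_key.split("."):
        value = value[part]
    return value


def resolve(value, config: dict):
    """Expand environment variables, then self-references, in a string value."""
    if not isinstance(value, str):
        return value
    expanded = os.path.expandvars(value)
    return _SELF_REFERENCE.sub(
        lambda match: str(resolve(lookup(config, match.group(1)), config)),
        expanded,
    )


os.environ["OUTPUT_FOLDER"] = "/tmp/outputs"  # illustrative value
config = {"seed": 555, "output_path": "$OUTPUT_FOLDER/{seed}"}
print(resolve(config["output_path"], config))  # /tmp/outputs/555
```

Note that `os.path.expandvars` leaves unknown variables untouched, so a missing variable stays visible in the resolved value instead of silently becoming an empty string.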

Environment Variables

Expyrun defines the following variables:

EXPYRUN_CWD

The original working directory from which Expyrun was launched.

This can be useful if your code needs to know where execution started before Expyrun switches to the experiment directory.
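
For example, a path given relative to the launch directory can be rebuilt from inside the entry point as sketched below. `from_launch_dir` is a hypothetical helper, and falling back to the current directory when the variable is unset (for instance when the code runs outside Expyrun) is an assumption.

```python
import os
from pathlib import Path


def from_launch_dir(relative_path: str) -> Path:
    """Resolve a path against the directory Expyrun was launched from."""
    # Fall back to the current directory when not running under Expyrun (assumption).
    launch_dir = Path(os.environ.get("EXPYRUN_CWD", os.getcwd()))
    return launch_dir / relative_path
```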


💡 Tips

  • Consider using dataclasses and dacite to convert configuration dictionaries into strongly-typed Python objects.

  • Keep datasets versioned or documented externally for full reproducibility.

  • Use inheritance (__default__) to build clean experiment hierarchies.
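
As a sketch of the first tip, the configuration dictionary can be turned into typed objects with standard-library dataclasses. The class names and fields below are illustrative and reuse the `seed` and `training` keys from the user-defined configuration example; the conversion is written by hand so the snippet has no third-party dependency.

```python
from dataclasses import dataclass


@dataclass
class TrainingConfig:
    lr: float
    epochs: int


@dataclass
class ExperimentConfig:
    seed: int
    training: TrainingConfig


def parse_config(config: dict) -> ExperimentConfig:
    """Build typed objects by hand; dacite.from_dict can automate this step."""
    return ExperimentConfig(
        seed=int(config["seed"]),
        training=TrainingConfig(
            lr=float(config["training"]["lr"]),
            epochs=int(config["training"]["epochs"]),
        ),
    )


cfg = parse_config({"seed": 666, "training": {"lr": 0.0001, "epochs": 50}})
print(cfg.training.lr)  # 0.0001
```

With dacite installed, `dacite.from_dict(data_class=ExperimentConfig, data=config)` is intended to replace the hand-written `parse_config`, including the nested dataclass.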


📜 License

MIT License
