A simple, minimalistic utility for managing many experiment runs and custom analysis of their results

## Why another custom solution?

My job is to do research in Deep Learning, and I run dozens of different experiments. Testing one hypothesis usually requires several runs over a parameter grid. Plotting and visualizing results is often ad hoc, and updating the code that produces the output is overhead. Instead, I decided to collect all results in a Jupyter notebook and create plots of the form quantity of interest ~ parameters. That plotting is a separate task almost every time. Tools such as ModelDB provide simple visualizations that are easy to aggregate for model comparison. Testing a hypothesis is not about model comparison and thus requires special treatment.

Visualizing results became a pain: you had to remember the mapping parameters -> results, and separating results into different folders made an even bigger mess. I had a really bad experience with visualizations. I realized that all I needed was to iterate over a folder of results and apply the same function to each of them.

## Installation

    pip install -U git+https://github.com/ferrine/exman.git#egg=exman
    # or
    pip install exman

## Simple Start

A simple drop-in replacement for the standard argparse.ArgumentParser:

    # file: main.py
    import exman
    # you should always use exman.simpleroot(__file__) unless you want another dir
    parser = exman.ExParser(root=exman.simpleroot(__file__))  # root = ./exman relative to the main file
    parser.add_argument(...)

You then add arguments exactly as you did before, without any changes.
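
For comparison, here is a minimal plain argparse script of the kind this replaces. The argument names `--lr` and `--layers` are made up for illustration. Going by the description above, swapping `argparse.ArgumentParser()` for `exman.ExParser(root=exman.simpleroot(__file__))` should be the only change needed.

```python
import argparse

# Plain argparse version of a training script's argument handling.
parser = argparse.ArgumentParser()
parser.add_argument('--lr', type=float, default=1e-3)   # hypothetical argument
parser.add_argument('--layers', type=int, default=2)    # hypothetical argument

# An explicit argument list is passed so the snippet does not depend on sys.argv.
args = parser.parse_args(['--lr', '0.01'])
print(args.lr, args.layers)
```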

## Best Practices

### Error Handling in main

Since 0.0.3 you can use the following context manager. If the main() function fails, the run will be moved to exman/fails.

    import exman
    # you should always use exman.simpleroot(__file__) unless you want another dir
    parser = exman.ExParser(root=exman.simpleroot(__file__))  # root = ./exman relative to the main file
    ...
    if __name__ == '__main__':
        args = parser.parse_args()
        with args.safe_experiment:
            main(args)
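
To make the idea concrete, here is a rough standard-library sketch of a context manager that moves a run directory into a `fails` folder when its body raises. It is not Exman's implementation; the function name, directory names and demo error are assumptions chosen for illustration.

```python
import contextlib
import pathlib
import shutil
import tempfile


@contextlib.contextmanager
def safe_experiment(run_dir: pathlib.Path, fails_dir: pathlib.Path):
    """Move run_dir into fails_dir if the body raises, then re-raise."""
    try:
        yield run_dir
    except BaseException:
        fails_dir.mkdir(parents=True, exist_ok=True)
        shutil.move(str(run_dir), str(fails_dir / run_dir.name))
        raise


# Demo on a throwaway directory (all names here are hypothetical).
root = pathlib.Path(tempfile.mkdtemp())
run = root / 'runs' / 'demo-run'
run.mkdir(parents=True)
try:
    with safe_experiment(run, root / 'fails'):
        raise RuntimeError('training crashed')
except RuntimeError:
    pass

# True when the failed run now lives under fails/ and no longer under runs/.
moved = (root / 'fails' / 'demo-run').is_dir() and not run.exists()
```

Re-raising after the move keeps the original traceback visible, so a crash is still reported as a crash.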

To avoid non-reproducible results, you can ensure that you have committed all changes. Exman will log the hash of the commit, and the diff if there is one. To use these features, hint the parser with the repo.

    import os
    import exman

    parser = exman.ExParser(root=exman.simpleroot(__file__), git=True)
    # less fragile solution, but works only locally
    parser = exman.ExParser(root=exman.simpleroot(__file__), git="/abs/path/to/repo")
    # an ok solution, if you are sure of the relative path
    parser = exman.ExParser(
        root=exman.simpleroot(__file__),
        git=os.path.join(os.path.dirname(__file__), "relative", "path", "goes", "here"),
        git_assert_clean=True,  # run assertion check before each run. False by default.
    )

In the CLI of your favorite experiment, you can skip the assertion if you want to:

    python train.py --git-dirty --other-args
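
For readers curious what such a check involves, the sketch below gathers the same kind of information with the `git` command line: the current commit hash and whether the work tree is clean. This is an illustration only and makes no claim about how Exman talks to git; `is_clean` and `git_state` are hypothetical helper names.

```python
import subprocess


def is_clean(porcelain_output: str) -> bool:
    """A work tree is clean when `git status --porcelain` prints nothing."""
    return porcelain_output.strip() == ''


def git_state(repo: str):
    """Return (commit hash, is clean) for the repository at `repo`."""
    def run(*cmd):
        result = subprocess.run(['git', '-C', repo, *cmd], check=True,
                                capture_output=True, text=True)
        return result.stdout
    return run('rev-parse', 'HEAD').strip(), is_clean(run('status', '--porcelain'))
```

Note that `git status --porcelain` also lists untracked files, so this notion of "clean" is stricter than "no modified tracked files".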

### Optional Parameters

To avoid issues when reproducing experiments, consider using exman.optional(type) for optional arguments:

    import exman
    # you should always use exman.simpleroot(__file__) unless you want another dir
    parser = exman.ExParser(root=exman.simpleroot(__file__))  # root = ./exman relative to the main file
    parser.add_argument('--myarg', type=exman.optional(int))
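
One plausible reason this helps: an unset value is saved as `None`, and a plain `int` converter cannot read the text `None` back. The sketch below shows a wrapper with that behavior in plain argparse. It is an assumption about what `exman.optional` does, not its actual source, and the local `optional` function is defined here only for the demonstration.

```python
import argparse


def optional(t):
    """Wrap converter `t` so that the string 'None' parses to None."""
    def convert(value):
        if value is None or value == 'None':
            return None
        return t(value)
    return convert


parser = argparse.ArgumentParser()
parser.add_argument('--myarg', type=optional(int), default=None)

a = parser.parse_args(['--myarg', '5']).myarg      # parsed with int
b = parser.parse_args(['--myarg', 'None']).myarg   # mapped to None
```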

### Validators

In plain argparse you can't easily validate multiple arguments together; in Exman it is easy, and you can provide an informative error message:

    import exman
    # you should always use exman.simpleroot(__file__) unless you want another dir
    parser = exman.ExParser(root=exman.simpleroot(__file__))  # root = ./exman relative to the main file
    # here p stands for initial namespace parsed from arguments
    parser.register_validator(
        lambda p: p.arg1 != p.arg2 or p.arg3 == p.arg4,
        # next line will be autoformatted for you using .format
        'You have provided wrong set of arguments: {arg1}, {arg2}, {arg3}, {arg4}',
    )

Advanced validators can raise exman.ArgumentError with a more informative message than the one registered alongside the validator function.
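
The mechanism can be pictured with a small stand-alone sketch: keep a list of (check, message) pairs, run each check on the parsed namespace, and format the message with the namespace's values. The code below is a simplified illustration, not Exman's code; it raises `ValueError` instead of `exman.ArgumentError`, and the `validators` list and `validate` function are hypothetical.

```python
import argparse

validators = []  # list of (check, message template) pairs


def register_validator(check, message):
    validators.append((check, message))


def validate(namespace):
    """Raise ValueError with a formatted message for the first failing check."""
    for check, message in validators:
        if not check(namespace):
            raise ValueError(message.format(**vars(namespace)))


register_validator(lambda p: p.arg1 != p.arg2,
                   'arg1 and arg2 must differ, got {arg1} and {arg2}')

try:
    validate(argparse.Namespace(arg1=1, arg2=1))
    error = None
except ValueError as exc:
    error = str(exc)
```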

### Marry Pandas with Exman

Pandas is a great tool for working with tabular data. Experiments are tabular data too and can be loaded into Python. So all you need is to run a batch of experiments and open a Jupyter notebook.

    import exman
    index = exman.Index(exman.simpleroot('/path/to/main.py'))
    experiments = index.info()

The table has the columns time (datetime64[ns]), the time of the experiment, and root (pathlib.Path), the path to its results. It also contains all other parameters of the experiment. You can later filter or order the results by them and get easy access to the results folder and its contents.

    for i, ex in experiments.iterrows():
        # do some actions
        # use ex.param for parameters
        # ex.root / 'plot.png' for file paths
        ...
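
As an example of the filtering and ordering mentioned above, the sketch below uses a hand-built DataFrame as a stand-in for `index.info()`, with the `time` and `root` columns described earlier. The `lr` column and the run paths are made-up values for illustration.

```python
import pathlib

import pandas as pd

# Stand-in for index.info(); 'lr' is a hypothetical experiment parameter.
experiments = pd.DataFrame({
    'time': pd.to_datetime(['2020-01-01 10:00', '2020-01-02 10:00', '2020-01-03 10:00']),
    'root': [pathlib.Path('runs/a'), pathlib.Path('runs/b'), pathlib.Path('runs/c')],
    'lr': [0.1, 0.01, 0.1],
})

# Keep the runs with lr == 0.1 and order them newest first.
selected = experiments[experiments.lr == 0.1].sort_values('time', ascending=False)

# Build the path of a result file for every selected run.
plots = [ex.root / 'plot.png' for _, ex in selected.iterrows()]
```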

### Local Configuration

You can store local configuration files in your experiment folder. You should provide the filename to ExParser as well:

    import exman
    # you should always use exman.simpleroot(__file__) unless you want another dir
    parser = exman.ExParser(
        root=exman.simpleroot(__file__),
        default_config_files=['local.cfg']
    )

The local configuration stores globally defined default values; they override the defaults set in the main file.
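
The resulting precedence can be illustrated with plain dictionaries. The sketch assumes that command-line flags take priority over the local configuration, which is the usual convention but is not stated above; the parameter names and values are invented.

```python
from collections import ChainMap

parser_defaults = {'user': 'anonymous', 'lr': 0.001}  # defaults set in main.py
local_config = {'user': 'myuser'}                      # values from local.cfg
command_line = {'lr': 0.01}                            # flags passed at run time

# Earlier mappings win: command line, then local config, then parser defaults.
resolved = dict(ChainMap(command_line, local_config, parser_defaults))
```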

### Auto Structure

If you want an argument-specific, human-friendly directory structure, you can tie specific argument names to it:

    import exman
    # you should always use exman.simpleroot(__file__) unless you want another dir
    parser = exman.ExParser(
        root=exman.simpleroot(__file__),
        automark=['arg1', 'constant']
    )
    parser.add_argument('--arg1')

Later you will see that your marked folder looks like this:

    exman/marked/arg1/<arg1>/constant/<name-of-experiment>/...
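
The layout above suggests a simple rule: each automark entry becomes a path component, and entries that name an argument are followed by that argument's value. The sketch below reproduces that rule as inferred from the example path; it is not taken from Exman's source, and `marked_path` is a hypothetical helper.

```python
import argparse
import pathlib


def marked_path(automark, args, name):
    """Build the marked path for experiment `name` from automark entries."""
    parts = []
    for mark in automark:
        parts.append(mark)
        if hasattr(args, mark):  # entry names an argument: append its value
            parts.append(str(getattr(args, mark)))
    return pathlib.PurePosixPath('exman', 'marked', *parts, name)


path = marked_path(['arg1', 'constant'], argparse.Namespace(arg1='foo'), 'my-run')
```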

This can be useful if you work in a team. Write in main.py:

    import exman
    # you should always use exman.simpleroot(__file__) unless you want another dir
    parser = exman.ExParser(
        root=exman.simpleroot(__file__),
        automark=['user'],
        # store user: myuser content in local.cfg
        default_config_files=['local.cfg']
    )
    parser.add_argument('--user')

After you’ve done that, your team's runs can be stored in a single exman directory, assuming all access rights are set up correctly.

    exman/marked/user/<username>/constant/<name-of-experiment>/...

## Directory Structure and CLI

On the command line, runs look the same as before:

    python main.py --param1 foo --param2 bar

Things change once you actually run the program. It dumps all the parsed parameters, combined with the defaults, into a YAML file at root/runs/<name-of-experiment>/params.yaml. The name-of-experiment is generated automatically on the fly. For a quick look or search, there are symlinks in the index folder, e.g. root/index/<name-of-experiment>.yaml. Since many experiments are created and debugging is sometimes needed, you might not want to create debug experiments in the runs folder. In that case, add the --tmp flag and new files will be written to the root/tmp/<name-of-experiment> folder. That is convenient: you do not lose important information about the experiment and its results, and you can restore the symlinks in index by hand if needed.

    root
    |-- runs
    |   `-- xxxxxx-YYYY-mm-dd-HH-MM-SS
    |       |-- params.yaml
    |       `-- ...
    |-- fails
    |-- index
    |-- marked
    |   `-- <mark>
    |           |-- params.yaml
    |           `-- ...
    `-- tmp
        `-- xxxxxx-YYYY-mm-dd-HH-MM-SS
            |-- params.yaml
            `-- ...
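
As a rough illustration of the naming scheme in the tree, the sketch below builds a run directory name from a counter and a timestamp and picks `runs` or `tmp` depending on a flag. It assumes that `xxxxxx` is a zero-padded run counter, which the tree does not state; `run_dir` is a hypothetical helper, not part of Exman.

```python
import datetime
import pathlib


def run_dir(root, number, when, tmp=False):
    """Return root/(runs|tmp)/xxxxxx-YYYY-mm-dd-HH-MM-SS for one run."""
    name = '{:06d}-{}'.format(number, when.strftime('%Y-%m-%d-%H-%M-%S'))
    return pathlib.PurePosixPath(root, 'tmp' if tmp else 'runs', name)


when = datetime.datetime(2020, 1, 2, 3, 4, 5)
regular = run_dir('root', 7, when)            # ordinary run
debug = run_dir('root', 7, when, tmp=True)    # run started with --tmp
```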

### Rerunning experiment

If you want to reproduce an experiment, you can provide the source configuration file in YAML format. For example:

    python main.py --config root/index/<name-of-experiment-to-reproduce>.yaml

All the values will be restored from the previous run. You can also override old values from --config ... using

    python main.py --config root/index/<name-of-experiment-to-reproduce>.yaml --override-param=new_value

If you do not want to restore some argument from the saved config (for example, a dynamically set variable), use volatile=True in add_argument:

    parser.add_argument('--my_dynamic_id', default=os.environ.get('AUTOSETTED_ID'), volatile=True)
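
The combined effect of a saved config, command-line overrides and volatile arguments can be sketched with dictionaries. This only models the behavior described above and is not Exman's implementation; the `restore` function and all values are hypothetical.

```python
def restore(saved, overrides, current_defaults, volatile):
    """Merge parameters: defaults, then saved non-volatile values, then overrides."""
    params = dict(current_defaults)
    params.update({k: v for k, v in saved.items() if k not in volatile})
    params.update(overrides)
    return params


saved = {'lr': 0.1, 'my_dynamic_id': 'old-id'}   # contents of the old params.yaml
params = restore(
    saved,
    overrides={'lr': 0.01},                       # e.g. --lr=0.01 on the command line
    current_defaults={'lr': 0.001, 'my_dynamic_id': 'new-id'},
    volatile={'my_dynamic_id'},                   # never taken from the saved config
)
```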

## Marking experiments

If you like some experiments, you can mark them for easier access later.

    cd root_of_exman_dir
    exman mark <key> <#ex1> [<#ex2> <#ex3> ...]

and later in Jupyter:

    index = exman.Index(exman.simpleroot('/path/to/main.py'))
    experiments = index.info('<key>')
    # assuming you work in a team and use best practice advice
    user_experiments = index.info('user/username')

## Deleting experiments

    cd root_of_exman_dir
    # delete only index
    exman delete <#ex1> [<#ex2> <#ex3> ...]
    # delete all files
    exman delete --all <#ex1> [<#ex2> <#ex3> ...]
