Skip to main content

A thoughtless and all-round experiment manager for Python

Project description

ExperimentManager

Thoughtless and all-round experiment manager for Python. Its main purpose is to automate all background work relating to saving files and logging. ExperimentManager can easily be used to :

  • Manage saving and loading directories for all your experiment files with automatic verisonning and logging.
  • Record the output of your experiment and save all source files for more reproductability.
  • Save (almost) any object without having to worry about anything.
  • Log all your metrics thoughtlessly.
  • Manage, edit and inject configuration dictionnaries directly into specified functions
  • Encapsulate individual runs within your experiment. Any function can be used to create a Run; this will automatically generate a new run directory in saved_runs with dedicated saved_files and saved_metrics directories as well as a specific log file.

Disclaimer

This is a beta version. It is stable within my testing environment (Ubuntu with specific library verisons) but has not been tested on others. A lot of features are not yet documented!

For those familiar with the Sacred library, this project stems from their experiment manager but ambitions to correct some behavior problems (mentionned in their Issues) while allowing for better and more flexible feature and doing everything locally (without relying on other database managers or UIs which can be unflexible or tricky to install in specific environments). The only part of their code that is used here is the function signature capture.

Installation

To install, simply use pip install . within the git direcotry.

This beta version runs with Tensorflow 1.13 and Keras 2.2.4 (used for Tensorboard and model Saving features). Earlier versions are not tested.

Usage

To create an Experiment, you only need to add the following lines in your main file:

from ExperimentManager import createExperiment
manager = createExperiment('my first experiment')

There are many parameters that can be specified when creating an Experiment, the main ones are:

  • name : name for the experiment. Will be used for creating the save directory, if needed.
  • experiments_dir : the optionnal parent dir in which you want the manager to save and log.
  • project_dir : the parent directory of the code that is used to run the experiments. This is mostly used to backup the source code for more reproductability. If not provided, will use the parent dir of the file that called the init.
  • load_dir : directory used for easier imports, it will be prefixed on all paths generated using manager.get_load_path
  • verbose : 0,1 or 2. 1 will add some internal logs in experiment_info.log while 2 will log details on every internal function call in debug.log (only use this to test the behavior of this class, it slows the process down by a lot!)
  • tensorboard : True or False, log to tensorboard events when using metric logging methods
  • ghost : True or False (default is False). When True, this will disable all saving and logging features, not a single directory or file will be created. This is usefull when running tests.

ExperimentManagers can also be created from a configuration file using manager = ExperimentManager.getManagerFromConfig('config.json'). A sample configuration file can be found in the demo directory.

Managing directories

Creating an Experiment Manager will automatically generate the following directory structure:

./
-- (your_source_code)
-- managed_experiments/
---- my first experiment DATETIME/
-------  saved_files/
-------  saved_sources/
-------  saved_metrics/
-------  saved_runs/
-------  experiment_info.log
-------  debug.log
-------  stdout_capture.log

Notice that the manager_experiments direcotry was created in the same directory as the source code. More precisely, if no experiment_dir is specified at initialisation time, the directory will be that containing the code which called ExperimentManager.createManager.

All created directories have rather straight-forward goal. Saved runs contains information about runs that are conducted during the experiment using for instance the manager.run command. The pros of using this method are explained in the Runs section below.

The path to every directory is an attribute of the manager so you can access it anytime. In particular, the "my first experiment DATETIME" directory can be accessed with manager.experiment_dir.

To access your experiment manager in any file, simply use the getExperiment method of ExperimentManager. It will recover the active experiment manager : ExperimentManager.getExperiment().

Reproductability and Control

By default, your experiment manager will also save a copy of all the projects source files (.py) in saved_sources and capture the stdout in a dedicated logger ('stoud_capture.log'). This is usually enough to get a firm grasp of any past experiment and reproduce its results.

Saving, like never before

At any point, you can simply call manager.save(object,name) and ExperimentManager will :

  • Detect which folder it should save in : the general saved_files directory in my frist experiment DATETIME or in a specific run subdirectory.
  • Detect the saving method that should be used.
  • Version the name if needed.

Implemented saving methods cover the following data types:

  • Lists and dicts
  • Numpy arrays
  • Tensorflow tensors (if they can be evaluated)
  • Keras models
  • Matplotlib figures

New saving methods can easily be added, refer to the add_saver methods doc.

Logging metrics

Experiment Manager supports CSV loggin of scalars and Tensorboard logging of scalars and histograms.

Just use manager.log_scalar(metric_name,value) and ExperimentManager will again detect the current run and save the metrics, versioning the name if necessary. An optional step argument can be added after value, by default, steps will be auto-incremented 0-based integers. Unless manager.tensorboard is set to false, logging scalars will log to a csv in the runs or global metrics directory and also log to the tensorboard directory. Metric_name is used to name the csv file as well as the header for the value column in the CSV.

For logging several metrics in a single CSV, use manager.log_scalars(file_name,values,header,step=None).

Logging a historgram is done exactly the same way manager.log_histrogram(name, values, step, bins=1000) (remember that histograms are only logged to tensorboard, not as CSV which would be too heavy; hence if tensorboard support is disabled, this will do nothing).

Configurations

Configurations are a great way to manage all your experiment parameters in one place (a json file or dictionnary). ExperimentManager has a dedicated configuration dictionnary for every run in addition to a global configuration dictionnary. For hand-designated functions, your manager will inject it's configuration parameters by changing the function's signature (overwriting default values).

To add a configuration dictionnary, just use manager.add_config(dictionnary) (json file support coming soon). The current run or the general configuration dictionnary will be updated using the input dictionnary. If a run configuraiton dictionnary has fields in common with the general dictionnary, the run's options will always prevail (within that run only of course!).

To designate a function that should receive configuration values, add the @manager.capture decorator. You can specify exactly which fields should be injected using the prefixes parameter. Here's an example:

manager = createManager('test')

config1 = { 
		"name" : "Victor",
		"status" : 1,
		"details" : {
			"age" : 22,
			"towns" : {
				0 : "Brussels",
				1 : "Paris"
				}
			}	
		}

manager.add_config(config1) #this will update the general configurations dict since we are not in a run

@manager.capture()
def function1(time,name,details):
    pass

@manager.capture()
def function2(name,status,details):
    pass

@manager.capture(prefixes=['name','details.towns','details.age,'details'])
def function2(names,towns,details):
    pass

Is equivalent to:

def function1(time,name='Victor',details={"age" : 22,"towns" : {0 : "Brussels",1 : "Paris"}}):
    pass

def function2(name='Victor',status=1,details={"age" : 22,"towns" : {0 : "Brussels",1 : "Paris"}}):
    pass

def funciton3(name='Victor',towns={0 : "Brussels",1 : "Paris"},age=22,details={"age" : 22,"towns" : {0 : "Brussels",1 : "Paris"}}):
    pass

As you can see, not all function parameters need to be covered. If you add **kwargs, all the keys will be added.

Runs

To take full advantage of the ExperimentManager, run your tasks using the run method of your manager. ExperimentManager will automatically detect which run is active when using saving and logging methods so that your files will always end up in the right place. To launch a run you have two options:

  • Add a @manager.command decorator to a function and run it using manager.run(the_name_of_the_function).
  • Use manager.add_command(function) and the same running method. This has the benefit of not needing to modify any part of your code (by adding @manager.command) but comes at the cost of losing configuration injections.

Note that the command decorator also calls the capture decorator (and thus performs configuration injections).

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

ExperimentManager-0.0.3-py3-none-any.whl (30.4 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page