A utility for tracking and reproducing TensorFlow runs.

Project description

Machine learning engineers often run multiple versions of an algorithm concurrently, which makes it difficult to keep track of and reproduce runs. This simple utility solves the problem by maintaining a database in human-readable YAML format that tracks:

  • A unique name assigned to each run.
  • A description of each run.
  • The exact command used for the run.
  • The date and time of the run.
  • The most recent commit before the run.
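As a rough illustration of the record described above, the sketch below builds one entry as a plain Python dictionary. The key names follow the values this README lists for `runs new` (command, commit, datetime, description, host). The `make_entry` helper, its arguments, and the `train.py` script name are hypothetical, the sample values are placeholders borrowed from examples elsewhere in this README, and the tool's actual internals may differ. Writing the dictionary out as YAML would need a YAML library, which is omitted here to keep the example dependency-free.

```python
from datetime import datetime


def make_entry(command, commit, description, host, when=None):
    """Build one database record (hypothetical helper, not the tool's API)."""
    when = when or datetime.now()
    return {
        "command": command,
        "commit": commit,
        # ISO 8601 timestamp, the same shape as the datetimes shown by `runs table`
        "datetime": when.isoformat(),
        "description": description,
        "host": host,
    }


# The database maps each unique run name to its entry.
database = {
    "continuous0": make_entry(
        command="python train.py --continuous-actions",  # train.py is a placeholder
        commit="da6030dd973c810c330d9635eb8d9c2105bdfe2f",
        description="None",
        host="rldl3",
        when=datetime(2017, 11, 3, 13, 46, 48, 633364),
    )
}
print(sorted(database["continuous0"]))
```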


The only external prerequisites of this tool are tmux and git. Once they are installed, run pip install run-manager.


This tool tries to assume as little as possible about your program while still providing useful functionality. Its assumptions are as follows:

  • Your program lives in a Git repository.
  • The Git working tree is not dirty (if it is, the program will throw an informative error).
  • Your program accepts two flags:
    • --tb-dir: pointing to the same directory that you would specify in tensorboard --logdir=<tb-dir>.
    • --save-path: pointing to the directory of the file that you would pass to tf.train.Saver().restore(sess, <save-path>).
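One way a training script could satisfy the flag requirement is with `argparse` from the standard library. This is only a minimal sketch: the parser description, help strings, and the sample paths are illustrative assumptions, and the only names taken from this README are the two flags themselves.

```python
import argparse


def build_parser():
    """Parser for a hypothetical training script that the tool can launch."""
    parser = argparse.ArgumentParser(description="Example training script")
    # Directory for TensorBoard event files (what you would pass to --logdir)
    parser.add_argument("--tb-dir", default=None, help="TensorBoard log directory")
    # Path used when saving and restoring checkpoints
    parser.add_argument("--save-path", default=None, help="checkpoint save path")
    return parser


# In a real script you would call parse_args() with no arguments so that it
# reads sys.argv; an explicit list is used here to keep the example runnable.
args = build_parser().parse_args(
    ["--tb-dir", "/tmp/tb", "--save-path", "/tmp/ckpt/model.ckpt"]
)
print(args.tb_dir, args.save_path)
```

Note that `argparse` converts the dashes in flag names to underscores, so the values are read as `args.tb_dir` and `args.save_path`.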


For detailed descriptions of each subcommand and its arguments, run

runs <subcommand> -h


Start a new run. This command will automatically create the file structure:


It will add an entry to the database keyed by name, with the following values:

  • command
  • commit
  • datetime
  • description
  • host

Finally, it will execute the command in tmux.

runs new 'run-name' 'python' --description='Description of program'

Note: the --tb-dir and --save-path flags will be automatically appended to this command argument, so do not include them in the <command> argument.
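The sketch below shows roughly what appending the flags and handing the result to tmux could look like. It is not the tool's actual code: the helper names, the `--flag=value` form, the directory layout, and the `train.py` script name are all assumptions made for illustration. The only external command used is `tmux new-session -d -s <name> <command>`, which starts a detached session.

```python
import shlex


def append_run_flags(command, tb_dir, save_path):
    """Return `command` with --tb-dir and --save-path appended (hypothetical helper)."""
    return (
        f"{command} --tb-dir={shlex.quote(tb_dir)}"
        f" --save-path={shlex.quote(save_path)}"
    )


def tmux_argv(name, full_command):
    """argv that would start `full_command` in a detached tmux session called `name`."""
    return ["tmux", "new-session", "-d", "-s", name, full_command]


# Placeholder paths: this README does not show the real directory layout.
full = append_run_flags(
    "python train.py --use-cnn",
    "runs/tensorboard/run-name",
    "runs/checkpoints/run-name/model.ckpt",
)
print(full)
print(tmux_argv("run-name", full))
```

The argv list could be passed to `subprocess.run` to launch the session; that call is left out so the example runs without tmux installed.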


Delete all runs matching a pattern. This command also deletes the associated TensorBoard and checkpoint files.

❯ runs delete "continuous.*"
Delete the following runs?


List all runs matching a pattern.

❯ runs list --pattern="continuous.*"


Display the entries in the run database in table form.

❯ runs table
name                           command                            commit                             datetime                    description                        host
-----------------------------  ---------------------------------  ---------------------------------  --------------------------  ---------------------------------  ------
continuous2                    CUDA_VISIBLE_DEVICES=1 python ...  90c0ad704e54d5152d897a4e978cc7...  2017-11-03T13:46:48.633364  Run multiple runs to test stoc...  rldl3
continuous3                    CUDA_VISIBLE_DEVICES=1 python ...  90c0ad704e54d5152d897a4e978cc7...  2017-11-03T13:47:09.951233  Run multiple runs to test stoc...  _
continuous1                    CUDA_VISIBLE_DEVICES=1 python ...  90c0ad704e54d5152d897a4e978cc7...  2017-11-03T13:42:39.879031  Run multiple runs to test stoc...  _
house-cnn-no-current-pos       python --timesteps-pe...           9fb9b5a                            2017-10-28T18:07:44.246089  This is the refactored CNN on ...  _
room-with-original-cnn         python --timeste...                8a5e1c2                            2017-10-28T17:09:49.971061  Test original cnn on room.mjcf     _
continuous11509804959          CUDA_VISIBLE_DEVICES=1 python ...  90c0ad704e54d5152d897a4e978cc7...  2017-11-04T10:15:59.373633  Run multiple runs to test stoc...  _
continuous31509805040          CUDA_VISIBLE_DEVICES=1 python ...  90c0ad704e54d5152d897a4e978cc7...  2017-11-04T10:17:20.286275  Run multiple runs to test stoc...  rldl4
room-cnn-no-current-pos        python --timesteps-pe...           2873fbf                            2017-10-28T18:08:10.615461  This is the refactored CNN on ...  rldl4
continuous21509805012          CUDA_VISIBLE_DEVICES=1 python ...  90c0ad704e54d5152d897a4e978cc7...  2017-11-04T10:16:52.129656  Run multiple runs to test stoc...  _

To filter by regex, use the --pattern flag.
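Filtering run names by a regular expression can be sketched with Python's `re` module. The `filter_runs` helper is hypothetical, and anchoring the match at the start of the name (`re.match` rather than `re.search`) is an assumption; the tool itself may behave differently.

```python
import re


def filter_runs(names, pattern=None):
    """Return the run names matching `pattern`, or all names if no pattern is given."""
    if pattern is None:
        return list(names)
    regex = re.compile(pattern)
    # Assumption: match from the start of the name; re.search would match anywhere.
    return [name for name in names if regex.match(name)]


names = [
    "continuous1",
    "continuous2",
    "house-cnn-no-current-pos",
    "room-with-original-cnn",
]
print(filter_runs(names, "continuous.*"))
```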


Look up a specific value associated with a database entry.

❯ runs lookup continuous0 commit


Print the commands for reproducing a run.

❯ runs reproduce continuous0
To reproduce:
 git checkout da6030dd973c810c330d9635eb8d9c2105bdfe2f
 runs new continuous0 'python --timesteps-per-batch=2048 --continuous-actions --neg-reward --use-cnn' --description='None'
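Output of this shape could be assembled directly from a database entry. The sketch below assumes an entry is a dictionary with `commit`, `command`, and `description` keys, as listed for `runs new`; the `reproduce_message` helper is hypothetical and is not taken from the tool's source.

```python
def reproduce_message(name, entry):
    """Format the reproduction instructions for one run (hypothetical helper)."""
    return "\n".join(
        [
            "To reproduce:",
            # Return to the commit recorded when the run was started
            f" git checkout {entry['commit']}",
            # Re-issue the original command under the same name
            f" runs new {name} '{entry['command']}'"
            f" --description='{entry['description']}'",
        ]
    )


entry = {
    "commit": "da6030dd973c810c330d9635eb8d9c2105bdfe2f",
    "command": "python --timesteps-per-batch=2048 --continuous-actions --neg-reward --use-cnn",
    "description": "None",
}
print(reproduce_message("continuous0", entry))
```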


Files for tf-run-manager, version 1.0.1

  • tf_run_manager-1.0.1-py2.py3-none-any.whl (6.2 kB): wheel, Python version py2.py3
  • tf-run-manager-1.0.1.tar.gz (3.7 kB): source distribution
