Manage data during machine learning projects

Machine Learning Experiment Manager

Simply manage pytorch models, hparams, output charts, and results CSV files

A container that holds all information about an experiment: tabular data created during training and inference, hyperparameters, charts, tensorboard entries, and (pytorch) model snapshots. These data are stored on disk in standard formats that are accessible by other tools, such as Excel and image viewers. Except for model snapshots, the facility is ML platform agnostic. Only pytorch model instances are currently handled for saving; all other features work regardless of the ML platform.

Selected Details:

  • Persistent dict API for 'non-special' data.
  • Uniform API for special machine learning data types such as tables, pyplot figures, and model snapshots.
  • Creates human-readable files under a single directory root.
  • Entire experiment archives are movable/copyable with standard OS tools.
  • Files on disk are .csv for tables (Pandas Series and DataFrame instances); .pdf, .png, etc. for pyplot figures; and state_dict .pth files for pytorch models.
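
The persistent dict idea from the list above can be pictured with a small, self-contained sketch. This is not the library's implementation: the class name JsonBackedDict, the file name experiment_dict.json, and the write-on-every-update policy are all assumptions made for illustration. It only shows how a dict-like object might keep a human-readable JSON file in sync with its contents.

```python
import json
import os
import tempfile
from collections.abc import MutableMapping


class JsonBackedDict(MutableMapping):
    """Hypothetical stand-in: a dict that rewrites a JSON file on every change."""

    def __init__(self, path):
        self._path = path
        self._data = {}
        # Reload earlier contents if the file already exists.
        if os.path.exists(path):
            with open(path) as fd:
                self._data = json.load(fd)

    def _flush(self):
        # Write to a temporary file first, then swap it in, so a crash
        # mid-write does not leave a truncated JSON file behind.
        tmp_path = self._path + '.tmp'
        with open(tmp_path, 'w') as fd:
            json.dump(self._data, fd, indent=2)
        os.replace(tmp_path, self._path)

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value
        self._flush()

    def __delitem__(self, key):
        del self._data[key]
        self._flush()

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


def demo_persistent_dict():
    """Write two values, reopen the file, and return what was reloaded."""
    with tempfile.TemporaryDirectory() as root:
        path = os.path.join(root, 'experiment_dict.json')
        store = JsonBackedDict(path)
        store['some_number'] = 10
        store['some_other_number'] = 20
        reopened = JsonBackedDict(path)
        return dict(reopened)
```

Because the values pass through json.dump(), only JSON-serializable data fits this pattern, which matches the restriction the README places on 'non-special' data.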

Starting an Experiment and Saving Data

In the example below, an experiment instance is started that retains all data in subdirectories of MyExperiment, located in the same directory as the running script. All archived data will reside under this root.

The primary method is save(), which accepts an arbitrary string key, and data of varying types. The method stores the data in files appropriate for those types:

    import os

    import pandas as pd

    from experiment_manager import ExperimentManager, Datatype

    experiment_archive = os.path.join(os.path.dirname(__file__), 'MyExperiment')
    exp = ExperimentManager(experiment_archive)

    # Create a tabular entry called 'my_tbl'. An underlying .csv file
    # named my_tbl.csv will be created under experiment_archive:

    probabilities = pd.DataFrame([[.1,.2,.7],[.4,.5,.1]], columns=['foo', 'bar', 'baz'])
    exp.save('my_tbl', probabilities)

    # Add a row to the same tabular content (i.e. to the same .csv file):
    exp.save('my_tbl', pd.Series([.7,.1,.2], index=['foo', 'bar', 'baz']))

    # A bit more data still
    df_more = pd.DataFrame([[.3,.3,.4],[.5,.5,0],[.6,.2,.2]],
                           columns=['foo', 'bar', 'baz'])
    exp.save('my_tbl', df_more)

    # Save a pytorch model snapshot; the manager will save the state_dict,
    # using torch.save():
    # 
    exp.save('model_snaphot1', model1)

    # Save another snapshot
    exp.save('model_snaphot2', model2)

    # Result chart figures in format indicated by
    # extension of the archive key:

    exp.save('pr_figure.pdf', pr_curve)
    exp.save('cm.png', conf_matrix)

    # A tensorboard location. Note that in this case
    # only the key is provided. This string will be a subdirectory
    # name under the experiment root. As usual, the
    # full path to that directory is returned, and can be
    # used when creating the tensorboard writer:
    
    exp.save('tb_data')
    
    # Saving any other, non-special type of data, as long as
    # json.dump() can handle it: use ExperimentManager instances like
    # a persistent dict. The new dict state is saved soon after being
    # updated

    exp['some_number'] = 10
    exp['some_other_number'] = 20

Clients should call close() on an experiment to close all open writers.
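
One way to make sure close() is not forgotten is the standard library's contextlib.closing, which calls close() on any object when the with-block exits, including when an exception is raised. The sketch below uses a hypothetical FakeExperiment class in place of ExperimentManager so that it runs on its own; it assumes only that close() takes no arguments. Whether ExperimentManager can be used directly in a with-statement is not stated here, so the wrapper is the conservative choice.

```python
from contextlib import closing


class FakeExperiment:
    """Hypothetical stand-in for ExperimentManager; only close() is modeled."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def run_and_close(exp):
    """Use exp inside a with-block so close() runs even if training raises."""
    with closing(exp) as open_exp:
        # ... the training loop and open_exp.save(...) calls would go here ...
        pass
    return exp


exp = run_and_close(FakeExperiment())
```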

Opening Existing Experiments

To reopen an existing experiment:

    exp = ExperimentManager(experiment_archive)

The data saved in the experiment can be retrieved as appropriate Python data types through the read() method. Callers must provide the data key and the type of data being requested.

   # Obtain a Pandas DataFrame instance:
   my_tbl_df = exp.read('my_tbl', Datatype.tabular)

   # A pyplot Figure instance:
   my_fig    = exp.read('pr_figure.pdf', Datatype.figure)

   # A pytorch model state dict that can then
   # be loaded into a pytorch module:
   my_model_state_dict  = exp.read('model_snaphot1', Datatype.model)

   # A hyperparameter configuration (see the Hyperparameters section):
   config = exp.read('hparams')

   # The full tensorboard directory path:
   tb_path = exp.read('tb_data')

Hyperparameters

Hyperparameter values may be stored as dict key/value pairs (exp['lr'] = 0.8). However, a class NeuralNetConfig is available for organizing hyperparameters. Given an instance config of this class,

    exp.save('hparams', config)

will create a copy of the configuration under the experiment root.

NeuralNetConfig extends the standard Python configparser module. That is, NeuralNetConfig instances read configuration files of the form:

    [Paths]

    # Root of the data/test files:
    root_train_test_data = /Users/paepcke/EclipseWorkspacesNew/birds/src/birdsong/tests/data/birds

    [Training]

    net_type      = resnet18
    batch_size    = 64
    lr            = 0.8
    pretrained    = True
    class_names   = foo,bar,baz

    # etc.

where Paths and Training are called sections. The NeuralNetConfig class adds the following convenience methods:

   # Obtain specific data types, rather than strings:

   config.getint('Training', 'batch_size')
   config.getfloat('Training', 'lr')
   config.getboolean('Training', 'pretrained')
   config.getarray('Training', 'class_names')
   config.sections()

   config.copy()

   # Equality test:
   config1 == config2
   
   config.to_json()
   config.from_json()
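
For readers without the package at hand, the typed getters can be tried with the standard configparser module alone. The sketch below is an approximation under stated assumptions: it uses plain configparser.ConfigParser rather than NeuralNetConfig, the path /tmp/example_data is a placeholder, and split_list() is a hypothetical helper that imitates what getarray() appears to do for comma-separated values. The actual NeuralNetConfig behavior may differ.

```python
import configparser

# Same layout as the configuration file shown above; the path is a placeholder.
CONFIG_TEXT = """
[Paths]
root_train_test_data = /tmp/example_data

[Training]
net_type      = resnet18
batch_size    = 64
lr            = 0.8
pretrained    = True
class_names   = foo,bar,baz
"""


def split_list(raw):
    """Hypothetical helper: turn 'foo,bar,baz' into ['foo', 'bar', 'baz']."""
    return [item.strip() for item in raw.split(',') if item.strip()]


config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)

# Typed accessors provided by the standard library:
batch_size = config.getint('Training', 'batch_size')
lr = config.getfloat('Training', 'lr')
pretrained = config.getboolean('Training', 'pretrained')

# Plain configparser returns strings, so list values are split by hand here:
class_names = split_list(config.get('Training', 'class_names'))
```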

Miscellaneous Methods

   # Obtain full path to any saved data;
   # datatype is an element of the Datatype
   # enumeration:
   exp.abspath(<key>, datatype)
