Skip to main content

Machine Learning Experiment Logging

Project description

A Lightweight Logger for ML Experiments ๐Ÿ“–

Pyversions PyPI version Code style: black codecov Colab

Simple logging of statistics, model checkpoints, plots and other objects for your Machine Learning Experiments (MLE). Furthermore, the MLELogger comes with smooth multi-seed result aggregation and combination of multi-configuration runs. For a quickstart check out the notebook blog ๐Ÿš€

The API ๐ŸŽฎ

from mle_logging import MLELogger

# Instantiate logging to experiment_dir
log = MLELogger(time_to_track=['num_updates', 'num_epochs'],
                what_to_track=['train_loss', 'test_loss'],
                experiment_dir="experiment_dir/",
                model_type='torch')

time_tic = {'num_updates': 10, 'num_epochs': 1}
stats_tic = {'train_loss': 0.1234, 'test_loss': 0.1235}

# Update the log with collected data & save it to .hdf5
log.update(time_tic, stats_tic)
log.save()

You can also log model checkpoints, matplotlib figures and other .pkl compatible objects.

# Save a model (torch, tensorflow, sklearn, jax, numpy)
import torchvision.models as models
model = models.resnet18()
log.save_model(model)

# Save a matplotlib figure as .png
fig, ax = plt.subplots()
log.save_plot(fig)

# You can also save (somewhat) arbitrary objects .pkl
some_dict = {"hi" : "there"}
log.save_extra(some_dict)

Or do everything in a single line...

log.update(time_tic, stats_tic, model, fig, extra, save=True)

File Structure & Re-Loading ๐Ÿ“š

The MLELogger will create a nested directory, which looks as follows:

experiment_dir
โ”œโ”€โ”€ extra: Stores saved .pkl object files
โ”œโ”€โ”€ figures: Stores saved .png figures
โ”œโ”€โ”€ logs: Stores .hdf5 log files (meta, stats, time)
โ”œโ”€โ”€ models: Stores different model checkpoints
    โ”œโ”€โ”€ init: Stores initial checkpoint
    โ”œโ”€โ”€ final: Stores most recent checkpoint
    โ”œโ”€โ”€ every_k: Stores every k-th checkpoint provided in update
    โ”œโ”€โ”€ top_k: Stores portfolio of top-k checkpoints based on performance
โ”œโ”€โ”€ tboards: Stores tensorboards for model checkpointing
โ”œโ”€โ”€ <config_name>.json: Copy of configuration file (if provided)

For visualization and post-processing load the results via

from mle_logging import load_log
log_out = load_log("experiment_dir/")

# The results can be accessed via meta, stats and time keys
# >>> log_out.meta.keys()
# odict_keys(['experiment_dir', 'extra_storage_paths', 'fig_storage_paths', 'log_paths', 'model_ckpt', 'model_type'])
# >>> log_out.stats.keys()
# odict_keys(['test_loss', 'train_loss'])
# >>> log_out.time.keys()
# odict_keys(['time', 'num_epochs', 'num_updates', 'time_elapsed'])

If an experiment was aborted, you can reload and continue the previous run via the reload=True option:

log = MLELogger(time_to_track=['num_updates', 'num_epochs'],
                what_to_track=['train_loss', 'test_loss'],
                experiment_dir="experiment_dir/",
                model_type='torch',
                reload=True)

Installation โณ

A PyPI installation is available via:

pip install mle-logging

If you want to get the most recent commit, please install directly from the repository:

pip install git+https://github.com/mle-infrastructure/mle-logging.git@main

Advanced Options ๐Ÿšด

Merging Multiple Logs ๐Ÿ‘ซ

Merging Multiple Random Seeds ๐ŸŒฑ + ๐ŸŒฑ

from mle_logging import merge_seed_logs
merge_seed_logs("multi_seed.hdf", "experiment_dir/")
log_out = load_log("experiment_dir/")
# >>> log.eval_ids
# ['seed_1', 'seed_2']

Merging Multiple Configurations ๐Ÿ”– + ๐Ÿ”–

from mle_logging import merge_config_logs, load_meta_log
merge_config_logs(experiment_dir="experiment_dir/",
                  all_run_ids=["config_1", "config_2"])
meta_log = load_meta_log("multi_config_dir/meta_log.hdf5")
# >>> log.eval_ids
# ['config_2', 'config_1']
# >>> meta_log.config_1.stats.test_loss.keys()
# odict_keys(['mean', 'std', 'p50', 'p10', 'p25', 'p75', 'p90']))

Plotting of Logs ๐Ÿง‘โ€๐ŸŽจ

meta_log = load_meta_log("multi_config_dir/meta_log.hdf5")
meta_log.plot("train_loss", "num_updates")

Storing Checkpoint Portfolios ๐Ÿ“‚

Logging every k-th checkpoint update โ— โฉ ... โฉ โ—

# Save every second checkpoint provided in log.update (stored in models/every_k)
log = MLELogger(time_to_track=['num_updates', 'num_epochs'],
                what_to_track=['train_loss', 'test_loss'],
                experiment_dir='every_k_dir/',
                model_type='torch',
                ckpt_time_to_track='num_updates',
                save_every_k_ckpt=2)

Logging top-k checkpoints based on metric ๐Ÿ”ฑ

# Save top-3 checkpoints provided in log.update (stored in models/top_k)
# Based on minimizing the test_loss metric
log = MLELogger(time_to_track=['num_updates', 'num_epochs'],
                what_to_track=['train_loss', 'test_loss'],
                experiment_dir="top_k_dir/",
                model_type='torch',
                ckpt_time_to_track='num_updates',
                save_top_k_ckpt=3,
                top_k_metric_name="test_loss",
                top_k_minimize_metric=True)

Weights&Biases Backend Integration ๐Ÿง‘โ€๐ŸŽจ

You can also use W&B as a backend for logging. All results are stored as before but additionally we report to the W&B server:

# Provide all configuration details as option
log = MLELogger(time_to_track=['num_updates', 'num_epochs'],
                what_to_track=['train_loss', 'test_loss'],
                use_wandb=True,
                wandb_config={
                  "key": "sadfasd",  # Only needed if not logged in
                  "entity": "roberttlange",  # Only needed if not logged in
                  "project": "some-project-name",
                  "group": "some-group-name"
                })

Citing the MLE-Infrastructure โœ๏ธ

If you use mle-logging in your research, please cite it as follows:

@software{mle_infrastructure2021github,
  author = {Robert Tjarko Lange},
  title = {{MLE-Infrastructure}: A Set of Lightweight Tools for Distributed Machine Learning Experimentation},
  url = {http://github.com/mle-infrastructure},
  year = {2021},
}

Development ๐Ÿ‘ท

You can run the test suite via python -m pytest -vv tests/. If you find a bug or are missing your favourite feature, feel free to create an issue and/or start contributing ๐Ÿค—.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mle_logging-0.0.5.tar.gz (30.8 kB view hashes)

Uploaded Source

Built Distribution

mle_logging-0.0.5-py3-none-any.whl (34.9 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page