Machine Learning Experiment Logging

A Lightweight Logger for ML Experiments 📖

Simple logging of statistics, model checkpoints, plots, and other objects for your Machine Learning Experiments (MLE). The MLELogger also supports multi-seed result aggregation and the combination of multi-configuration runs. For a quickstart, check out the notebook blog 🚀

The API 🎮

from mle_logging import MLELogger

# Instantiate logging to experiment_dir
log = MLELogger(time_to_track=['num_updates', 'num_epochs'],
                what_to_track=['train_loss', 'test_loss'],
                experiment_dir="experiment_dir/",
                model_type='torch')

time_tic = {'num_updates': 10, 'num_epochs': 1}
stats_tic = {'train_loss': 0.1234, 'test_loss': 0.1235}

# Update the log with collected data & save it to .hdf5
log.update(time_tic, stats_tic)
log.save()
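
As a minimal sketch of how these dictionaries might be produced inside a training loop, the toy generator below builds one `time_tic`/`stats_tic` pair per epoch, with keys matching `time_to_track` and `what_to_track` from the snippet above. The function name `toy_training_tics` and the stand-in loss values are hypothetical and only serve as illustration; in a real run each pair would be handed to `log.update`.

```python
def toy_training_tics(num_epochs, updates_per_epoch):
    """Yield (time_tic, stats_tic) pairs, one per epoch, for a fake training run."""
    for epoch in range(1, num_epochs + 1):
        # Stand-in losses; a real loop would compute these from the model.
        train_loss = 1.0 / epoch
        test_loss = 1.1 / epoch
        time_tic = {"num_updates": epoch * updates_per_epoch, "num_epochs": epoch}
        stats_tic = {"train_loss": train_loss, "test_loss": test_loss}
        yield time_tic, stats_tic


for time_tic, stats_tic in toy_training_tics(num_epochs=3, updates_per_epoch=10):
    # With a logger instantiated as above, this is where
    # log.update(time_tic, stats_tic) would be called.
    print(time_tic, stats_tic)
```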

You can also log model checkpoints, matplotlib figures, and other .pkl-compatible objects.

# Save a model (torch, tensorflow, sklearn, jax, numpy)
import torchvision.models as models
model = models.resnet18()
log.save_model(model)

# Save a matplotlib figure as .png
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
log.save_plot(fig)

# You can also save (somewhat) arbitrary objects as .pkl
some_dict = {"hi": "there"}
log.save_extra(some_dict)

Or do everything in a single line...

log.update(time_tic, stats_tic, model, fig, some_dict, save=True)

File Structure & Re-Loading 📚

The MLELogger will create a nested directory, which looks as follows:

experiment_dir
├── extra: Stores saved .pkl object files
├── figures: Stores saved .png figures
├── logs: Stores .hdf5 log files (meta, stats, time)
├── models: Stores different model checkpoints
│   ├── init: Stores initial checkpoint
│   ├── final: Stores most recent checkpoint
│   ├── every_k: Stores every k-th checkpoint provided in update
│   └── top_k: Stores portfolio of top-k checkpoints based on performance
├── tboards: Stores tensorboards for model checkpointing
└── <config_name>.json: Copy of configuration file (if provided)
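
To check quickly what a run has written so far, a small standard-library helper along the following lines can walk this layout. This is only a sketch: `summarize_experiment_dir` and `EXPECTED_SUBDIRS` are hypothetical names, not part of mle-logging, and the subdirectory names are taken from the tree above.

```python
from pathlib import Path

# Subdirectory names as listed in the directory tree above.
EXPECTED_SUBDIRS = ("extra", "figures", "logs", "models", "tboards")


def summarize_experiment_dir(experiment_dir):
    """Map each expected subdirectory to its file count (None if it is missing)."""
    root = Path(experiment_dir)
    summary = {}
    for name in EXPECTED_SUBDIRS:
        sub = root / name
        if sub.is_dir():
            # Count files recursively, e.g. models/init, models/final, ...
            summary[name] = sum(1 for p in sub.rglob("*") if p.is_file())
        else:
            summary[name] = None
    return summary


print(summarize_experiment_dir("experiment_dir/"))
```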

For visualization and post-processing, load the results via:

from mle_logging import load_log
log_out = load_log("experiment_dir/")

# The results can be accessed via meta, stats and time keys
# >>> log_out.meta.keys()
# odict_keys(['experiment_dir', 'extra_storage_paths', 'fig_storage_paths', 'log_paths', 'model_ckpt', 'model_type'])
# >>> log_out.stats.keys()
# odict_keys(['test_loss', 'train_loss'])
# >>> log_out.time.keys()
# odict_keys(['time', 'num_epochs', 'num_updates', 'time_elapsed'])

If an experiment was aborted, you can reload and continue the previous run via the reload=True option:

log = MLELogger(time_to_track=['num_updates', 'num_epochs'],
                what_to_track=['train_loss', 'test_loss'],
                experiment_dir="experiment_dir/",
                model_type='torch',
                reload=True)

Installation ⏳

A PyPI installation is available via:

pip install mle-logging

If you want to get the most recent commit, please install directly from the repository:

pip install git+https://github.com/mle-infrastructure/mle-logging.git@main

Advanced Options 🚴

Merging Multiple Logs 👫

Merging Multiple Random Seeds 🌱 + 🌱

from mle_logging import load_log, merge_seed_logs
merge_seed_logs("multi_seed.hdf", "experiment_dir/")
log_out = load_log("experiment_dir/")
# >>> log_out.eval_ids
# ['seed_1', 'seed_2']

Merging Multiple Configurations 🔖 + 🔖

from mle_logging import merge_config_logs, load_meta_log
merge_config_logs(experiment_dir="experiment_dir/",
                  all_run_ids=["config_1", "config_2"])
meta_log = load_meta_log("multi_config_dir/meta_log.hdf5")
# >>> meta_log.eval_ids
# ['config_2', 'config_1']
# >>> meta_log.config_1.stats.test_loss.keys()
# odict_keys(['mean', 'std', 'p50', 'p10', 'p25', 'p75', 'p90'])
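
To make the aggregated keys above more concrete, the following standard-library sketch computes the same kind of per-step summary (mean, std and percentiles) over several seeds. It is not the library's implementation: the function names are hypothetical, and the use of the population standard deviation and linear-interpolation percentiles are assumptions.

```python
from statistics import fmean, pstdev


def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty sequence."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] * (1 - frac) + ordered[upper] * frac


def aggregate_over_seeds(curves):
    """Aggregate equally long metric curves (seed id -> list of values) per time step."""
    per_step = list(zip(*curves.values()))  # one tuple of seed values per step
    summary = {
        "mean": [fmean(step) for step in per_step],
        "std": [pstdev(step) for step in per_step],  # assumption: population std
    }
    for q in (10, 25, 50, 75, 90):
        summary[f"p{q}"] = [percentile(step, q) for step in per_step]
    return summary


test_loss = {"seed_1": [1.0, 2.0], "seed_2": [3.0, 4.0]}
print(aggregate_over_seeds(test_loss)["mean"])
```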

Plotting of Logs 🧑‍🎨

meta_log = load_meta_log("multi_config_dir/meta_log.hdf5")
meta_log.plot("train_loss", "num_updates")

Storing Checkpoint Portfolios 📂

Logging every k-th checkpoint update ❗ ⏩ ... ⏩ ❗

# Save every second checkpoint provided in log.update (stored in models/every_k)
log = MLELogger(time_to_track=['num_updates', 'num_epochs'],
                what_to_track=['train_loss', 'test_loss'],
                experiment_dir='every_k_dir/',
                model_type='torch',
                ckpt_time_to_track='num_updates',
                save_every_k_ckpt=2)
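
Conceptually, an every-k rule keeps the checkpoint from every k-th call to update. The sketch below illustrates that idea only; `every_k_steps` is a hypothetical helper, and whether the library counts updates in exactly this way (for example, whether the first update is kept) is an assumption.

```python
def every_k_steps(time_values, k):
    """Return the time values at which an every-k rule would keep a checkpoint."""
    kept = []
    for counter, time_value in enumerate(time_values, start=1):
        # Keep the checkpoint on the k-th, 2k-th, 3k-th, ... update.
        if counter % k == 0:
            kept.append(time_value)
    return kept


# num_updates values passed to five consecutive updates, with k=2
print(every_k_steps([10, 20, 30, 40, 50], k=2))
```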

Logging top-k checkpoints based on metric 🔱

# Save top-3 checkpoints provided in log.update (stored in models/top_k)
# Based on minimizing the test_loss metric
log = MLELogger(time_to_track=['num_updates', 'num_epochs'],
                what_to_track=['train_loss', 'test_loss'],
                experiment_dir="top_k_dir/",
                model_type='torch',
                ckpt_time_to_track='num_updates',
                save_top_k_ckpt=3,
                top_k_metric_name="test_loss",
                top_k_minimize_metric=True)
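
The idea behind a top-k portfolio can be sketched in a few lines: after each update, the new (score, step) pair is compared with the stored ones and only the best k are kept. This is an illustration under stated assumptions, not the library's code; `update_top_k` and the example loss values are hypothetical.

```python
def update_top_k(portfolio, step, score, k, minimize=True):
    """Return a new portfolio of at most k (score, step) pairs, best first."""
    candidates = portfolio + [(score, step)]
    # Lower is better when minimizing (e.g. a loss), higher otherwise.
    candidates.sort(key=lambda pair: pair[0], reverse=not minimize)
    return candidates[:k]


portfolio = []
test_losses = {10: 0.9, 20: 0.5, 30: 0.7, 40: 0.3, 50: 0.8}
for num_updates, test_loss in test_losses.items():
    portfolio = update_top_k(portfolio, num_updates, test_loss, k=3)

print(portfolio)
```

Checkpoints that drop out of the portfolio would be the ones whose files get overwritten or removed in models/top_k; how the library handles this on disk is not shown here.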

Weights & Biases Backend Integration 🧑‍🎨

You can also use W&B as a backend for logging. All results are stored as before, but they are additionally reported to the W&B server:

# Provide all configuration details as option
log = MLELogger(time_to_track=['num_updates', 'num_epochs'],
                what_to_track=['train_loss', 'test_loss'],
                use_wandb=True,
                wandb_config={
                  "key": "sadfasd",  # Only needed if not logged in
                  "entity": "roberttlange",  # Only needed if not logged in
                  "project": "some-project-name",
                  "group": "some-group-name"
                })
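
Rather than hard-coding the API key in a script, the dictionary can be assembled from the environment. The sketch below assumes the key is exported as `WANDB_API_KEY` (the variable name used by the W&B client); `build_wandb_config` is a hypothetical helper, and the dictionary keys are the ones shown in the snippet above.

```python
import os


def build_wandb_config(project, group, entity=None):
    """Assemble a wandb_config dict, reading the API key from the environment."""
    config = {"project": project, "group": group}
    api_key = os.environ.get("WANDB_API_KEY")
    if api_key:
        # Only needed if not already logged in.
        config["key"] = api_key
    if entity:
        config["entity"] = entity
    return config


wandb_config = build_wandb_config("some-project-name", "some-group-name")
print(sorted(wandb_config))
```

The resulting dictionary would then be passed as `wandb_config=wandb_config` together with `use_wandb=True`.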

Citing the MLE-Infrastructure ✍️

If you use mle-logging in your research, please cite it as follows:

@software{mle_infrastructure2021github,
  author = {Robert Tjarko Lange},
  title = {{MLE-Infrastructure}: A Set of Lightweight Tools for Distributed Machine Learning Experimentation},
  url = {http://github.com/mle-infrastructure},
  year = {2021},
}

Development 👷

You can run the test suite via python -m pytest -vv tests/. If you find a bug or are missing your favourite feature, feel free to create an issue and/or start contributing 🤗.
