Machine Learning Experiment Logging
A Lightweight Logger for ML Experiments
Simple logging of statistics, model checkpoints, plots and other objects for your Machine Learning Experiments (MLE). The MLELogger also provides multi-seed result aggregation and the combination of multi-configuration runs. For a quickstart, check out the notebook blog.
The API
from mle_logging import MLELogger

# Instantiate logging to experiment_dir
log = MLELogger(time_to_track=['num_updates', 'num_epochs'],
                what_to_track=['train_loss', 'test_loss'],
                experiment_dir="experiment_dir/",
                model_type='torch')

time_tic = {'num_updates': 10, 'num_epochs': 1}
stats_tic = {'train_loss': 0.1234, 'test_loss': 0.1235}

# Update the log with collected data & save it to .hdf5
log.update(time_tic, stats_tic)
log.save()
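In a training loop, the two dictionaries are typically rebuilt at every logging step. The following is a minimal sketch of that pattern, not part of the library: the generator `training_ticks` and its arguments are hypothetical, and the loss values are made up for illustration. The dictionary keys match the `time_to_track` and `what_to_track` lists used above.

```python
def training_ticks(losses_per_epoch, updates_per_epoch):
    """Yield (time_tic, stats_tic) pairs shaped like the arguments of log.update."""
    for epoch, (train_loss, test_loss) in enumerate(losses_per_epoch, start=1):
        # Keys must match the names given in time_to_track
        time_tic = {"num_updates": epoch * updates_per_epoch, "num_epochs": epoch}
        # Keys must match the names given in what_to_track
        stats_tic = {"train_loss": train_loss, "test_loss": test_loss}
        yield time_tic, stats_tic


# Illustrative (train_loss, test_loss) values for two epochs
ticks = list(training_ticks([(0.9, 1.0), (0.5, 0.6)], updates_per_epoch=10))
```

In a real run, each pair would be passed on with `log.update(time_tic, stats_tic)`, followed by `log.save()`.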
You can also log model checkpoints, matplotlib figures and other .pkl-compatible objects.
# Save a model (torch, tensorflow, sklearn, jax, numpy)
import torchvision.models as models
model = models.resnet18()
log.save_model(model)

# Save a matplotlib figure as .png
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
log.save_plot(fig)

# You can also save (somewhat) arbitrary objects as .pkl
some_dict = {"hi": "there"}
log.save_extra(some_dict)
Or do everything in a single line:
log.update(time_tic, stats_tic, model, fig, some_dict, save=True)
File Structure & Re-Loading
The MLELogger will create a nested directory, which looks as follows:
experiment_dir
├── extra: Stores saved .pkl object files
├── figures: Stores saved .png figures
├── logs: Stores .hdf5 log files (meta, stats, time)
├── models: Stores different model checkpoints
│   ├── init: Stores initial checkpoint
│   ├── final: Stores most recent checkpoint
│   ├── every_k: Stores every k-th checkpoint provided in update
│   └── top_k: Stores portfolio of top-k checkpoints based on performance
├── tboards: Stores tensorboards for model checkpointing
└── <config_name>.json: Copy of configuration file (if provided)
For visualization and post-processing, load the results via:
from mle_logging import load_log
log_out = load_log("experiment_dir/")
# The results can be accessed via meta, stats and time keys
# >>> log_out.meta.keys()
# odict_keys(['experiment_dir', 'extra_storage_paths', 'fig_storage_paths', 'log_paths', 'model_ckpt', 'model_type'])
# >>> log_out.stats.keys()
# odict_keys(['test_loss', 'train_loss'])
# >>> log_out.time.keys()
# odict_keys(['time', 'num_epochs', 'num_updates', 'time_elapsed'])
If an experiment was aborted, you can reload and continue the previous run via the reload=True option:
log = MLELogger(time_to_track=['num_updates', 'num_epochs'],
                what_to_track=['train_loss', 'test_loss'],
                experiment_dir="experiment_dir/",
                model_type='torch',
                reload=True)
Installation
A PyPI installation is available via:
pip install mle-logging
If you want to get the most recent commit, please install directly from the repository:
pip install git+https://github.com/mle-infrastructure/mle-logging.git@main
Advanced Options
Merging Multiple Logs
Merging Multiple Random Seeds
from mle_logging import load_log, merge_seed_logs
merge_seed_logs("multi_seed.hdf", "experiment_dir/")
log_out = load_log("experiment_dir/")
# >>> log_out.eval_ids
# ['seed_1', 'seed_2']
Merging Multiple Configurations
from mle_logging import merge_config_logs, load_meta_log
merge_config_logs(experiment_dir="experiment_dir/",
                  all_run_ids=["config_1", "config_2"])
meta_log = load_meta_log("multi_config_dir/meta_log.hdf5")
# >>> meta_log.eval_ids
# ['config_2', 'config_1']
# >>> meta_log.config_1.stats.test_loss.keys()
# odict_keys(['mean', 'std', 'p50', 'p10', 'p25', 'p75', 'p90'])
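The per-metric keys shown above (mean, std and percentiles) suggest a step-wise aggregation across runs. The following is a minimal sketch of that idea using only the standard library; it is not the mle-logging implementation. The function name `aggregate_seeds`, the input layout and the use of the population standard deviation are assumptions, and the percentile keys are left out for brevity.

```python
import statistics


def aggregate_seeds(per_seed_curves):
    """Aggregate equally long metric curves from several seeds, step by step."""
    summary = {"mean": [], "std": []}
    # zip(*...) groups the values that all seeds recorded at the same step
    for values_at_step in zip(*per_seed_curves.values()):
        summary["mean"].append(statistics.fmean(values_at_step))
        # Assumption: population standard deviation (divides by n, not n - 1)
        summary["std"].append(statistics.pstdev(values_at_step))
    return summary


# Made-up test_loss curves for two seeds
curves = {"seed_1": [1.0, 0.5, 0.25], "seed_2": [3.0, 1.5, 0.75]}
summary = aggregate_seeds(curves)
```

Here `summary["mean"]` holds one averaged value per logged step, which is the shape needed to plot a mean curve with an uncertainty band.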
Plotting of Logs
meta_log = load_meta_log("multi_config_dir/meta_log.hdf5")
meta_log.plot("train_loss", "num_updates")
Storing Checkpoint Portfolios
Logging every k-th checkpoint update
# Save every second checkpoint provided in log.update (stored in models/every_k)
log = MLELogger(time_to_track=['num_updates', 'num_epochs'],
                what_to_track=['train_loss', 'test_loss'],
                experiment_dir='every_k_dir/',
                model_type='torch',
                ckpt_time_to_track='num_updates',
                save_every_k_ckpt=2)
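To make the effect of save_every_k_ckpt concrete, here is a minimal sketch of an every-k selection rule. It is an illustration only: the helper `should_store` is hypothetical, and counting update calls from 1 is an assumption that may differ from how the library counts internally.

```python
def should_store(update_counter, save_every_k_ckpt):
    """Return True if the checkpoint of this update call would be kept."""
    # Assumption: update calls are counted starting from 1
    return update_counter % save_every_k_ckpt == 0


# Which of six consecutive update calls would be stored for k = 2
stored = [i for i in range(1, 7) if should_store(i, save_every_k_ckpt=2)]
```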
Logging top-k checkpoints based on metric
# Save top-3 checkpoints provided in log.update (stored in models/top_k)
# Based on minimizing the test_loss metric
log = MLELogger(time_to_track=['num_updates', 'num_epochs'],
                what_to_track=['train_loss', 'test_loss'],
                experiment_dir="top_k_dir/",
                model_type='torch',
                ckpt_time_to_track='num_updates',
                save_top_k_ckpt=3,
                top_k_metric_name="test_loss",
                top_k_minimize_metric=True)
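The idea behind a top-k portfolio can be sketched in a few lines: keep a small list of the best scores seen so far and drop the worst entry once the list exceeds k. This is a simplified illustration, not the library's code; the helper `update_top_k` and the sample losses are hypothetical.

```python
def update_top_k(portfolio, step, score, k=3, minimize=True):
    """Return the k best (score, step) entries after considering a new one."""
    candidates = portfolio + [(score, step)]
    # Best entries first: ascending when minimizing, descending otherwise
    candidates.sort(key=lambda entry: entry[0], reverse=not minimize)
    return candidates[:k]


portfolio = []
# Made-up (num_updates, test_loss) pairs from five update calls
for step, test_loss in [(10, 0.9), (20, 0.4), (30, 0.7), (40, 0.2), (50, 0.8)]:
    portfolio = update_top_k(portfolio, step, test_loss, k=3, minimize=True)
```

After the loop, `portfolio` contains the three lowest losses together with the update counts at which they occurred, which mirrors the role of ckpt_time_to_track in the configuration above.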
Weights & Biases Backend Integration
You can also use W&B as a logging backend. All results are stored locally as before, and are additionally reported to the W&B server:
# Provide all configuration details as option
log = MLELogger(time_to_track=['num_updates', 'num_epochs'],
                what_to_track=['train_loss', 'test_loss'],
                use_wandb=True,
                wandb_config={
                    "key": "sadfasd",  # Only needed if not logged in
                    "entity": "roberttlange",  # Only needed if not logged in
                    "project": "some-project-name",
                    "group": "some-group-name"
                })
Citing the MLE-Infrastructure
If you use mle-logging in your research, please cite it as follows:
@software{mle_infrastructure2021github,
author = {Robert Tjarko Lange},
title = {{MLE-Infrastructure}: A Set of Lightweight Tools for Distributed Machine Learning Experimentation},
url = {http://github.com/mle-infrastructure},
year = {2021},
}
Development
You can run the test suite via python -m pytest -vv tests/. If you find a bug or are missing your favourite feature, feel free to create an issue and/or start contributing.