
A lightweight python library for keeping track of numerical experiments

Project description

maggot is a very simple but useful library whose primary goal is to remove the need for the custom experiment-tracking approaches most people typically use. The focus is on reproducibility and removing boilerplate code.

Main issues maggot (at least partially) solves:

  • Removes the need to agonize over a proper name for each experiment. Say you are a machine learning researcher/engineer and you want to train a convolutional neural network with a particular set of parameters, say, 50 convolutional layers, dropout 0.5 and relu activations. You might want to create a separate directory for this experiment to store checkpoints and summaries. If you do not expect to have many different models, you can get away with something like "convnet50layers" or "convnet50relu". But as the number of experiments grows, you need a more reliable and automated solution. maggot offers one: every experiment you run gets a name derived from the configuration parameters of your model. For the model above it would be "50-relu-0.5". You can still use a custom experiment name if you want to.
  • Assists reproducibility. Ever experienced a situation where results you got a month ago with an "old" model are no longer reproducible? Even if you use git, you had probably passed some command-line arguments that are now lost somewhere in the bash history... maggot stores all command-line parameters and the full stdout, along with the configuration and other run metadata.
  • Restoring a model is now really painless! Since maggot saves all the parameters you used to run the experiment, all you need to restore a model is the path to the saved experiment.

Let's consider a toy example and train an SVM on the Iris dataset.

First, import the required packages and define the command-line arguments:

import argparse
import os
import pickle

from sklearn.datasets import load_iris
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score, StratifiedKFold
from maggot import Experiment

parser = argparse.ArgumentParser(
    formatter_class=argparse.ArgumentDefaultsHelpFormatter)
parser.add_argument(
    "--C", type=float, default=1.0,
    help="Regularization parameter for SVM")
parser.add_argument(
    "--gamma", type=float, default=0.01,
    help="Kernel parameter for SVM")
parser.add_argument(
    "--cv", type=int, default=5,
    help="Number of folds for cross-validation")
parser.add_argument(
    "--cv_random_seed", type=int, default=42,
    help="Random seed for cross-validation iterator")

args = parser.parse_args()

Define a configuration object for the experiment:

svm_config = {
    "model": {
        "C": args.C,
        "gamma": args.gamma
    },
    "crossval": {
        "n_folds": args.cv,
        "_random_seed": args.cv_random_seed
    }
}

The _random_seed parameter is not really important for analyzing and comparing different experiments, so we prefixed its name with an underscore in the config. This tells maggot to leave it out of the experiment's identifier (short name).
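To make the naming rule concrete, here is a minimal sketch of how such an identifier could be derived from a nested config. It assumes keys are visited in sorted order at every nesting level, underscore-prefixed keys are skipped, and values are joined with "-". make_identifier is a hypothetical helper, not maggot's API, and maggot's real ordering rules may differ.

```python
from typing import Any, Mapping


def make_identifier(config: Mapping[str, Any], sep: str = "-") -> str:
    """Build a short experiment name from the values of a nested config.

    Hypothetical helper: keys are visited in sorted order at every level,
    and any key starting with an underscore is left out of the name.
    """
    parts = []
    for key in sorted(config):
        if key.startswith("_"):
            continue  # e.g. "_random_seed" does not affect the name
        value = config[key]
        if isinstance(value, Mapping):
            nested = make_identifier(value, sep)
            if nested:
                parts.append(nested)
        else:
            parts.append(str(value))
    return sep.join(parts)


svm_config = {
    "model": {"C": 1.0, "gamma": 0.01},
    "crossval": {"n_folds": 5, "_random_seed": 42},
}
print(make_identifier(svm_config))  # 5-1.0-0.01
```

Sorting the keys makes the name independent of the order in which the config dict was written, which keeps identifiers stable across runs.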

Let's create an experiment object!

experiment = Experiment(config=svm_config)

From the experiment object you can access the experiment identifier:

>>> experiment.config.identifier
5-1.0-0.01

Or the experiment directory:

>>> experiment.experiment_dir
experiments/5-1.0-0.01

Let's examine what this directory contains so far:

tree -a experiments/5-1.0-0.01/

experiments/5-1.0-0.01/
└── .maggot
    ├── command
    ├── config.json
    ├── environ
    ├── logs
    │   └── 2020-11-15-14-53-22-1605444802
    └── results.json

The command file contains the command we ran from the terminal, config.json stores the configuration, and the logs directory will store any output produced during the run.
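A layout like the one above can be imitated with the standard library alone. The sketch below is hypothetical (record_run is not part of maggot); it assumes the environ file holds environment variables as KEY=value lines and that command holds the shell-quoted argv, which may not match maggot's actual file formats.

```python
import json
import os
import shlex
import sys
from typing import Mapping, Optional, Sequence


def record_run(experiment_dir: str, config: Mapping,
               argv: Optional[Sequence[str]] = None,
               environ: Optional[Mapping[str, str]] = None) -> str:
    """Write command, config and environment into <experiment_dir>/.maggot.

    Hypothetical helper mimicking the tree shown above; returns the path
    of the metadata directory.
    """
    meta_dir = os.path.join(experiment_dir, ".maggot")
    os.makedirs(os.path.join(meta_dir, "logs"), exist_ok=True)

    argv = sys.argv if argv is None else argv
    environ = os.environ if environ is None else environ

    # The exact command line, quoted so it can be pasted back into a shell
    with open(os.path.join(meta_dir, "command"), "w") as f:
        f.write(" ".join(shlex.quote(arg) for arg in argv) + "\n")
    # The full configuration as JSON
    with open(os.path.join(meta_dir, "config.json"), "w") as f:
        json.dump(config, f, indent=4)
    # A snapshot of the environment variables, one KEY=value per line
    with open(os.path.join(meta_dir, "environ"), "w") as f:
        for key in sorted(environ):
            f.write(f"{key}={environ[key]}\n")
    return meta_dir
```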

Let's train the model!

with experiment:

    config = experiment.config

    # Load the Iris dataset (load_iris is imported above)
    iris = load_iris()

    model = SVC(C=config.model.C, gamma=config.model.gamma)

    # Mean accuracy over shuffled, stratified cross-validation folds
    score = cross_val_score(
        model, X=iris.data, y=iris.target, scoring="accuracy",
        cv=StratifiedKFold(
            config.crossval.n_folds,
            shuffle=True,
            random_state=config.crossval._random_seed),
    ).mean()

Note that parameters can be accessed with dot notation rather than ["keyword"] notation, which looks much nicer.
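Dot access such as config.model.C can be provided by a small dict subclass. This is a hypothetical sketch (DotDict is not maggot's class); it wraps nested dicts on access and raises AttributeError for missing keys, so hasattr and getattr behave as usual.

```python
from typing import Any


class DotDict(dict):
    """dict subclass that also allows attribute access to its keys.

    Hypothetical sketch: nested dicts are wrapped on the fly so chains
    like config.model.C work.
    """

    def __getattr__(self, name: str) -> Any:
        # Only called when normal attribute lookup fails
        try:
            value = self[name]
        except KeyError:
            raise AttributeError(name) from None
        return DotDict(value) if isinstance(value, dict) else value


config = DotDict({
    "model": {"C": 1.0, "gamma": 0.01},
    "crossval": {"n_folds": 5, "_random_seed": 42},
})
print(config.model.C)                # 1.0
print(config.crossval._random_seed)  # 42
```

Keys that collide with dict methods, such as keys or items, still resolve to the methods, and nested values are copied on each access, so this sketch suits reading a config rather than mutating it.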

We can print the accuracy (inside the with block above), and the output will be stored in a log file:

print("Accuracy is", round(score, 4))
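One way to capture printed output in a log file while still showing it in the terminal is to temporarily replace sys.stdout with a stream that writes to both. The sketch below assumes that approach; Tee and log_stdout are hypothetical names, and maggot may capture output differently.

```python
import contextlib
import sys
from typing import Iterator, TextIO


class Tee:
    """Minimal file-like object that forwards writes to several streams."""

    def __init__(self, *streams: TextIO) -> None:
        self.streams = streams

    def write(self, s: str) -> int:
        for stream in self.streams:
            stream.write(s)
        return len(s)

    def flush(self) -> None:
        for stream in self.streams:
            stream.flush()


@contextlib.contextmanager
def log_stdout(log_path: str) -> Iterator[None]:
    """Copy everything printed inside the block to log_path (hypothetical)."""
    with open(log_path, "a") as log_file:
        original = sys.stdout
        tee = Tee(original, log_file)
        sys.stdout = tee
        try:
            yield
        finally:
            # Restore the real stdout even if the block raised
            tee.flush()
            sys.stdout = original
```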

Additionally, it's possible to register the score as a result of this experiment:

experiment.register_result("accuracy", score)

This records the value in the results.json file in the .maggot directory, which now has the following content:

{
    "accuracy": 0.9333333333333332
}

Later, these files can be used to compare results across different experiments.
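A result registry like this can be sketched as a read-merge-write of a JSON file. register_result here is a hypothetical standalone function, not maggot's method; it assumes results.json holds a flat JSON object and treats a missing or empty file as "no results yet".

```python
import json
import os
from typing import Any, Dict


def register_result(meta_dir: str, name: str, value: Any) -> Dict[str, Any]:
    """Merge one named result into <meta_dir>/results.json (hypothetical)."""
    path = os.path.join(meta_dir, "results.json")
    results: Dict[str, Any] = {}
    # Keep previously registered results; an empty file counts as none
    if os.path.exists(path) and os.path.getsize(path) > 0:
        with open(path) as f:
            results = json.load(f)
    results[name] = value
    with open(path, "w") as f:
        json.dump(results, f, indent=4)
    return results
```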

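Finally, let's fit the model on the full dataset and save it using the pickle module. cross_val_score trains clones of the estimator, so model itself is still unfitted at this point:

model.fit(iris.data, iris.target)

with open(os.path.join(experiment.experiment_dir, "model.pkl"), "wb") as f:
    pickle.dump(model, f)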

See how the directory structure has changed:

tree -a experiments/5-1.0-0.01/

experiments/5-1.0-0.01/
├── .maggot
│   ├── command
│   ├── config.json
│   ├── environ
│   ├── logs
│   │   └── 2020-11-15-14-53-22-1605444802
│   └── results.json
└── model.pkl

To restore the experiment later, simply do:

with Experiment(resume_from="experiments/5-1.0-0.01") as experiment:
    config = experiment.config    # the same config we created above
    ...

The configuration file and other saved metadata are loaded automatically.
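If you only need the saved files, they can also be read back directly. This hypothetical helper assumes .maggot/config.json contains the config as plain JSON and that model.pkl was written as in the pickle step; it is not how Experiment(resume_from=...) is implemented.

```python
import json
import os
import pickle
from typing import Any, Dict, Tuple


def load_saved_experiment(experiment_dir: str) -> Tuple[Dict[str, Any], Any]:
    """Read back the stored config and pickled model (hypothetical helper)."""
    with open(os.path.join(experiment_dir, ".maggot", "config.json")) as f:
        config = json.load(f)
    with open(os.path.join(experiment_dir, "model.pkl"), "rb") as f:
        model = pickle.load(f)
    return config, model
```

Only unpickle files you trust, since pickle.load can execute arbitrary code.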

We can easily run several experiments with different parameters:

python ../maggot/examples/iris_sklearn.py --C=10
python ../maggot/examples/iris_sklearn.py --C=10 --gamma=1
python ../maggot/examples/iris_sklearn.py --C=10 --gamma=0.1
python ../maggot/examples/iris_sklearn.py --C=0.001 --gamma=0.1
python ../maggot/examples/iris_sklearn.py --C=0.001 --gamma=10
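Typing each command by hand gets tedious for larger sweeps. Here is a sketch that builds one command per combination of parameter values with itertools.product; grid_commands and run_grid are hypothetical helpers, and the script path is the one from the commands above.

```python
import itertools
import shlex
import subprocess
import sys
from typing import Dict, List, Sequence


def grid_commands(script: str, grid: Dict[str, Sequence]) -> List[List[str]]:
    """Build one command line per combination of parameter values."""
    names = list(grid)
    commands = []
    for values in itertools.product(*(grid[name] for name in names)):
        args = [f"--{name}={value}" for name, value in zip(names, values)]
        commands.append([sys.executable, script, *args])
    return commands


def run_grid(commands: List[List[str]]) -> None:
    """Run the commands one after another, stopping on the first failure."""
    for cmd in commands:
        print(shlex.join(cmd))
        subprocess.run(cmd, check=True)


# Example (not executed here):
# run_grid(grid_commands("../maggot/examples/iris_sklearn.py",
#                        {"C": [0.001, 10], "gamma": [0.1, 1]}))
```

Each run gets its own identifier-named directory, so a sweep like this needs no extra naming logic.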

And now let's compare them!

maggot summarize experiments --sort accuracy

Results for /home/dmytro/code/stuff/mag-tests/experiments:

              accuracy
5-10.0-0.1    0.986667
5-10.0-0.01   0.973333
5-10.0-1.0    0.953333
5-0.001-0.1   0.926667
5-0.001-10.0  0.813333
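For intuition about what summarize does, here is a rough standard-library sketch. It assumes each experiment keeps its metrics in <root>/<identifier>/.maggot/results.json and that higher values are better; summarize is a hypothetical function, not the CLI's implementation.

```python
import glob
import json
import os
from typing import List, Tuple


def summarize(root: str, metric: str) -> List[Tuple[str, float]]:
    """Collect `metric` from every experiment under root, best first.

    Hypothetical stand-in for `maggot summarize --sort`; experiments that
    lack the metric (or have an empty results file) are skipped.
    """
    rows = []
    pattern = os.path.join(root, "*", ".maggot", "results.json")
    for path in glob.glob(pattern):
        if os.path.getsize(path) == 0:
            continue
        with open(path) as f:
            results = json.load(f)
        if metric in results:
            # <root>/<identifier>/.maggot/results.json -> <identifier>
            name = os.path.basename(os.path.dirname(os.path.dirname(path)))
            rows.append((name, results[metric]))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows
```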

CLI

maggot has a minimalistic CLI for inspecting experiments, comparing them, and so forth.

Currently, the following commands are supported:

  summarize      Summarize metrics from all experiments in a given directory.
  show-config    Show experiment config.
  show-command   Show command used to run an experiment.
  config-diff    Show diff between configs in two experiments.
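A diff like the one config-diff reports can be computed by flattening both nested configs into dotted key paths and comparing them. flatten and config_diff are hypothetical helpers; the real command's output format may differ.

```python
from typing import Any, Dict, Mapping, Tuple


def flatten(config: Mapping[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn {"model": {"C": 1.0}} into {"model.C": 1.0}."""
    flat: Dict[str, Any] = {}
    for key, value in config.items():
        path = f"{prefix}{key}"
        if isinstance(value, Mapping):
            flat.update(flatten(value, path + "."))
        else:
            flat[path] = value
    return flat


def config_diff(a: Mapping, b: Mapping) -> Dict[str, Tuple[Any, Any]]:
    """Map each differing key path to its (value_in_a, value_in_b) pair.

    A key missing on one side is reported as None on that side.
    """
    flat_a, flat_b = flatten(a), flatten(b)
    return {
        key: (flat_a.get(key), flat_b.get(key))
        for key in sorted(set(flat_a) | set(flat_b))
        if flat_a.get(key) != flat_b.get(key)
    }
```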

Simply type maggot COMMAND in a terminal to see help for a specific command.

Installation

To install, clone the repository and run pip install ., or simply run pip install git+https://github.com/ex4sperans/maggot.git to install directly from GitHub. A built wheel for version 0.2 is also available on PyPI.


This version

0.2

Download files


Source Distributions

No source distribution files are available for this release.

Built Distribution

maggot-0.2-py2.py3-none-any.whl (18.6 kB)

Uploaded Python 2 Python 3

File details

Details for the file maggot-0.2-py2.py3-none-any.whl.

File metadata

  • Download URL: maggot-0.2-py2.py3-none-any.whl
  • Upload date:
  • Size: 18.6 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.7.5

File hashes

Hashes for maggot-0.2-py2.py3-none-any.whl

Algorithm     Hash digest
SHA256        f405787ec20e20d9d6af1cc1dc2d4b4b4a97ef24b59774e8dbb377e1a201d54a
MD5           f6a2a659e7d62a5b557628403726fbff
BLAKE2b-256   c6fa98df9d0ba29d46f06e78f29114dbf8c7ba23eebcfaa3ecda752d427d8cd8

