
A lightweight module for research experiment reproducibility and analysis


Skeletor

Skeletor provides a lightweight wrapper for research code with two goals: (1) make it easy to track experiment results and data for later analysis, and (2) orchestrate many experiments in parallel without much fuss. The first goal is handled by track, which logs experiment metrics in a clean format, can back them up to S3, and exposes the results as a Pandas DataFrame. The second goal is handled by ray, which parallelizes multi-GPU grid searches over experiment configurations. This improves on ad hoc setups because a proper distributed execution framework handles trial scheduling.

99% of the work is being done by track and ray.

I added boilerplate model, architecture, and optimizer construction functions for some basic PyTorch setups. I will try to add more as time goes on, but I don't plan on adding TensorFlow support anytime soon.

Setup

Necessary packages are listed in setup.py. Just run pip install skeletor-ml to get started.

Basic Usage

All you really have to do is define two functions: one that adds your arguments to an ArgumentParser (e.g. add_args(parser) below) and one that runs your experiment given the parsed arguments (e.g. experiment(parsed_args)). Register the first with skeletor.supply_args and the second with skeletor.execute.

You can use track to log statistics during training. A basic example train.py might look like:

import skeletor
from skeletor.models import build_model
from skeletor.optimizers import build_optimizer
import track

def add_args(parser):
    parser.add_argument('--arch', default='resnet50')
    parser.add_argument('--lr', default=0.1, type=float)

def train(epoch):
    ...
    return avg_train_loss

def test(epoch):
    ...
    return avg_test_loss

def experiment(args):
    # args contains the arguments registered in add_args plus skeletor's own
    # (e.g. the experiment name and log root).
    model = build_model(args.arch, num_classes=10)
    opt = build_optimizer('SGD', lr=args.lr)
    for epoch in range(200):
        track.debug("Starting epoch %d" % epoch)
        train_loss = train(epoch)
        test_loss = test(epoch)
        # Each call logs one record; these become rows in the results DataFrame.
        track.metric(iteration=epoch,
                     train_loss=train_loss,
                     test_loss=test_loss)

# Register the argument function, then parse arguments and run the experiment.
skeletor.supply_args(add_args)
skeletor.execute(experiment)

To launch a single experiment, you can do something like

CUDA_VISIBLE_DEVICES=0 python train.py --arch resnet50 --lr .1 resnet_cifar

where the final positional argument (resnet_cifar) is the experiment name.

The same code can be used to launch several experiments in parallel. Suppose I have a config called config.yaml that looks like:

arch: resnet50
lr:
  grid_search: [.001, .01, .1, 1.0]

I can test out all of these learning rates at the same time by running:

CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py config.yaml --self_host=4 resnet_cifar

If I have more than 4 configurations, ray will handle job scheduling from the queue.
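
For example, grid searching over more than one argument multiplies the number of trials. A hypothetical config sketch (this assumes grid_search can be applied to any argument registered in add_args; the architecture names are illustrative):

arch:
  grid_search: [resnet50, resnet18]
lr:
  grid_search: [.001, .01, .1, 1.0]

This produces 8 configurations, so ray schedules what fits on the available resources and queues the rest.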

Logs (track records) will be stored in <args.logroot>/<args.experimentname>. See the track docs for how to access these records as DataFrames.

Examples

You can find an example of running a grid search for training a residual network on CIFAR-10 in PyTorch in examples/train.py.

Getting experiment results

I added a utility in skeletor.proc for converting all of the track trial records for an experiment into a single Pandas DataFrame. It can also pickle the result.

That means if I run an experiment like above called resnet_cifar, I can access all of the results for all the trials as a single DataFrame by calling skeletor.proc.track.df('resnet_cifar').
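
As a quick sketch (the column names below are just the ones logged by the train.py example above):

import skeletor.proc.track

# Gather every trial's track records for the 'resnet_cifar' experiment
# into a single Pandas DataFrame.
df = skeletor.proc.track.df('resnet_cifar')

# Columns correspond to whatever was passed to track.metric in train.py,
# e.g. iteration, train_loss, and test_loss.
print(df[['iteration', 'train_loss', 'test_loss']].head())
print('best test loss: %.4f' % df['test_loss'].min())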

Help me out

I tried to eliminate boilerplate by adding basic experiment utilities as well as various models and dataloaders. I haven't added much yet. Feel free to port other architectures and datasets into the repo via PRs.

Things to do

Add capability to register custom models, dataset loaders, and optimizers with the build_model, build_dataset, and build_optimizer functions.

Sometimes track doesn't install correctly from the setup.py. If this happens, just run pip install --upgrade git+https://github.com/richardliaw/track.git@master#egg=track first.

