
A lightweight module for research experiment reproducibility and analysis

Project description

Skeletor

Skeletor is a lightweight wrapper for research code. It is meant to enable fast, parallelizable prototyping without sacrificing reproducibility or ease of experiment analysis.

You can install it with: pip install skeletor-ml

Why use skeletor?

Tracking and analyzing experiment results is easy. Skeletor uses track, which provides a simple interface to log metrics throughout training and to view those metrics in a pandas DataFrame afterwards. It can log locally and to S3. Compared to other logging tools, track has minimal overhead and a very simple interface. No longer do you need to decorate every function or specify a convoluted experiment pipeline.

Orchestrating many experiments in parallel is simple and robust. Almost every experiment tracking framework implements its own scheduling and hyperparameter search algorithms. Luckily, I don't trust myself to do this correctly. Instead, skeletor uses ray, a high-performance distributed execution framework. In particular, it uses ray tune for scalable hyperparameter search.

Setup

Necessary packages are listed in setup.py. Just run pip install skeletor-ml to get started.

Basic Usage

A basic example train.py might look like:

import skeletor
from skeletor.models import build_model
from skeletor.datasets import build_dataset
from skeletor.optimizers import build_optimizer
import track

def add_args(parser):
    parser.add_argument('--arch', default='resnet50')
    parser.add_argument('--lr', default=0.1, type=float)

def train(epoch, trainloader, model, optimizer):
    ...
    return avg_train_loss

def test(epoch, testloader, model):
    ...
    return avg_test_loss

def experiment(args):
    trainloader, testloader = build_dataset('cifar10')
    model = build_model(args.arch, num_classes=10)
    opt = build_optimizer('SGD', lr=args.lr)
    for epoch in range(200):
        track.debug("Starting epoch %d" % epoch)
        train_loss = train(epoch, trainloader, model, opt)
        test_loss = test(epoch, testloader, model)
        track.metric(iteration=epoch,
                     train_loss=train_loss,
                     test_loss=test_loss)

# Register the argument hook, then parse the args and run the experiment.
skeletor.supply_args(add_args)
skeletor.execute(experiment)

You just have to supply (1) a function that adds your desired arguments to an ArgumentParser object, and (2) a function that runs the experiment using the parsed arguments. You can then use track to log statistics during training.

You can optionally supply a third function to run analysis after training: skeletor.supply_postprocess(postprocess_fn) registers a user-defined function of the form postprocess_fn(proj), where proj is a track.Project object.
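
For example, continuing the train.py example above (a minimal sketch; the body of postprocess is hypothetical, and the full track.Project API is documented in the track docs):

import skeletor

def postprocess(proj):
    # proj is a track.Project holding this experiment's records.
    # See the track docs for how to pull those records into DataFrames.
    print("Finished experiment, postprocessing project:", proj)

skeletor.supply_args(add_args)
skeletor.supply_postprocess(postprocess)
skeletor.execute(experiment)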

Internally, the basic experiment flow is:

run add_args(parser) -> parse the args -> run experiment_fn(args) -> optionally run postprocess_fn(proj)

Launching experiments

To launch an experiment in train.py, you just do python train.py <my args> <experimentname>. The results will go in <logroot>/<experimentname>. For example, you can do something like

CUDA_VISIBLE_DEVICES=0 python train.py --arch ResNet50 --lr .1 resnet_cifar

The same code can be used to launch several experiments in parallel. Suppose I have a config called config.yaml that looks like:

arch: ResNet50
lr:
  grid_search: [.001, .01, .1, 1.0]

I can test out all of these learning rates at the same time by running:

CUDA_VISIBLE_DEVICES=0,1 python train.py --config=config.yaml --self_host=2 resnet_cifar

Ray will handle scheduling the jobs across all available resources.

Logs (track records) will be stored in <args.logroot>/<args.experimentname>. See the track docs for how to access these records as DataFrames.

Examples

You can find an example of running a grid search for training a residual network on CIFAR-10 in PyTorch in examples/train.py.

Getting experiment results

I added a utility in skeletor.proc that combines all of an experiment's track trial records into a single pandas DataFrame. It can also pickle the result.

That means if I run an experiment like the one above, named resnet_cifar, I can access the results of all its trials as a single DataFrame by calling skeletor.proc.proj('resnet_cifar', './logs').
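
For example (a small sketch; the column names trial_id and test_loss are hypothetical and depend on what you logged with track.metric):

import skeletor.proc

# Combine every trial's track records for the resnet_cifar experiment
# (logged under ./logs) into one pandas DataFrame.
df = skeletor.proc.proj('resnet_cifar', './logs')

# Hypothetical analysis: best test loss seen in each trial.
print(df.groupby('trial_id')['test_loss'].min())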

Registering custom models, dataloaders, and optimizers

Registering custom classes allows you to construct an instance of the specified class by calling build_model, build_dataset, or build_optimizer with the class string name. This is useful for hyperparameter searching because you can search over these choices directly by class name.

I try to provide a simple interface for registering custom implementations with skeletor. For example, I can register a custom Model class by calling skeletor.models.add_model(Model), which then lets me create models through skeletor.models.build_model('Model'). You can also register entire modules full of definitions at once. There are analogous functions, add_dataset and add_optimizer, for datasets and optimizers (see the sketch after the example below).

import torch.nn as nn
import skeletor

class MyNetwork(nn.Module):
    ...

# Register the class so it can be constructed by its string name.
skeletor.models.add_model(MyNetwork)

arch_name = 'MyNetwork'
model = skeletor.models.build_model(arch_name)
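
The same pattern works for datasets and optimizers. Here is a minimal sketch for a custom optimizer (the class name is hypothetical, and the body is elided just like the model example above):

import torch
import skeletor

class MyOptimizer(torch.optim.Optimizer):
    ...

# Register by class, then build by string name (e.g. from a config file).
skeletor.optimizers.add_optimizer(MyOptimizer)
opt = skeletor.optimizers.build_optimizer('MyOptimizer', lr=0.1)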

Help me out / Things to Do

We have active issues! Feel free to suggest improvements or open a PR to contribute.

...

