Skeletor
A lightweight module for research experiment reproducibility and analysis
Skeletor is a lightweight wrapper for research code. It is meant to enable fast, parallelizable prototyping without sacrificing reproducibility or ease of experiment analysis.
You can install it with: pip install skeletor-ml
Why use skeletor?
Tracking and analyzing experiment results is easy. Skeletor uses track, which provides a simple interface to log metrics throughout training and to view those metrics in a pandas DataFrame afterwards. It can log locally and to S3. Compared to other logging tools, track has minimal overhead and a very simple interface. No longer do you need to decorate every function or specify a convoluted experiment pipeline.
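For instance, inside an experiment function the logging interface amounts to calls like the following (a minimal sketch; skeletor sets up the underlying track trial for you, as shown in Basic Usage below):

import track

def experiment(args):
    for epoch in range(3):
        loss = 1.0 / (epoch + 1)  # stand-in for a real training loss
        track.debug("epoch %d done" % epoch)            # free-form log line
        track.metric(iteration=epoch, train_loss=loss)  # structured record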
Orchestrating many experiments in parallel is simple and robust. Almost every experiment tracking framework implements its own scheduling and hyperparameter search algorithms. Luckily, I don't trust myself to do this correctly. Instead, skeletor uses ray, a high-performance distributed execution framework. In particular, it uses ray tune for scalable hyperparameter search.
Setup
Necessary packages are listed in setup.py. Just run pip install skeletor-ml to get started.
Basic Usage
A basic example train.py might look like:
import skeletor
from skeletor.models import build_model
from skeletor.datasets import build_dataset
from skeletor.optimizers import build_optimizer
import track

def add_args(parser):
    parser.add_argument('--arch', default='resnet50')
    parser.add_argument('--lr', default=0.1, type=float)

def train(epoch, trainloader, model, optimizer):
    ...
    return avg_train_loss

def test(epoch, testloader, model):
    ...
    return avg_test_loss

def experiment(args):
    trainloader, testloader = build_dataset('cifar10')
    model = build_model(args.arch, num_classes=10)
    opt = build_optimizer('SGD', lr=args.lr)
    for epoch in range(200):
        track.debug("Starting epoch %d" % epoch)
        train_loss = train(epoch, trainloader, model, opt)
        test_loss = test(epoch, testloader, model)
        track.metric(iteration=epoch,
                     train_loss=train_loss,
                     test_loss=test_loss)

skeletor.supply_args(add_args)
skeletor.execute(experiment)
You just have to supply (1) a function that adds your desired arguments to an ArgumentParser object, and (2) a function that runs the experiment using the parsed arguments. You can then use track to log statistics during training.
You can supply a third function to run analysis after training. skeletor.supply_postprocess(postprocess_fn) takes in a user-defined function of the form postprocess_fn(proj), where proj is a track.Project object.
Internally, the basic experiment flow is:
run add_args(parser) -> parse the args -> run experiment_fn(args) -> optionally run postprocess_fn(proj)
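For example, a postprocessing hook might look like the sketch below. The proj accessors used here (ids and results) follow the track documentation, and the test_loss column comes from the metrics logged in the example above; treat both as assumptions rather than a fixed API.

def postprocess(proj):
    # proj is a track.Project; per the track docs, proj.ids lists trial
    # metadata and proj.results(...) returns the logged records as a
    # pandas DataFrame (assumed API).
    df = proj.results(proj.ids['trial_id'])
    print(df['test_loss'].min())

skeletor.supply_postprocess(postprocess)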
Launching experiments
To launch an experiment in train.py, you just do python train.py <my args> <experimentname>. The results will go in <logroot>/<experimentname>. For example, you can do something like
CUDA_VISIBLE_DEVICES=0 python train.py --arch ResNet50 --lr .1 resnet_cifar
The same code can be used to launch several experiments in parallel. Suppose I have a config called config.yaml that looks like:
arch: ResNet50
lr:
  grid_search: [.001, .01, .1, 1.0]
I can test out all of these learning rates at the same time by running:
CUDA_VISIBLE_DEVICES=0,1 python train.py --config=config.yaml --self_host=2 resnet_cifar
Ray will handle scheduling the jobs across all available resources.
Logs (track records) will be stored in <args.logroot>/<args.experimentname>. See the track docs for how to access these records as DataFrames.
Examples
You can find an example of running a grid search for training a residual network on CIFAR-10 in PyTorch in examples/train.py.
Getting experiment results
I added a utility in skeletor.proc for converting all track trial records for an experiment into a single pandas DataFrame (it can also pickle the DataFrame). That means if I run an experiment like the one above called resnet_cifar, I can access the results of all trials as a single DataFrame by calling skeletor.proc.proj('resnet_cifar', './logs').
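For instance (a sketch; the lr and test_loss column names are assumptions based on the arguments and metrics used in the example above):

import skeletor.proc

# One DataFrame holding every trial's track records for the experiment.
df = skeletor.proc.proj('resnet_cifar', './logs')

# e.g., the best test loss reached for each learning rate in the grid
# (assumes hyperparameters appear as columns alongside logged metrics).
print(df.groupby('lr')['test_loss'].min())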
Registering custom models, dataloaders, and optimizers
Registering custom classes allows you to construct an instance of the specified class by calling build_model, build_dataset, or build_optimizer with the class's string name. This is useful for hyperparameter search because you can search over these choices directly by class name.
I try to provide a simple interface for registering custom implementations with skeletor. For example, I can register a custom Model class by calling skeletor.models.add_model(Model), which allows me to create models through skeletor.models.build_model('Model'). You can also register entire modules full of definitions at once. There are analogous functions add_dataset and add_optimizer for datasets and optimizers. For example:
from torch.nn import Module

import skeletor

class MyNetwork(Module):
    ...

# Register the class, then construct it by its string name.
skeletor.models.add_model(MyNetwork)

arch_name = 'MyNetwork'
model = skeletor.models.build_model(arch_name)
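Registering a custom optimizer works the same way (a sketch; MyOptimizer stands in for a real torch.optim.Optimizer subclass, and the lr keyword mirrors the build_optimizer('SGD', lr=...) call in the example above):

from torch.optim import Optimizer

class MyOptimizer(Optimizer):
    ...

skeletor.optimizers.add_optimizer(MyOptimizer)
opt = skeletor.optimizers.build_optimizer('MyOptimizer', lr=0.1)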
Help me out / Things to Do
We have active issues! Feel free to suggest new improvements or add PRs to contribute.