mlfab
A collection of core machine learning tools
What is this?
This is a framework for trying out machine learning ideas.
Getting Started
Install the package using:
pip install mlfab
Or, to install the latest version from the master branch:
pip install 'mlfab @ git+https://github.com/kscalelabs/mlfab.git@master'
Simple Example
This framework provides an abstraction for quickly implementing and training PyTorch models. The workhorse for doing this is mlfab.Task, which wraps all of the training logic into a single cohesive unit. We can override methods on that class to get special functionality, but the default behavior is often good enough. Here's an example for training an MNIST classifier:
from dataclasses import dataclass

import torch.nn.functional as F
from dpshdl.dataset import Dataset
from dpshdl.impl.mnist import MNIST
from torch import Tensor, nn
from torch.optim.optimizer import Optimizer

import mlfab


@dataclass(kw_only=True)
class Config(mlfab.Config):
    in_dim: int = mlfab.field(1, help="Number of input dimensions")
    learning_rate: float = mlfab.field(1e-3, help="Learning rate to use for optimizer")
    betas: tuple[float, float] = mlfab.field((0.9, 0.999), help="Beta values for Adam optimizer")
    weight_decay: float = mlfab.field(1e-4, help="Weight decay to use for the optimizer")
    warmup_steps: int = mlfab.field(100, help="Number of warmup steps to use for the optimizer")


class MnistClassification(mlfab.Task[Config]):
    def __init__(self, config: Config) -> None:
        super().__init__(config)

        self.model = nn.Sequential(
            nn.Conv2d(config.in_dim, 32, 3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 128),
            nn.BatchNorm1d(128),
            nn.ReLU(),
            nn.Linear(128, 10),
        )

    def set_loggers(self) -> None:
        self.add_logger(
            mlfab.StdoutLogger(),
            mlfab.TensorboardLogger(self.exp_dir),
        )

    def get_dataset(self, phase: mlfab.Phase) -> MNIST:
        root_dir = mlfab.get_data_dir() / "mnist"
        return MNIST(root_dir=root_dir, train=phase == "train")

    def forward(self, x: Tensor) -> Tensor:
        return self.model(x)

    def get_loss(self, batch: tuple[Tensor, Tensor], state: mlfab.State) -> Tensor:
        x_bhw, y_b = batch
        x_bchw = (x_bhw.float() / 255.0).unsqueeze(1)
        yhat_bc = self(x_bchw)
        self.log_step(batch, yhat_bc, state)
        return F.cross_entropy(yhat_bc, y_b)

    def log_valid_step(self, batch: tuple[Tensor, Tensor], output: Tensor, state: mlfab.State) -> None:
        (x_bhw, y_b), yhat_bc = batch, output

        def get_label_strings() -> list[str]:
            ypred_b = yhat_bc.argmax(-1)
            return [f"ytrue={y_b[i]}, ypred={ypred_b[i]}" for i in range(len(y_b))]

        self.log_labeled_images("images", lambda: (x_bhw, get_label_strings()))


if __name__ == "__main__":
    # python -m examples.mnist
    MnistClassification.launch(Config(batch_size=16))
Let's break down each part individually.
Config
Tasks are parametrized using a config dataclass. The mlfab.field function is a lightweight, slightly more ergonomic wrapper around dataclasses.field, and mlfab.Config is a larger dataclass which contains many other options for configuring training.
@dataclass(kw_only=True)
class Config(mlfab.Config):
    in_dim: int = mlfab.field(1, help="Number of input dimensions")
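For intuition, mlfab.field behaves roughly like a thin wrapper over dataclasses.field that pairs a default value with a help string. The sketch below illustrates that idea only; it is not mlfab's actual implementation, and the metadata key is an assumption:

import dataclasses
from typing import Any

def field(default: Any, *, help: str = "") -> Any:
    """Illustrative stand-in for mlfab.field: a default plus a help string."""
    return dataclasses.field(default=default, metadata={"help": help})

With a wrapper like this, the Config fields above read naturally while remaining ordinary dataclass fields.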
Model
All tasks should subclass mlfab.Task and override the generic Config with the task-specific config. This is important not just because it makes your life easier by working nicely with your typechecker, but because the framework inspects the generic type when resolving the config for a given task.
class MnistClassification(mlfab.Task[Config]):
    def __init__(self, config: Config) -> None:
        super().__init__(config)

        self.model = nn.Sequential(
            nn.Conv2d(config.in_dim, 32, 3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 128),
            nn.BatchNorm1d(128),
            nn.ReLU(),
            nn.Linear(128, 10),
        )
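To see why the generic parameter matters, here is a rough sketch of how a framework can recover the config class from a Task[Config] annotation using standard typing introspection. This is illustrative only; mlfab's actual resolution logic may differ:

from dataclasses import dataclass
from typing import Generic, TypeVar, get_args, get_origin

ConfigT = TypeVar("ConfigT")

class Task(Generic[ConfigT]):
    @classmethod
    def get_config_class(cls) -> type:
        # Walk the class's generic bases to find the config bound to Task[...].
        for base in getattr(cls, "__orig_bases__", ()):
            if get_origin(base) is Task:
                return get_args(base)[0]
        raise TypeError(f"{cls.__name__} does not specialize Task[Config]")

@dataclass
class Config:
    in_dim: int = 1

class MnistClassification(Task[Config]):
    pass

assert MnistClassification.get_config_class() is Config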
Loggers
mlfab supports logging to multiple downstream loggers and provides a number of helpers for common logging operations, such as rate limiting, resizing images to reasonable resolutions, and overlaying captions on images. If set_loggers is not overridden, the task will just log to stdout.
def set_loggers(self) -> None:
    self.add_logger(
        mlfab.StdoutLogger(),
        mlfab.TensorboardLogger(self.exp_dir),
    )
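As a rough mental model (this is not mlfab's internal code), a multi-logger setup simply fans each log call out to every registered backend:

class FanOutLogger:
    """Illustrative fan-out: each log call is forwarded to every backend."""

    def __init__(self) -> None:
        self.backends: list = []

    def add_logger(self, *backends) -> None:
        self.backends.extend(backends)

    def log_scalar(self, key: str, value: float) -> None:
        for backend in self.backends:
            backend.log_scalar(key, value)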
Datasets
The task should return the dataset to use for each phase. mlfab.Phase is a string literal with values in ["train", "valid", "test"]. mlfab.get_data_dir() returns the data directory, which can be set in a configuration file located at ~/.mlfab.yml; a default configuration file is written on the first run if one doesn't exist yet.
def get_dataset(self, phase: mlfab.Phase) -> Dataset[tuple[Tensor, Tensor]]:
    root_dir = mlfab.get_data_dir() / "mnist"
    return MNIST(root_dir=root_dir, train=phase == "train")
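For reference, mlfab.Phase is effectively a string literal type; a standalone equivalent (the real definition lives inside mlfab) looks like this:

from typing import Literal

# A phase is just a plain string restricted to these three values.
Phase = Literal["train", "valid", "test"]

def is_training(phase: Phase) -> bool:
    return phase == "train"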
Compute Loss
Each mlfab model should either implement the forward function, which takes a batch from the dataset and returns the loss, or, if more control is desired, override the get_loss function. This example does the latter: forward returns the model's logits, and get_loss normalizes the raw images, computes the cross-entropy loss, and logs the step.
def forward(self, x: Tensor) -> Tensor:
    return self.model(x)

def get_loss(self, batch: tuple[Tensor, Tensor], state: mlfab.State) -> Tensor:
    x_bhw, y_b = batch
    x_bchw = (x_bhw.float() / 255.0).unsqueeze(1)
    yhat_bc = self(x_bchw)
    self.log_step(batch, yhat_bc, state)
    return F.cross_entropy(yhat_bc, y_b)
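To make the shape-suffix convention concrete, here is a quick walk-through of the tensors in get_loss (b = batch, c = channels, h = height, w = width), using a batch size of 16 to match the launch call below:

import torch

x_bhw = torch.randint(0, 256, (16, 28, 28), dtype=torch.uint8)  # raw MNIST images
x_bchw = (x_bhw.float() / 255.0).unsqueeze(1)                    # (16, 1, 28, 28), scaled to [0, 1]
# Two MaxPool2d(2) layers reduce 28x28 -> 14x14 -> 7x7, which is why the first
# linear layer is nn.Linear(64 * 7 * 7, 128); the final logits have shape (16, 10).
print(x_bchw.shape)  # torch.Size([16, 1, 28, 28])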
Logging
When we call log_step in the get_loss function, it delegates to log_train_step, log_valid_step, or log_test_step, depending on the value of state.phase. In this case, on each validation step we log images of the MNIST digits along with the labels that our model predicts.
def log_valid_step(self, batch: tuple[Tensor, Tensor], output: Tensor, state: mlfab.State) -> None:
    (x_bhw, y_b), yhat_bc = batch, output

    def get_label_strings() -> list[str]:
        ypred_b = yhat_bc.argmax(-1)
        return [f"ytrue={y_b[i]}, ypred={ypred_b[i]}" for i in range(len(y_b))]

    self.log_labeled_images("images", lambda: (x_bhw, get_label_strings()))
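The dispatch described above can be pictured roughly as follows (a simplified sketch, not mlfab's actual implementation):

class TaskSketch:
    """Conceptual sketch of how log_step routes to the phase-specific hooks."""

    def log_step(self, batch, output, state) -> None:
        if state.phase == "train":
            self.log_train_step(batch, output, state)
        elif state.phase == "valid":
            self.log_valid_step(batch, output, state)
        else:
            self.log_test_step(batch, output, state)

    # The phase-specific hooks are no-ops by default; tasks override the ones they need.
    def log_train_step(self, batch, output, state) -> None: ...
    def log_valid_step(self, batch, output, state) -> None: ...
    def log_test_step(self, batch, output, state) -> None: ...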
Running
We can launch a training job using the launch class method. The config can be a Config object, or it can be the path to a config.yaml file located in the same directory as the task file. You can additionally provide the launcher argument, which supports training the model across multiple GPUs or nodes.
if __name__ == "__main__":
    MnistClassification.launch(Config(batch_size=16))
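For example, with a config.yaml placed next to the task file, the path can be passed to launch instead of a Config object. The YAML keys shown here are an assumption for illustration; presumably they mirror the Config field names:

# config.yaml (hypothetical contents):
#   batch_size: 16
#   learning_rate: 1e-3
MnistClassification.launch("config.yaml")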