Python tools for training Neural Networks with KubeML
KubeML
KubeML provides wrappers and tools that connect user code written in PyTorch to the distributed training and serving functionality offered by KubeML.
Installing
Install and update using pip:

```shell
pip install kubeml
```
Usage
The main functionality is offered in the shape of Models and Datasets. A KubeDataset is a convenience wrapper over a torch dataset which, as with a plain torch dataset, users extend with their own functionality to adapt it to their data. A simple example of how to create a dataset to train with KubeML is shown below.
The Dataset class
```python
from kubeml import KubeDataset
from torchvision import transforms


class MnistDataset(KubeDataset):

    def __init__(self):
        super().__init__("mnist")
        self.transf = transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize((0.1307,), (0.3081,))
        ])

    def __getitem__(self, index):
        x = self.data[index]
        y = self.labels[index]
        return self.transf(x), y.astype('int64')

    def __len__(self):
        return len(self.data)
```
The user only needs to provide the constructor with the name of the dataset as it was uploaded to the KubeML storage. The dataset takes care of fetching only the corresponding minibatches of data so that the network can be trained with a data-parallel approach.
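As an illustrative sketch only (this is not KubeML's actual storage logic), assigning minibatches to workers in such a setup could look like a round-robin sharding of batch indices:

```python
def assign_batches(num_samples: int, batch_size: int, num_workers: int):
    """Round-robin assignment of minibatch indices to workers (illustrative only)."""
    num_batches = (num_samples + batch_size - 1) // batch_size  # ceil division
    shards = {w: [] for w in range(num_workers)}
    for b in range(num_batches):
        shards[b % num_workers].append(b)
    return shards


# 100 samples with batch size 16 give 7 minibatches, spread over 3 workers
shards = assign_batches(num_samples=100, batch_size=16, num_workers=3)
# shards == {0: [0, 3, 6], 1: [1, 4], 2: [2, 5]}
```

Each worker then fetches only its own minibatches from storage, which is what lets every replica train on a disjoint slice of the data.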
As with a normal torch dataset, the user must implement the __getitem__ and __len__ methods to iterate over the dataset.
The dataset exposes two member variables:
- data: holds the features used as input to the network
- labels: holds the output labels
Both are saved as numpy arrays.
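To make that contract concrete, here is a minimal stand-in written in plain Python (it does not subclass KubeDataset; the toy arrays are hypothetical) that mimics the `data`/`labels` members and the `__getitem__`/`__len__` protocol:

```python
import numpy as np


class ToyDataset:
    """Stand-in mimicking the KubeDataset interface with in-memory arrays (illustrative only)."""

    def __init__(self):
        # 4 samples with 3 features each, plus one label per sample
        self.data = np.arange(12, dtype=np.float32).reshape(4, 3)
        self.labels = np.array([0, 1, 0, 1])

    def __getitem__(self, index):
        return self.data[index], self.labels[index].astype('int64')

    def __len__(self):
        return len(self.data)


ds = ToyDataset()
x, y = ds[2]  # third sample and its label
```

A real KubeDataset subclass follows the same shape; the difference is that `data` and `labels` are populated for you from the KubeML storage.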
The Model class
The other main component is the model class. This abstract class hides the complexity of distributing the training among multiple workers, nodes and GPUs. The constructor takes a torch model and the dataset as parameters. The user only needs to implement the abstract methods of the class (train, infer, validate, init and configure_optimizers) with the behavior they want from the network.
The KubeModel exposes the batch_size and lr arguments, which the user can change when starting the training job.
```python
from abc import abstractmethod
from typing import Any, List, Tuple, Union

from kubeml import KubeModel
import torch
import torch.nn as nn
import numpy as np


class KubeLeNet(KubeModel):

    def __init__(self, network, dataset):
        super().__init__(network, dataset, gpu=True)

    @abstractmethod
    def configure_optimizers(self) -> torch.optim.Optimizer:
        pass

    # Train trains the model for an epoch and returns the loss
    @abstractmethod
    def train(self, x, y, batch_index) -> float:
        pass

    # Validate validates the model on the test data and returns a tuple
    # of (accuracy, loss)
    @abstractmethod
    def validate(self, x, y, batch_index) -> Tuple[float, float]:
        pass

    # Infer receives the data points or images as a list and returns
    # the predictions of the network
    @abstractmethod
    def infer(self, data: List[Any]) -> Union[torch.Tensor, np.ndarray, List[float]]:
        pass

    # Init initializes the model in a particular way
    @abstractmethod
    def init(self, model: nn.Module):
        pass
```
An example implementation of the init and train functions could look as follows:
```python
import logging

# Train trains the model for an epoch and returns the loss
def train(self, x, y, batch_index) -> float:
    loss_fn = nn.CrossEntropyLoss()
    total_loss = 0

    # forward pass over the batch
    self.optimizer.zero_grad()
    output = self(x)

    # compute loss and backprop
    loss = loss_fn(output, y)
    loss.backward()

    # step with the optimizer
    self.optimizer.step()
    total_loss += loss.item()

    if batch_index % 10 == 0:
        logging.info(f"Index {batch_index}, error: {loss.item()}")

    return total_loss


# Initialize the network as a pytorch model
def init(self, model):
    def init_weights(m: nn.Module):
        if isinstance(m, nn.Conv2d):
            nn.init.xavier_uniform_(m.weight)
            nn.init.constant_(m.bias, 0.01)
        if isinstance(m, nn.Linear):
            nn.init.xavier_uniform_(m.weight)
            nn.init.constant_(m.bias, 0.01)

    model.apply(init_weights)
```
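The init example relies on `nn.Module.apply`, which calls the given function on every submodule recursively (children first, then the module itself). A simplified pure-Python analogue of that traversal, using a hypothetical `Node` class in place of `nn.Module`:

```python
class Node:
    """Hypothetical stand-in for nn.Module, illustrating the recursive apply pattern."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def apply(self, fn):
        # visit children first, then the node itself (mirrors nn.Module.apply)
        for child in self.children:
            child.apply(fn)
        fn(self)
        return self


visited = []
net = Node("net", [Node("conv"), Node("fc")])
net.apply(lambda m: visited.append(m.name))
# visited == ["conv", "fc", "net"]
```

This is why a single `model.apply(init_weights)` call is enough to reach every `nn.Conv2d` and `nn.Linear` layer, however deeply they are nested.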
Writing the training function
To create a serverless function that will serve as a worker for the model training process, the steps are simple: write the code initializing the network in the main method of the function, and call start on the KubeML model.
```python
def main():
    # Create the PyTorch Model
    lenet = LeNet()
    dataset = MnistDataset()
    kubenet = KubeLeNet(lenet, dataset)
    return kubenet.start()
```