Python tools for training Neural Networks with KubeML
KubeML
KubeML provides wrappers and tools that connect user code written in PyTorch to the distributed training and serving functionality offered by KubeML.
Installing
Install and update using pip:

```shell
pip install kubeml
```
Usage
The main functionality is offered in the shape of Models and Datasets. A KubeDataset is a convenience wrapper over a torch dataset which, as with a plain torch dataset, users extend with their own functionality to adapt it to their data. A simple example of how to create a dataset to train with KubeML is shown below.
The Dataset class
```python
from kubeml import KubeDataset
from torchvision import transforms


class MnistDataset(KubeDataset):

    def __init__(self):
        super().__init__("mnist")
        self.transf = transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize((0.1307,), (0.3081,))
        ])

    def __getitem__(self, index):
        x = self.data[index]
        y = self.labels[index]
        return self.transf(x), y.astype('int64')

    def __len__(self):
        return len(self.data)
```
The user only needs to provide the constructor with the name of the dataset as it was uploaded to the KubeML storage. The dataset takes care of fetching only the corresponding minibatches of data so that the network can be trained with a data-parallel approach.
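As an illustrative sketch only (this is not KubeML's actual storage logic), assigning minibatches to workers in such a setup could look like a round-robin sharding of batch indices:

```python
def assign_batches(num_samples: int, batch_size: int, num_workers: int):
    """Round-robin assignment of minibatch indices to workers (illustrative only)."""
    num_batches = (num_samples + batch_size - 1) // batch_size  # ceil division
    shards = {w: [] for w in range(num_workers)}
    for b in range(num_batches):
        shards[b % num_workers].append(b)
    return shards


# 100 samples with batch size 16 give 7 minibatches, spread over 3 workers
shards = assign_batches(num_samples=100, batch_size=16, num_workers=3)
# shards == {0: [0, 3, 6], 1: [1, 4], 2: [2, 5]}
```

Each worker then fetches only its own minibatches from storage, which is what lets every replica train on a disjoint slice of the data.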
As with a normal torch dataset, the user must implement the __getitem__ and __len__ methods to iterate over the dataset.
The dataset exposes two member variables:
- data: holds the features used as input to the network
- labels: holds the output labels
Both are saved as numpy arrays.
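To make that contract concrete, here is a minimal stand-in written in plain Python (it does not subclass KubeDataset; the toy arrays are hypothetical) that mimics the `data`/`labels` members and the `__getitem__`/`__len__` protocol:

```python
import numpy as np


class ToyDataset:
    """Stand-in mimicking the KubeDataset interface with in-memory arrays (illustrative only)."""

    def __init__(self):
        # 4 samples with 3 features each, plus one label per sample
        self.data = np.arange(12, dtype=np.float32).reshape(4, 3)
        self.labels = np.array([0, 1, 0, 1])

    def __getitem__(self, index):
        return self.data[index], self.labels[index].astype('int64')

    def __len__(self):
        return len(self.data)


ds = ToyDataset()
x, y = ds[2]  # third sample and its label
```

A real KubeDataset subclass follows the same shape; the difference is that `data` and `labels` are populated for you from the KubeML storage.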
The Model class
The other main component is the model class. This abstract class hides the complexity of distributing the training among multiple workers, nodes and GPUs. The constructor takes a torch model and the dataset as parameters. The user only needs to implement the abstract methods of the class (train, infer, validate, init and configure_optimizers) with the behavior they want from the network.
The KubeModel exposes the batch_size and lr arguments, which the user can change when starting the training job.
```python
from abc import abstractmethod
from typing import Any, List, Tuple, Union

from kubeml import KubeModel
import torch
import torch.nn as nn
import numpy as np


class KubeLeNet(KubeModel):

    def __init__(self, network, dataset):
        super().__init__(network, dataset, gpu=True)

    @abstractmethod
    def configure_optimizers(self) -> torch.optim.Optimizer:
        pass

    # Train trains the model for an epoch and returns the loss
    @abstractmethod
    def train(self, x, y, batch_index) -> float:
        pass

    # Validate validates the model on the test data and returns a tuple
    # of (accuracy, loss)
    @abstractmethod
    def validate(self, x, y, batch_index) -> Tuple[float, float]:
        pass

    # Infer receives the data points or images as a list and returns
    # the predictions of the network
    @abstractmethod
    def infer(self, data: List[Any]) -> Union[torch.Tensor, np.ndarray, List[float]]:
        pass

    # Init initializes the model in a particular way
    @abstractmethod
    def init(self, model: nn.Module):
        pass
```
An example implementation of the init and train functions could look as follows:
```python
import logging

# Train trains the model for an epoch and returns the loss
def train(self, x, y, batch_index) -> float:
    loss_fn = nn.CrossEntropyLoss()
    total_loss = 0

    # forward pass over the batch
    self.optimizer.zero_grad()
    output = self(x)

    # compute loss and backprop
    loss = loss_fn(output, y)
    loss.backward()

    # step with the optimizer
    self.optimizer.step()
    total_loss += loss.item()

    if batch_index % 10 == 0:
        logging.info(f"Index {batch_index}, error: {loss.item()}")

    return total_loss


# Initialize the network as a pytorch model
def init(self, model):
    def init_weights(m: nn.Module):
        if isinstance(m, nn.Conv2d):
            nn.init.xavier_uniform_(m.weight)
            nn.init.constant_(m.bias, 0.01)
        if isinstance(m, nn.Linear):
            nn.init.xavier_uniform_(m.weight)
            nn.init.constant_(m.bias, 0.01)

    model.apply(init_weights)
```
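The init example relies on `nn.Module.apply`, which calls the given function on every submodule recursively (children first, then the module itself). A simplified pure-Python analogue of that traversal, using a hypothetical `Node` class in place of `nn.Module`:

```python
class Node:
    """Hypothetical stand-in for nn.Module, illustrating the recursive apply pattern."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def apply(self, fn):
        # visit children first, then the node itself (mirrors nn.Module.apply)
        for child in self.children:
            child.apply(fn)
        fn(self)
        return self


visited = []
net = Node("net", [Node("conv"), Node("fc")])
net.apply(lambda m: visited.append(m.name))
# visited == ["conv", "fc", "net"]
```

This is why a single `model.apply(init_weights)` call is enough to reach every `nn.Conv2d` and `nn.Linear` layer, however deeply they are nested.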
Writing the training function
To create a serverless function that will serve as a worker for the model training process, the steps are simple: write the code initializing the network in the main method of the function, and call start on the KubeML model.
```python
def main():
    # Create the PyTorch Model
    lenet = LeNet()
    dataset = MnistDataset()
    kubenet = KubeLeNet(lenet, dataset)
    return kubenet.start()
```