A machine learning library agnostic framework for model training

These details have not been verified by PyPI

Project links

Homepage

License
- OSI Approved :: Apache Software License
Operating System
- OS Independent
Programming Language
- Python :: 3.7

Project description

If you are eager to dive into training scripts that use MLpug, check out the examples directory!

MLpug

MLpug is a machine learning library agnostic framework for model training.

A lot of the functionality you need to train your machine learning model is independent of the machine learning library you're using, e.g. PyTorch or Tensorflow. For instance:

checkpoint management,
evaluation of validation set loss and other custom metrics,
progress logging,
progress visualization using Tensorboard,
the use of gradient accumulation to train with large batch sizes using limited GPU memory, etc.

You need such functionality no matter what machine learning framework you are using.
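Gradient accumulation, one of the items listed above, is itself library independent. The sketch below is a pure-Python illustration on a toy one-parameter linear model (the model, data and function names are hypothetical and are not MLpug code): the batch is processed in chunks, each chunk's gradient is weighted by its share of the batch, and a single parameter update is made per batch. In PyTorch, the same idea corresponds to calling backward() on each chunk's appropriately scaled loss before a single optimizer.step().

```python
def gradient(w, xs, ys):
    """Gradient of the loss 0.5 * mean((w * x - y) ** 2) with respect to w."""
    return sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_gradient(w, xs, ys, chunk_size):
    """Sum chunk gradients, each weighted by the chunk's share of the batch.

    Only one chunk is processed at a time, yet the result equals the
    gradient over the full batch.
    """
    n = len(xs)
    total = 0.0
    for start in range(0, n, chunk_size):
        chunk_xs = xs[start:start + chunk_size]
        chunk_ys = ys[start:start + chunk_size]
        total += gradient(w, chunk_xs, chunk_ys) * len(chunk_xs) / n
    return total


def train(xs, ys, chunk_size, lr=0.05, steps=200):
    """Gradient descent with one parameter update per full batch."""
    w = 0.0
    for _ in range(steps):
        w -= lr * accumulated_gradient(w, xs, ys, chunk_size)
    return w


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * x for x in xs]  # the true slope is 2

# Chunked and full-batch gradients agree (up to floating point rounding)
print(gradient(0.5, xs, ys), accumulated_gradient(0.5, xs, ys, chunk_size=2))
print(train(xs, ys, chunk_size=2))  # close to 2.0
```

The chunk size only changes how much data is processed at once; the update per batch stays the same, which is what makes large effective batch sizes possible with limited memory.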

MLpug provides a single framework with a unified API for all such training functionality, independent of the machine learning library you are using. This also means that when you switch libraries, you can reuse your training code with no, or minimal, changes.

Supported backends

Currently, MLpug supports the following deep learning/machine learning library 'backends':

PyTorch
PyTorch/XLA (training with PyTorch on TPUs)
Tensorflow (in development; some features are not available yet)

MLpug focus

Although MLpug should be able to deal with any training job, its functionality is mostly focused on training large models on large datasets using limited hardware (GPU or TPU) resources and memory.

Almost at version 0.1!

MLpug is still in development. If you are having trouble using MLpug for your use case, or if you have found a bug, please file an issue.

Installing MLpug

Hello World (PT | XLA | TF)

The following sections are documentation to-dos, but they provide insight into MLpug's features:
The logs object

Callbacks and the training life cycle

Progress Logging

Model components vs Training model

Distributed training

Checkpoint management
      Using the CheckpointManager
      Using training checkpoints
      Using model checkpoints
      Checkpointing on error or interrupt

MLpug metric evaluators
      Auxiliary batch training results
      Calculating custom metrics
      Conditional computation of metrics

Batch chunking, dealing with GPU memory limits
      Gradient Accumulation
      Chunked Metric Computation

Using Tensorboard
      Tensorboard made easy with AutoTensorboard
      More fine grained control

Learning Rate Scheduling

Multi GPU training

Mixed Precision Training

CUDA Memory tools

Using multiple optimizers

Installing MLpug

Please ensure that you are using Python 3.7+.

Install as follows:

pip install mlpug

Usage with PyTorch

To use MLpug with PyTorch, you also need to install PyTorch:

pip install torch torchvision

Usage with Tensorflow

To use MLpug with Tensorflow, you also need to install Tensorflow:

pip install tensorflow
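Before running the examples, it can help to check which backends are importable in your environment. The snippet below is a small standard-library sketch, not part of MLpug; the package names torch, torch_xla and tensorflow are the ones used in this README's install commands and code examples, and the helper names are hypothetical.

```python
import importlib.util
import sys

# Top-level package for each backend mentioned in this README
BACKEND_PACKAGES = {
    "PyTorch": "torch",
    "PyTorch/XLA": "torch_xla",
    "Tensorflow": "tensorflow",
}


def python_version_ok(minimum=(3, 7)):
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= minimum


def available_backends():
    """Map each backend name to whether its package can be found (without importing it)."""
    return {name: importlib.util.find_spec(package) is not None
            for name, package in BACKEND_PACKAGES.items()}


if __name__ == "__main__":
    print("Python 3.7+:", python_version_ok())
    for name, found in available_backends().items():
        print(f"{name:12s} {'installed' if found else 'not installed'}")
```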

Hello World!

This is the Hello World of training with MLpug. You will see that using MLpug with PyTorch, PyTorch/XLA and Tensorflow is very similar.

For details, please see:

You can download and run these examples (for XLA you need to use a TPU on Google Cloud, or use Google Colab).

While reading the explanation below, you may still have many questions about the why and how of training with MLpug. I will expand the MLpug documentation soon to give better insight.

'Hello World' with PyTorch

To use MLpug with PyTorch, import the PyTorch backend:

import torch
import mlpug.pytorch as mlp

Before we can start training, we need an iterable dataset that provides our training batches:

training_dataset = torch.utils.data.DataLoader(training_data,
                                               batch_size=batch_size,
                                               shuffle=False,
                                               num_workers=3)
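An iterable dataset that provides training batches can be iterated over once per epoch. The plain-Python sketch below illustrates the idea with hypothetical names; unlike a DataLoader, it does not stack samples into tensors. A bare generator would not be enough, because it is exhausted after the first epoch, hence the small re-iterable wrapper. Whether MLpug accepts arbitrary iterables like this is an assumption; the example itself uses a DataLoader.

```python
def iterate_batches(samples, batch_size):
    """Yield (inputs, labels) batches in dataset order, without shuffling.

    As with a DataLoader's default drop_last=False, the last batch may be smaller.
    """
    for start in range(0, len(samples), batch_size):
        chunk = samples[start:start + batch_size]
        yield [x for x, _ in chunk], [y for _, y in chunk]


class BatchIterable:
    """Re-iterable: each new iteration (e.g. each epoch) starts at the first batch."""

    def __init__(self, samples, batch_size):
        self.samples = samples
        self.batch_size = batch_size

    def __iter__(self):
        return iterate_batches(self.samples, self.batch_size)


training_batches = BatchIterable([(i, i % 2) for i in range(5)], batch_size=2)
for epoch in range(2):
    for inputs, labels in training_batches:
        print(epoch, inputs, labels)
```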

... and a model we want to train

classifier = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(784, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 10))

MLpug needs a way to evaluate the loss of the model. One way to do that is to define a TrainModel that outputs the loss:

class TrainModel(torch.nn.Module):
    def __init__(self, classifier):
        super(TrainModel, self).__init__()

        self.classifier = classifier
        self.loss_func = torch.nn.CrossEntropyLoss()

    def forward(self, batch_data, evaluate_settings, inference_mode=None):
        images, true_labels = batch_data

        logits = self.classifier(images)
        return self.loss_func(logits, true_labels)

train_model = TrainModel(classifier)
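The loss returned by TrainModel.forward is torch.nn.CrossEntropyLoss applied to the raw logits. With its default settings (reduction='mean', no class weights, no label smoothing), this is the batch average of -log(softmax(logits)[label]). The pure-Python sketch below, with a hypothetical function name, spells out that formula for intuition; it is not how PyTorch implements it.

```python
import math


def cross_entropy(logits, labels):
    """Mean cross-entropy over a batch, computed from raw (unnormalized) logits.

    logits: one list of class scores per sample
    labels: one integer class index per sample
    """
    total = 0.0
    for scores, label in zip(logits, labels):
        # log(sum(exp(scores))), shifted by the max score for numerical stability
        m = max(scores)
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
        # -log(softmax(scores)[label])
        total += log_sum_exp - scores[label]
    return total / len(labels)


# Two equally scored classes: the loss is log(2) for either label
print(cross_entropy([[0.0, 0.0], [0.0, 0.0]], [0, 1]))
```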

To train the model, we also need an optimizer:

optimizer = torch.optim.Adam(classifier.parameters(), eps=1e-7)

To start training with MLpug, we need to create a Trainer, which will be used by a TrainingManager:

trainer = mlp.trainers.DefaultTrainer(optimizers=optimizer, model_components=classifier)

MLpug uses a callback system that allows you to customize and extend the training functionality. The callback instances in the list you provide to the TrainingManager are called through hooks at different stages of the training process.

# At minimum you want to log the loss in the training progress
# By default the batch loss and the moving average of the loss are calculated and logged
loss_evaluator = mlp.evaluation.MetricEvaluator(trainer=trainer)
callbacks = [
    mlp.callbacks.TrainingMetricsLogger(metric_evaluator=loss_evaluator),
    # Calculate validation loss only once per epoch over the whole dataset
    mlp.callbacks.TestMetricsLogger(validation_dataset,
                                    'validation',
                                    metric_evaluator=loss_evaluator,
                                    batch_level=False),
    mlp.callbacks.LogProgress(log_period=progress_log_period, set_names=['training', 'validation']),
]

The TrainingMetricsLogger and the TestMetricsLogger callback instances log training and validation set loss values in a logs object that is passed through all callbacks during training. The LogProgress callback instance logs the metric values stored in the received logs object.
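To make the flow of the logs object concrete, here is a minimal, hypothetical callback system in plain Python. The hook names, class names and logs keys are invented for illustration and are not MLpug's API; only the idea of callbacks being called in list order while sharing one logs object comes from the description above. The fixed-size window used for the moving average is also an assumption about how such an average might be computed.

```python
from collections import deque


class Callback:
    """Hypothetical base class; every hook receives the shared logs dict."""

    def on_batch_training_completed(self, batch_loss, logs):
        pass

    def on_epoch_completed(self, logs):
        pass


class MovingAverageLossLogger(Callback):
    """Stores the batch loss and a moving average over the last batches in logs."""

    def __init__(self, window_size=3):
        self.window = deque(maxlen=window_size)

    def on_batch_training_completed(self, batch_loss, logs):
        self.window.append(batch_loss)
        training = logs.setdefault("training", {})
        training["batch_loss"] = batch_loss
        training["moving_average_loss"] = sum(self.window) / len(self.window)


class PrintProgress(Callback):
    """Only reads from logs: reports what other callbacks stored there."""

    def __init__(self):
        self.lines = []

    def on_epoch_completed(self, logs):
        average = logs["training"]["moving_average_loss"]
        line = f"Epoch {logs['epoch']}: moving average loss {average:.3f}"
        self.lines.append(line)
        print(line)


def run_training(batch_losses_per_epoch, callbacks):
    """Hypothetical manager loop: calls each hook on all callbacks, in list order.

    Batch losses are given directly here instead of being computed by a model.
    """
    logs = {}
    for epoch, batch_losses in enumerate(batch_losses_per_epoch):
        logs["epoch"] = epoch
        for batch_loss in batch_losses:
            for callback in callbacks:
                callback.on_batch_training_completed(batch_loss, logs)
        for callback in callbacks:
            callback.on_epoch_completed(logs)
    return logs


# Both callbacks share the same logs dict; the printer reads what the logger wrote
run_training([[1.0, 2.0, 3.0], [4.0]], [MovingAverageLossLogger(window_size=3), PrintProgress()])
```

Keeping the logging of values separate from their reporting means a reporting callback never needs to know how a metric was computed.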

We can now instantiate the TrainingManager and pass it the trainer.

manager = mlp.trainers.TrainingManager(trainer,
                                       training_dataset,
                                       num_epochs=num_epochs,
                                       callbacks=callbacks)

Before we can start training we still have to provide the train_model to the trainer.

trainer.set_training_model(train_model)

The final step is to actually start training:

manager.start_training()

Running pytorch/hello_world.py finishes like this:

###############################################################################
Epoch 9/9	READY - Duration 0:00:08
Moving average:
training       : loss          0.238.

Computed over dataset:
validation     : loss          0.346.



INFO    : TrainingManager::_train : Training completed. All good! ❤️

Using the classifier ...
real label = 9, predicted label = 9
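The output above reports the training loss as a moving average and the validation loss as "Computed over dataset" (batch_level=False). A common way to compute a loss over a whole dataset from per-batch mean losses is to weight each batch by its size. The sketch below uses hypothetical function names and makes no claim about how MLpug aggregates internally; it shows why this differs from simply averaging the batch means.

```python
def dataset_loss(batch_mean_losses, batch_sizes):
    """Loss over the whole dataset, computed from per-batch mean losses.

    Weighting each batch mean by its batch size gives the mean over all
    samples, also when the final batch is smaller than the others.
    """
    total = sum(loss * size for loss, size in zip(batch_mean_losses, batch_sizes))
    return total / sum(batch_sizes)


def mean_of_batch_means(batch_mean_losses):
    """Unweighted average of batch means: biased when batch sizes differ."""
    return sum(batch_mean_losses) / len(batch_mean_losses)


# Five per-sample losses [1, 1, 1, 1, 4] split into batches of size 2
batch_means = [1.0, 1.0, 4.0]
batch_sizes = [2, 2, 1]
print(dataset_loss(batch_means, batch_sizes))  # 1.6, the mean of all five samples
print(mean_of_batch_means(batch_means))  # 2.0, over-weights the small last batch
```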

'Hello World' with PyTorch/XLA

The Hello World example with PyTorch/XLA is largely the same as with PyTorch. There are only two small differences.

To use MLpug with PyTorch/XLA, load the correct backend:

import mlpug.pytorch.xla as mlp

Load your model on a TPU core:

import torch_xla.core.xla_model as xm

...

device = xm.xla_device()

train_model = TrainModel(classifier, device)
classifier.to(device)

'Hello World' with Tensorflow

Below we will focus only on the minor differences between using MLpug with PyTorch and Tensorflow.

To use MLpug with Tensorflow, import the Tensorflow backend:

import mlpug.tensorflow as mlp

The only real difference is that, for Tensorflow, you can specify whether the trainer runs in eager mode. If it does not, you need to specify the input batch_data_signature:

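import tensorflow as tf

# Option 1: run the trainer in eager mode
trainer = mlp.trainers.DefaultTrainer(optimizers=optimizer,
                                      model_components=classifier,
                                      eager_mode=True)

# Option 2: run in non-eager mode, specifying the shape and dtype of the (images, labels) batch data
trainer = mlp.trainers.DefaultTrainer(optimizers=optimizer,
                                      model_components=classifier,
                                      batch_data_signature=(tf.TensorSpec(shape=(None, 28, 28), dtype=tf.float64),
                                                            tf.TensorSpec(shape=(None,), dtype=tf.uint8),))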
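In the batch_data_signature, each tf.TensorSpec describes one element of the (images, labels) batch, and a None dimension means that its size is not fixed. Leaving the batch dimension as None lets batches of any size, such as a smaller final batch, match the same signature. The pure-Python sketch below uses a hypothetical helper that ignores dtypes and is not TensorFlow code; it only illustrates this shape-matching rule. Presumably MLpug uses the signature when compiling the training step outside eager mode, but that is an assumption, not something this README states.

```python
def matches_spec(shape, spec_shape):
    """True if a concrete shape fits a spec shape; None in the spec matches any size."""
    if len(shape) != len(spec_shape):
        return False
    return all(spec is None or spec == dim for dim, spec in zip(shape, spec_shape))


# Shapes from the batch_data_signature above: (images, labels)
IMAGES_SPEC = (None, 28, 28)
LABELS_SPEC = (None,)

print(matches_spec((32, 28, 28), IMAGES_SPEC))  # True: any batch size is accepted
print(matches_spec((7, 28, 28), IMAGES_SPEC))   # True: e.g. a smaller final batch
print(matches_spec((32, 784), IMAGES_SPEC))     # False: flattened images do not fit
```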

If you run both tensorflow/hello_world.py and tensorflow/hello_world_not_eager.py, you will see that training is much faster when not running in eager mode.

Running tensorflow/hello_world.py finishes like this:

###############################################################################
Epoch 9/9	READY - Duration 0:00:15
Moving average:
training       : loss          0.229.

Computed over dataset:
validation     : loss          0.370.



INFO    : TrainingManager::_train : Training completed. All good! ❤️

Using the classifier ...
real label = 9, predicted label = 9

Running tensorflow/hello_world_not_eager.py finishes like this:

###############################################################################
Epoch 9/9	READY - Duration 0:00:06
Moving average:
training       : loss          0.229.

Computed over dataset:
validation     : loss          0.370.



INFO    : TrainingManager::_train : Training completed. All good! ❤️

Using the classifier ...
real label = 9, predicted label = 9

Note the difference in epoch duration!
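The speed-up can be read off the two log excerpts above. The sketch below parses the "Duration" field from those two lines and computes the ratio; the helper name is hypothetical, and since both durations are single epochs rounded to whole seconds, the ratio is only a rough indication.

```python
from datetime import timedelta


def parse_duration(log_line):
    """Parse the 'Duration H:MM:SS' field at the end of an epoch log line."""
    hours, minutes, seconds = log_line.rsplit("Duration ", 1)[1].strip().split(":")
    return timedelta(hours=int(hours), minutes=int(minutes), seconds=int(seconds))


eager = parse_duration("Epoch 9/9\tREADY - Duration 0:00:15")
not_eager = parse_duration("Epoch 9/9\tREADY - Duration 0:00:06")

# Dividing one timedelta by another gives a float
speedup = eager / not_eager
print(f"Non-eager epoch is {speedup:.1f}x faster")  # Non-eager epoch is 2.5x faster
```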


Release history

0.2.0

Feb 5, 2026

0.1.0

Mar 4, 2024

0.0.57

Nov 21, 2022

0.0.56

Aug 26, 2022

0.0.55

Jul 26, 2022

0.0.54

Jun 16, 2022

0.0.53

Jun 10, 2022

0.0.52

Mar 25, 2022

0.0.51

Mar 25, 2022

0.0.50

Dec 13, 2021

0.0.49

Dec 11, 2021

0.0.48 (this version)

Dec 10, 2021

0.0.47

Dec 6, 2021

0.0.46

Nov 26, 2021

0.0.45

Nov 24, 2021

0.0.44

Nov 14, 2021

0.0.43

Nov 13, 2021

0.0.42

Nov 1, 2021

0.0.41

Nov 1, 2021

0.0.40

Jul 17, 2021

0.0.36

Jul 1, 2021

0.0.35

Jun 30, 2021

0.0.34

Jun 30, 2021

0.0.33

Jun 29, 2021

0.0.32

Jun 29, 2021

0.0.31

Jun 29, 2021

0.0.30

Jun 29, 2021

0.0.29

Jun 27, 2021

0.0.28

Jun 27, 2021

0.0.27

Mar 6, 2021

0.0.26

Jan 24, 2021

0.0.25

Jan 24, 2021

0.0.23

Jan 23, 2021

0.0.22

Jan 23, 2021

0.0.21

Jan 22, 2021

0.0.20

Jan 21, 2021

0.0.19

Nov 9, 2020

0.0.18

Oct 29, 2020

0.0.17

Oct 22, 2020

0.0.16

Oct 20, 2020

0.0.15

Oct 10, 2020

0.0.14

Jun 28, 2020

0.0.13

May 22, 2020

0.0.12

May 17, 2020

0.0.11

May 15, 2020

0.0.10

May 3, 2020

0.0.9

Apr 29, 2020

0.0.8rc9 pre-release

Apr 17, 2020

0.0.8rc8 pre-release

Apr 15, 2020

0.0.8rc7 pre-release

Apr 14, 2020

0.0.8rc6 pre-release

Apr 9, 2020

0.0.8rc5 pre-release

Apr 7, 2020

0.0.8rc4 pre-release

Mar 26, 2020

0.0.8rc3 pre-release

Mar 24, 2020

0.0.8rc2 pre-release

Mar 23, 2020

0.0.8rc1 pre-release

Jan 13, 2020

0.0.8rc0 pre-release

Jan 12, 2020

0.0.7

Jan 7, 2020

0.0.7rc2 pre-release

Jan 11, 2020

0.0.7rc1 pre-release

Jan 10, 2020

0.0.7rc0 pre-release

Jan 9, 2020

0.0.7b0 pre-release

Jan 9, 2020

0.0.7a0 pre-release

Jan 8, 2020

0.0.6

Jan 2, 2020

0.0.5

Dec 29, 2019

0.0.4

Dec 28, 2019

0.0.3

Dec 28, 2019

0.0.2

Dec 28, 2019

0.0.1

Dec 19, 2019

Download files

Download the file for your platform.

Source Distribution

mlpug-0.0.48.tar.gz (7.4 MB)

Uploaded Dec 10, 2021 Source

Built Distribution


mlpug-0.0.48-py3-none-any.whl (174.4 kB)

Uploaded Dec 10, 2021 Python 3

File details

Details for the file mlpug-0.0.48.tar.gz.

File metadata

Download URL: mlpug-0.0.48.tar.gz
Upload date: Dec 10, 2021
Size: 7.4 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.24.0 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.7.12

File hashes

Hashes for mlpug-0.0.48.tar.gz
Algorithm	Hash digest
SHA256	`3b64ec5918ba4fb4d26dc2e135bbe495cfd16d4d3c09bda5493997f999d03334`
MD5	`389c8f0722d323287a42b47205084f95`
BLAKE2b-256	`6489d91727c333ad73c03b0e49d8a6bed5fcb74a61b79c4cd7f8500c60933fab`


File details

Details for the file mlpug-0.0.48-py3-none-any.whl.

File metadata

Download URL: mlpug-0.0.48-py3-none-any.whl
Upload date: Dec 10, 2021
Size: 174.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.24.0 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.7.12

File hashes

Hashes for mlpug-0.0.48-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5e6d249a748b5a3134c193986bc55ad09b4d17f7dcf99609292fe5f13f4ec337`
MD5	`f17c98f3bb2fa8702ed9c55aa590c4cf`
BLAKE2b-256	`11c8bc07692ac00b753f14b615f3a2704f04f1727c1bc72b0c28cb3758a0b2ad`


mlpug 0.0.48
