
A deep learning package for using HDF5 and PyTorch (Distributed Data Parallel with NVIDIA mixed precision) with ease.

Project description

dnns - a general-purpose package for training deep neural networks with custom HDF5 datasets.

After many iterations of deep learning scripts, I have finally consolidated everything into one package, complete with data loading tools. For more documentation, go to https://cleanit.github.io/dnns.

Some information about the package:

  1. This package is built on PyTorch.
  2. It supports mixed-precision training via NVIDIA's apex package.
  3. An executable titled 'worker.py' is installed with this package. If you are not interested in writing your own training script (or you have done this already and don't want to do it again), you can use it as-is (see the steps below).
  4. The script can be used for multi-node, multi-GPU training (see below for more information).

Quickstart:

You need two things (a third is recommended).

  1. A network (using the ordinary PyTorch class structure) written to a file (default is dnn.py), with the network class named DNN.

  2. A train and test directory with your HDF5 data in them. The default dataset labels that the loader reads are 'X' and 'Y', which represent input and output data (a sketch of how to create such a file follows this list).

  3. (Recommended) A YAML file (default name is input.yaml) where you can configure the training protocol.
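
For illustration, here is a minimal sketch of how a compatible HDF5 file could be written with h5py, assuming the default 'X'/'Y' labels (the shapes below are hypothetical placeholders for your own data):

import h5py
import numpy as np

# Hypothetical data: 100 samples of 9x9x9 single-channel volumes
# with one scalar target each; substitute your real arrays.
X = np.random.rand(100, 9, 9, 9, 1).astype(np.float32)
Y = np.random.rand(100, 1).astype(np.float32)

with h5py.File('train/training.h5', 'w') as f:
    f.create_dataset('X', data=X)  # default input label
    f.create_dataset('Y', data=Y)  # default output label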

To run, simply execute:

python -m torch.distributed.launch --nnodes <n_nodes> --nproc_per_node <n_gpus_per_node> worker.py
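
For a multi-node run, each node launches the same command with its own rank. For example (hypothetical master address), on a two-node cluster with four GPUs per node:

python -m torch.distributed.launch --nnodes 2 --node_rank 0 --master_addr 10.0.0.1 --master_port 29500 --nproc_per_node 4 worker.py   # on node 0
python -m torch.distributed.launch --nnodes 2 --node_rank 1 --master_addr 10.0.0.1 --master_port 29500 --nproc_per_node 4 worker.py   # on node 1

Note that on recent PyTorch releases torch.distributed.launch is deprecated in favor of torchrun, which accepts the same flags.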

Slowstart:

Let's say you are in some directory called some_dir. You type ls and see:

~some_dir $ ls
input.yaml  dnn.py  train/  test/

Let's say you have HDF5 files called training.h5 and testing.h5 located in train/ and test/, respectively, with dataset labels input_data and output_data. Our configuration file input.yaml could look something like this:

~some_dir $ more input.yaml
# the number of epochs to run
n_epochs: 2000

# batch size of images
batch_size: 512

# learning rate for model
learning_rate: 0.00001

# number of threads for each GPU to use for data queuing
n_workers: 6 

# labels in the HDF5 files
x_label: 'input_data'
y_label: 'output_data'

# whether to use NVIDIA apex mixed-precision training
mixed_precision: false

Here we have defined the number of epochs, the batch size, the learning rate, the number of worker threads per GPU, and the dataset labels, and we have turned mixed precision off.
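
Since this is plain YAML, it is easy to generate or inspect programmatically. As a purely illustrative sketch (the worker parses the file internally, so you never need to do this yourself), the config can be read with PyYAML:

import yaml

# Load the training configuration; keys mirror input.yaml above.
with open('input.yaml') as f:
    config = yaml.safe_load(f)

print(config['n_epochs'], config['batch_size'], config['learning_rate'])

Now let's look at dnn.py.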

~some_dir $ more dnn.py
import torch
import torch.nn as nn
from collections import OrderedDict

class DNN(nn.Module):

    def __init__(self, input_shape):
        super().__init__()
        layers = OrderedDict()
        layers['conv_red_1'] = nn.Conv3d(1, 64, 5, padding=2, stride=2)
        layers['conv_red_1_elu'] = nn.ELU()
        layers['conv_red_2'] = nn.Conv3d(64, 64, 5, padding=2, stride=1)
        layers['conv_red_2_elu'] = nn.ELU()

        layers['conv_nonred_3'] = nn.Conv3d(64, 16, 5, padding=2)
        layers['conv_nonred_3_elu'] = nn.ELU()
        for i in range(4, 9):
            layers['conv_nonred_' + str(i)] = nn.Conv3d(16, 16, 5, padding=2)
            layers['conv_nonred_' + str(i) + '_elu'] = nn.ELU()

        layers['conv_red_3'] = nn.Conv3d(16, 64, 5, padding=2, stride=1)
        layers['conv_red_3_elu'] = nn.ELU()

        layers['conv_nonred_9'] = nn.Conv3d(64, 32, 5, padding=2, stride=1)
        layers['conv_nonred_9_elu'] = nn.ELU()
        for i in range(10, 14):
            layers['conv_nonred_' + str(i)] = nn.Conv3d(32, 32, 5, padding=2)
            layers['conv_nonred_' + str(i) + '_elu'] = nn.ELU()

        layers['flatten'] = nn.Flatten()
        layers['fc1'] = nn.Linear((input_shape[0] // 2 + 1) * (input_shape[1] // 2 + 1) * (input_shape[2] // 2 + 1) * input_shape[3] * 32, 1024)
        layers['fc1_elu'] = nn.ELU()
        layers['fc2'] = nn.Linear(1024, 1)
        self.model = nn.Sequential(layers)


    def forward(self, x):
        # Move the channel axis from last to second position; permute is the
        # correct op here (reshape only coincidentally works for one channel).
        x = x.permute(0, 4, 1, 2, 3)
        return self.model(x)
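
A quick smoke test (hypothetical shapes; odd spatial sizes match the fc1 sizing above) verifies that the network runs end to end:

import torch
from dnn import DNN

# Hypothetical input: a batch of 2 volumes, 9x9x9 voxels, 1 channel.
input_shape = (9, 9, 9, 1)
model = DNN(input_shape)

x = torch.randn(2, *input_shape)
y = model(x)
print(y.shape)  # torch.Size([2, 1])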

With these defined, you can simply run:

python -m torch.distributed.launch --nnodes <n_nodes> --nproc_per_node <n_gpus_per_node> worker.py

Afterwards, a checkpoint file (checkpoint.torch) and a data file (loss_vs_epoch.dat) are created.
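
To inspect training progress, you could plot the loss file. This sketch assumes loss_vs_epoch.dat is whitespace-separated text with an epoch column followed by a loss column; check your own output's actual format:

import numpy as np
import matplotlib.pyplot as plt

# Assumption: column 0 is the epoch, column 1 is the loss.
data = np.loadtxt('loss_vs_epoch.dat')
plt.plot(data[:, 0], data[:, 1])
plt.xlabel('epoch')
plt.ylabel('loss')
plt.savefig('loss_vs_epoch.png')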

Limitations

  1. Currently, you can only have one HDF5 file each for training and testing (one workaround is sketched after this list).
  2. This repo has not been tested extensively; I expect there will be critical issues to take care of in future releases.
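
If your data is spread across several HDF5 files, one workaround is to concatenate them into a single file before training. Here is a minimal sketch with h5py (hypothetical file names, and assuming the datasets agree in shape beyond the first axis):

import h5py
import numpy as np

# Hypothetical source files sharing the same dataset labels.
sources = ['part1.h5', 'part2.h5']

with h5py.File('train/training.h5', 'w') as out:
    for label in ('X', 'Y'):
        arrays = []
        for path in sources:
            with h5py.File(path, 'r') as f:
                arrays.append(f[label][:])
        out.create_dataset(label, data=np.concatenate(arrays, axis=0))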

Contact

Please contact me at kevin.ryczko@uottawa.ca for any issues.

Download files

Download the file for your platform.

Source Distribution

dnns-1.4.7.tar.gz (16.1 kB)


Built Distribution


dnns-1.4.7-py3-none-any.whl (28.5 kB)


File details

Details for the file dnns-1.4.7.tar.gz.

File metadata

  • Download URL: dnns-1.4.7.tar.gz
  • Size: 16.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.2.0.post20200714 requests-toolbelt/0.9.1 tqdm/4.47.0 CPython/3.8.3

File hashes

Hashes for dnns-1.4.7.tar.gz:

  • SHA256: c3b0f56d4a6385e9bbdb0a9c917c819545a102c5bc8d36b4001bdebecc337816
  • MD5: e86b78cadf72873cdd3878fcf9e97df9
  • BLAKE2b-256: bec4b36de482e40f57c12beff084f08f85e33968abcd8c3f94ec7139212e8c1e


File details

Details for the file dnns-1.4.7-py3-none-any.whl.

File metadata

  • Download URL: dnns-1.4.7-py3-none-any.whl
  • Size: 28.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.2.0.post20200714 requests-toolbelt/0.9.1 tqdm/4.47.0 CPython/3.8.3

File hashes

Hashes for dnns-1.4.7-py3-none-any.whl:

  • SHA256: bdd0a728bc243cb95279430cca4ba7f607e8c8cf3ab70c996a666956ab8155f1
  • MD5: a4964f9a25b875c767d696f3770d2a37
  • BLAKE2b-256: 2a38bb31d83a853f571926387645b1517d7db493471dbebc0fb25b768bb343ba

