
A novel and simple framework based on prevalent DL frameworks and other image processing libraries. v0.6.24: add macros SG, DP and DT to the power module, standing for single-GPU training, data parallelism and distributed training; add a viz_flow function to the utilities.


Nebulae Brochure

A deep learning framework based on PyTorch and common image processing libraries. It aims to offer a set of useful tools and functions.


Installation

pip install nebulae

For better development, building from the Dockerfile is also available. Adjusting the library versions and having nvidia-docker on your machine is recommended.

sudo docker build -t nebulae:std -f Dockerfile.std .
sudo docker run -it --gpus all --ipc=host --ulimit memlock=-1 nebulae:std

The latest version supports PyTorch 1.6 and above.


Modules Overview

Fuel: easily manage and read the datasets you need, anytime

Aerolog: log and visualize the training procedure

Cockpit: manage devices, checkpoints and distributed training

Kit: includes many utilities for better support of nebulae

Astro: build neural networks on top of the backend framework


Fuel

FuelGenerator(file_dir, file_list, dtype, is_seq)

Build a FuelGenerator to store data space-efficiently.

  • config: [dict] A dictionary containing all parameters. This argument is mutually exclusive with the other arguments.

  • file_dir: [str] Where your raw data is.

  • file_list: [str] A csv file in which all the raw data file names and labels are listed.

  • dtype: [list of str] A list of the data types of all columns but the first one in file_list. Valid data types are 'uint8', 'uint16', 'uint32', 'int8', 'int16', 'int32', 'int64', 'float16', 'float32', 'float64', 'str'. Additionally, if you prefix a 'v', e.g. 'vuint8', the data in that column may be saved with variable length per row.

  • is_seq: [bool] Whether each datum is a sequence, e.g. video frames. Defaults to False.

An example of file_list.csv is as follows. 'image' and 'label' are the key names of the data and labels respectively. Note that the image name is a path relative to file_dir.

image label
img_1.jpg 2
img_2.jpg 0
... ...
img_100.jpg 5

If is_seq is True, the csv file is supposed to look like the example below (where the separator is ',' and the quoting character is '"'):

image label
"vid_1_frame_1.jpg,vid_1_frame_2.jpg,...,vid_1_frame_24.jpg" 2
"vid_2_frame_1.jpg,vid_2_frame_2.jpg,...,vid_2_frame_15.jpg" 0
... ...
"vid_100_frame_1.jpg,vid_100_frame_2.jpg,...,vid_100_frame_39.jpg" 5

FuelGenerator.generate(dst_path, height, width, channel=3, encode='JPEG', shards=1, keep_exif=True)

  • dst_path: [str] An hdf5/npz file where you want to save the data.
  • height: [int] range within (0, +∞). The height of image data.
  • width: [int] range within (0, +∞). The width of image data.
  • channel: [int] The number of channels of image data. Defaults to 3.
  • encode: [str] The codec with which image data is encoded. Valid encoders are 'JPEG' and 'PNG'; 'PNG' is lossless. Defaults to 'JPEG'.
  • shards: [int] How many files you need to split the data into. Defaults to 1.
  • keep_exif: [bool] Whether to keep the EXIF information of photos. Defaults to True.

import nebulae as neb
# create a data generator
fg = neb.fuel.Generator(file_dir='/home/file_dir',
                        file_list='file_list.csv',
                        dtype=['vuint8', 'int8'])
# generate compressed data file
fg.generate(dst_path='/home/data/fuel.hdf5', 
            channel=3,
            height=224,
            width=224)

FuelGenerator.modify(config=None)

You can edit properties again to generate other files.

fg.modify(height=200, width=200)

Passing a dictionary of changed parameters is equivalent.

config = {'height': 200, 'width': 200}
fg.modify(config=config)

Tank(data_path, data_specf, batch_size, shuffle, in_same_size, fetch_fn, prep_fn, collate_fn)

Build a Fuel Tank that allows you to deposit datasets.

  • data_path: [str] The full path of your data file. It must be an hdf5/npz file.
  • data_specf: [dict] A dictionary containing key-dtype pairs.
  • batch_size: [int] The size of a mini-batch.
  • shuffle: [bool] Whether to shuffle data samples every epoch. Defaults to True.
  • in_same_size: [bool] Whether to ensure the last batch has as many samples as the other batches. Defaults to True.
  • fetch_fn: [func] The function that fetches a single datum from the dataset.
  • prep_fn: [func] The function that preprocesses a fetched datum. Defaults to None.
  • collate_fn: [func] The function that collates data into a mini-batch. Defaults to None.

E.g.

import numpy as np
from nebulae.fuel import depot
# define data-reading functions
def fetcher(data, idx):
  ret = {}
  ret['image'] = data['image'][idx]
  ret['label'] = data['label'][idx].astype('int64')
  return ret

def prep(data):
  # convert to channel-first format
  data['image'] = np.transpose(data['image'], (2, 0, 1)).astype('float32')
  return data

# create a data depot
tk = depot.Tank("/home/dataset.hdf5",
                {'image': 'vuint8', 'label': 'int64'},
                batch_size=128, shuffle=True, 
                fetch_fn=fetcher, prep_fn=prep)

Tank.next()

Return a batch of data, labels and other information.

Tank.MPE

Attribute: the number of iterations within an epoch, i.e. miles per epoch.

len(Tank)

The number of data samples in this dataset.
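
Putting these together, one epoch typically looks like the sketch below, using the Tank tk created above (that the batch keeps the key names defined in fetcher is an assumption):

# iterate over one epoch of the Tank created above
nsample = len(tk)            # total number of data samples
for mile in range(tk.MPE):   # MPE iterations make up one epoch
    batch = tk.next()
    images, labels = batch['image'], batch['label']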

Comburant()

Comburant is a container to pack up all preprocessing methods.

Data Source   Augmentation   Usage
Image         flip           flip matrix vertically or horizontally
              crop           crop matrix randomly with a given area and aspect ratio
              rotate         rotate matrix randomly within a given range
              brighten       adjust brightness given an increment/decrement factor
              contrast       adjust contrast given an expansion/shrinkage factor
              saturate       adjust saturation given an expansion/shrinkage factor
Sequence      sampling       given a positive int theta, sample an image every theta frames
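
The intended pattern is to pack several augmentations into one callable and apply it inside prep_fn. A rough sketch follows; the class names Flip and Rotate are hypothetical placeholders, not confirmed nebulae API:

# hypothetical sketch of packing augmentations into a Comburant
cbr = Comburant(Flip(),           # flip vertically or horizontally
                Rotate(-15, 15))  # random rotation within a given range

def prep(data):
    data['image'] = cbr(data['image'])  # apply the packed augmentations in order
    return data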

Aerolog

DashBoard(log_path='./aerolog', window=1, divisor=10, span=30, format=None)

  • log_path: [str] The place where logs will be stored.
  • window: [int] Window size for moving average.
  • divisor: [int] How many segments the Y axis is to be divided into.
  • span: [int] The length of X axis.
  • format: [dict of tuple] Specify the format in which results will be displayed every step. Four available format types are 'raw', 'percent', 'tailor' and 'inviz'.

DashBoard is a terminal-friendly tool for monitoring your training procedure. It prints a dynamically refreshed progress bar with some important metrics. Besides, it draws real-time curves to visualize how the training and test phases are going.

DashBoard.gauge(entry, mile, epoch, mpe, stage, interval=1, duration=None)

  • entry: [dict] K-V pairs to be displayed.
  • mile: [int] The current step.
  • epoch: [int] The current epoch.
  • mpe: [int] The number of steps within an epoch, i.e. miles per epoch.
  • stage: [str] The name of the current stage.
  • interval: [int] Display results every interval steps.
  • duration: [float] The elapsed time since the last step or epoch.

Call this function after a step or epoch to monitor current states.

DashBoard.log(gauge, tachograph, record)

  • gauge: [bool] Whether to draw metrics as a picture.
  • tachograph: [bool] Whether to write down metrics as a csv file.
  • record: [str] The file to which the metrics logged this time are appended.

Write the intermediate results into files.
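
A sketch of the typical pattern, assuming DashBoard is importable from nebulae.aerolog and tk is the Tank defined earlier (the exact contents of the format tuples are an assumption):

from nebulae.aerolog import DashBoard
db = DashBoard(log_path='./aerolog', window=10,
               format={'loss': ('raw',), 'acc': ('percent',)})  # tuple contents are an assumption
for epoch in range(10):
    for mile in range(tk.MPE):
        loss, acc = 0.3, 0.9  # placeholders for one training step's metrics
        db.gauge(entry={'loss': loss, 'acc': acc},
                 mile=mile, epoch=epoch, mpe=tk.MPE,
                 stage='train', interval=10)
# write curves and a csv of metrics to disk
db.log(gauge=True, tachograph=True, record='record.txt')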


Cockpit

Engine(device, ngpus=1, least_mem=2048, available_gpus=())

  • device: [str] The type of device on which the model runs; either 'CPU' or 'GPU'.
  • ngpus: [int] The number of GPUs to be taken. This argument doesn't work when device is CPU.
  • least_mem: [int] Only the GPUs with at least this amount of memory left are regarded as available.
  • available_gpus: [tuple of int] Set the available GPUs explicitly. This argument doesn't work when device is CPU. In default case, it is set to empty tuple and Nebulae will detect available GPUs automatically.

An Engine takes care of the devices to be used, especially the GPU environment. Usually you need to gear it into your Craft.
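
For instance, the sketch below takes two GPUs that each have at least 4 GiB of free memory (the import path follows the Multiverse example further down; passing 'GPU' as a plain string follows the description above):

from nebulae.cockpit import Engine
# pick 2 GPUs with at least 4 GiB of free memory each
ng = Engine(device='GPU', ngpus=2, least_mem=4096)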

TimeMachine(ckpt_path, save_path, max_anchors=-1)

  • ckpt_path: [str] The place where checkpoints will be read from.
  • save_path: [str] The place where checkpoints will be stored.
  • max_anchors: [int] The max number of checkpoints to be kept.

Manage checkpoint saving and reading by creating a TimeMachine.

TimeMachine.to(craft, file='', ckpt_scope=None, frozen=False)

  • craft: [Craft] NN model.
  • file: [str] The file name of checkpoint.
  • ckpt_scope: [str] Only the parameters inside the scope can be loaded.
  • frozen: [bool] For a frozen model, omitting unmatched parts is not allowed when loading.

TimeMachine.drop(craft, file='', save_scope=None)

  • craft: [Craft] NN model.
  • file: [str] The file name of checkpoint.
  • save_scope: [str] Only the parameters inside the scope can be saved.
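
For example, restoring a model before training and dropping a new anchor afterwards (net is assumed to be a Craft instance, and the checkpoint file names are illustrative):

from nebulae.cockpit import TimeMachine
tm = TimeMachine(ckpt_path='./ckpt', save_path='./ckpt', max_anchors=5)
tm.to(net, file='net_best')    # load a checkpoint into the model
# ... train for a while ...
tm.drop(net, file='net_last')  # save a new checkpoint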

Kit

GPUtil()

Create a tool for monitoring GPU status. In fact, it is leveraged implicitly when instantiating an Engine.

GPUtil.available(ngpus, least_mem)

  • ngpus: [int] The number of GPUs to be selected.
  • least_mem: [int] The least free memory (MiB) a valid GPU should have.

Return a list of available GPU information including serial number, name and memory. If there are not enough available GPUs, it contains as many devices as the system can offer.

GPUtil.monitor(sec)

  • sec: [int] The logging interval in seconds. Defaults to 5.

Start monitoring GPUs. Note that setting sec to a small number might cause abnormal statistics; increase it if there are many GPUs on your machine.

GPUtil.status()

Stop monitoring and return a summary of GPU status.
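
E.g., a sketch of selecting GPUs and profiling them during a run (the import path nebulae.kit is an assumption):

from nebulae.kit import GPUtil
gu = GPUtil()
gpus = gu.available(ngpus=2, least_mem=2048)  # up to 2 GPUs with >= 2 GiB free
gu.monitor(sec=5)       # log GPU status every 5 seconds
# ... run the workload ...
summary = gu.status()   # stop monitoring and fetch the summary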


Astro

Craft(scope)

  • scope: [str] The name of this craft. By default it is the base name of the saved model file.

Craft is a base class for neural networks that is compatible with the backend framework. It is especially convenient when you want to port open-source code into nebulae, or when you find it difficult to implement a desired function.

import nebulae
from nebulae.astro import dock
import torch
# designate pytorch as the core backend
nebulae.Law.CORE = 'pytorch'
# create a network using nebulae
class NetNeb(dock.Craft):
    def __init__(self, nclass, scope='NEB'):
        super(NetNeb, self).__init__(scope)
        self.flat = dock.Reshape()
        self.fc = dock.Dense(512, nclass)

    def run(self, x):
        x = self.flat(x, (-1, 512))
        y = self.fc(x)
        return y
# you can create the same network using pytorch functions
class NetTorch(dock.Craft):
    def __init__(self, nclass, scope='TORCH'):
        super(NetTorch, self).__init__(scope)
        self.fc = torch.nn.Linear(512, nclass)

    def run(self, x):
        x = x.view(-1, 512)
        y = self.fc(x)
        return y

Rudder()

Rudder is a context in which all gradients will be computed and backpropagated through variables accordingly.

Prober()

Prober is a context in which all gradients will be computed but not backpropagated.

Nozzle()

Nozzle is a context in which all variables are fixed and no gradient is going to be computed.
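
A sketch of how the three contexts are intended to be used, assuming they act as with-statement context managers:

with dock.Rudder():   # training: gradients are computed and backpropagated
    loss = train(x, z)
with dock.Prober():   # probing: gradients are computed but not backpropagated
    loss = dev(x, z)
with dock.Nozzle():   # inference: no gradients are computed
    y = net(x)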

coat(datum, as_const)

  • datum: [int/float/array/tensor] Input datum.
  • as_const: [bool] Whether to treat datum as an untrainable tensor. Defaults to True.

Any legal input is first converted to a tensor; the tensor is then moved onto the device currently in use.

shell(datum, as_np)

  • datum: [tensor] Input datum.
  • as_np: [bool] Whether to convert datum to a regular NumPy array. Defaults to True.

The input tensor will be taken off the device currently in use.
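
Together, coat and shell move data on and off the device, e.g. (assuming both live under nebulae.astro.dock and net is some Craft):

import numpy as np
from nebulae.astro import dock
arr = np.random.rand(1, 3, 224, 224).astype('float32')
x = dock.coat(arr)    # convert to a tensor and move onto the current device
y = net(x)            # forward through a Craft
out = dock.shell(y)   # take the result off the device as a NumPy array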

autoPad(in_size, kernel, stride, dilation)

  • in_size: [tuple] input size e.g. (h, w) for 2d tensor
  • kernel: [tuple] kernel size
  • stride: [tuple/int] convolution stride
  • dilation: [tuple/int] stride in atrous convolution

Return the padding tuple that leads to an output of the same size as the input when stride is 1, e.g. (left, right, top, bottom, front, back) for a 3d tensor.
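
For instance, a 3x3 convolution over a 224x224 feature map (assuming autoPad is exposed under dock):

# padding that keeps a 224x224 input the same size under a 3x3 kernel
pad = dock.autoPad((224, 224), (3, 3), stride=1, dilation=1)
# for a 2d tensor the tuple is (left, right, top, bottom), here (1, 1, 1, 1)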

Multiverse(universe, nworld=1)

  • universe: [Craft] NN model.
  • nworld: [int] The number of devices on which model would be distributed.

Multiverse is a parallelism manager that makes it easier to deploy distributed training. The following example shows the difference between regular training and distributed training.

import nebulae as neb
from nebulae.fuel import farm
from nebulae.astro import dock
from nebulae.cockpit import Engine, TimeMachine, GPU

#          ############
def launch(mv=None): ## <====================================[1]
    #      ############
    
    ############
    mv.init() ## <===========================================[2]
    ############
    
    # ----------------------- Aerolog ------------------------ #
    # defined Dashboard

    # ----------------------- Cockpit ------------------------ #
    #                       ##################
    ng = Engine(device=GPU, ngpus=mv.nworld) ## <=============[3]
    #                       ##################
    tm = TimeMachine(save_path="./ckpt",
                     ckpt_path="./ckpt")

    # ------------------------ Fuel -------------------------- #
    # define dataset Tanks

    # ---------------------- Space Dock ---------------------- #
    class Net(dock.Craft):
        def __init__(self, nclass, scope):
            super(Net, self).__init__(scope)
            # define architecture

        def run(self, x):
            # define forward procedure
            pass

    class Train(dock.Craft):
        def __init__(self, net, scope='TRAIN'):
            super(Train, self).__init__(scope)
            self.net = net
            # initialize params

        def run(self, x, z):
            # define training step
            loss = self.net(x, z)
            #########################
            loss = mv.reduce(loss) ## <==================[4]
            #########################

    class Dev(dock.Craft):
        def __init__(self, net, scope='DEVELOP'):
            super(Dev, self).__init__(scope)
            self.net = net
            # initialize params

        def run(self, x, z, idx):
            # define testing step
            loss = self.net(x, z)
            #########################
            loss = mv.reduce(loss) ## <==================[4]
            #########################

    # ----------------------- Launcher ----------------------- #
    net = Net(10, 'cnn')
    net.gear(ng)
    #####################
    net = mv.sync(net) ## <====[5]
    #####################
    train = Train(net)
    dev = Dev(net)

    for epoch in range(10):
        for mile in range(mpe):
            # train and test
            pass

if __name__ == '__main__':
    # ------------------- Global Setting --------------------- #
    if is_distributed:
        ########################################
        mv = neb.cockpit.Multiverse(launch, 4) ##
        mv()                                   ## <===========[6]
        ########################################
    else:
        launch()

Six changes are marked above.

[1] leave an argument for receiving the Multiverse instance.

[2] initialize Multiverse instance.

[3] pass the number of worlds to Engine.

[4] post-process the returned values.

[5] synchronize models in different worlds.

[6] instantiate a Multiverse and call it.

Multiverse.init()

Initialize the multiverse manager.

Multiverse.scope()

Create a parallel scope.

Multiverse.sync(models, data)

  • models: [Craft] NN model.
  • data: [tuple of Tank] All datasets in use.

Synchronize the distributed models and data.

Multiverse.reduce(tensor, aggregate=False)

  • tensor: [tensor] Returned tensor to be reduced.
  • aggregate: [bool] Whether to sum all tensors or average them. Defaults to False, which means averaging.

Collect results returned by all distributed devices.

Multiverse.Executor

A decorator pointing out the main function, which is usually in charge of training and testing.
