
A novel and simple framework based on prevalent deep learning frameworks and other image processing libraries. v0.4.19: mutes redundant INFO logging in multi-process mode.

Project description

Nebulae Brochure

A novel and simple framework built on mainstream deep learning frameworks and other image processing libraries. Almost every module can be deployed independently.


Modules Overview

Fuel: easily manage and read the datasets you need, anytime

Toolkit: includes many utilities for better support of nebulae

Astrobase: build network components and assemble them into models


Fuel

FuelGenerator()

Build a FuelGenerator to store data space-efficiently.

  • config: [dict] A dictionary containing all parameters.

  • file_dir: [str] Where your raw data is.

  • file_list: [str] A csv file in which all raw data file names and labels are listed.

  • dtype: [list of str] A list of data types of all columns but the first one in file_list. Valid data types are 'uint8', 'uint16', 'uint32', 'int8', 'int16', 'int32', 'int64', 'float16', 'float32', 'float64', 'str'. Plus, if you prepend a 'v' as the initial character, e.g. 'vuint8', each row in this column is allowed to be saved with variable length.

  • is_seq: [bool] Whether the data is sequential, e.g. video frames. Defaults to False.

An example of file_list.csv is as follows. 'image' and 'label' are the key names of data and labels respectively. Note that the image name is a path relative to file_dir.

image,label
img_1.jpg,2
img_2.jpg,0
...
img_100.jpg,5
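A file_list.csv of this shape can be produced with Python's standard csv module; the file names and labels below are illustrative:

```python
import csv

# Hypothetical sample entries: (image path relative to file_dir, label)
rows = [("img_1.jpg", 2), ("img_2.jpg", 0), ("img_100.jpg", 5)]

with open("file_list.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["image", "label"])  # key names for data and labels
    writer.writerows(rows)
```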

FuelGenerator.generate(dst_path, height, width, channel=3, encode='JPEG', shards=1, keep_exif=True)

  • dst_path: [str] An hdf5/npz file where you want to save the data.
  • height: [int] range between (0, +∞). The height of image data.
  • width: [int] range between (0, +∞). The width of image data.
  • channel: [int] The number of channels of image data. Defaults to 3.
  • encode: [str] The means by which image data is encoded. Valid encoders are 'JPEG' and 'PNG'; 'PNG' is lossless. Defaults to 'JPEG'.
  • shards: [int] How many files you need to split the data into. Defaults to 1.
  • keep_exif: [bool] Whether to keep EXIF information of photos. Defaults to true.
import nebulae
# create a data generator
fg = nebulae.fuel.FuelGenerator(file_dir='/home/file_dir',
                                file_list='file_list.csv',
                                dtype=['vuint8', 'int8'])
# generate compressed data file
fg.generate(dst_path='/home/data/fuel.hdf5',
            channel=3,
            height=224,
            width=224)

FuelGenerator.modify(config=None)

You can edit properties again to generate other files.

fg.modify(height=200, width=200)

Passing a dictionary of changed parameters is equivalent.

config = {'height': 200, 'width': 200}
fg.modify(config=config)

FuelDepot()

Build a FuelDepot on which you can mount datasets.

import nebulae
# create a data depot
fd = nebulae.fuel.FuelDepot()

FuelDepot.load(config, name, batch_size, data_path, data_key, height=0, width=0, channel=3, frame=0, is_encoded=True, if_shuffle=True, rescale=True, resol_ratio=1, complete_last_batch=True, spatial_aug='', p_sa=(0,), theta_sa=(0,), temporal_aug='', p_ta=(0,), theta_ta=(0,))

Mount dataset on your FuelDepot.

  • name: [str] Name of your dataset.
  • batch_size: [int] The size of a mini-batch.
  • data_path: [str] The full path of your data file. It must be a hdf5/npz file.
  • data_key: [str] The key name of data.
  • if_shuffle: [bool] Whether to shuffle data samples every epoch. Defaults to True.
  • is_encoded: [bool] If the stored data has been compressed. Defaults to True.
  • channel: [int] The number of channels of image data. Defaults to 3.
  • height: [int] range between (0, +∞). Height of image data. Defaults to 0.
  • width: [int] range between (0, +∞). Width of image data. Defaults to 0.
  • frame: [int] range between [-1, +∞). The unified number of frames for sequential data. Defaults to 0.
  • rescale: [bool] Whether to rescale values of fetched data to [-1, 1]. Defaults to True.
  • resol_ratio: [float] range between (0, 1] The coefficient of subsampling for lowering image data resolution. Set it as 0.5 to carry out 1/2 subsampling. Defaults to 1.
  • complete_last_batch: [bool] Whether to complete the last batch so that it has samples as many as other batches. Defaults to True.
  • spatial_aug: [comma-separated str] Put the spatial data augmentations you want in a string with comma as separator. Valid augmentations include 'flip', 'crop', 'brightness', 'gamma_contrast' and 'log_contrast', e.g. 'flip,brightness'. Defaults to '', which means no augmentation.
  • p_sa: [tuple of float] range between [0, 1]. The probabilities of taking spatial data augmentations according to the order in spatial_aug. Defaults to (0,).
  • theta_sa: [tuple] The parameters of spatial data augmentations according to the order in spatial_aug. Defaults to (0,).
  • temporal_aug: [comma-separated str] Put the temporal data augmentations you want in a string with comma as separator. Valid augmentations include 'sample'. Make sure to set is_seq as True if you want to enable temporal augmentation. Defaults to '', which means no augmentation.
  • p_ta: [tuple of float] range between [0, 1]. The probabilities of taking temporal data augmentations according to the order in temporal_aug. Defaults to (0,).
  • theta_ta: [tuple] The parameters of temporal data augmentations according to the order in temporal_aug. Defaults to (0,).
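As a rough illustration of what resol_ratio does (plain NumPy, not nebulae's implementation): a ratio of 0.5 keeps every second pixel along each spatial axis, halving the resolution.

```python
import numpy as np

def subsample(img, resol_ratio):
    """Naive nearest-neighbour subsampling: keep every (1/ratio)-th pixel."""
    step = int(round(1 / resol_ratio))
    return img[::step, ::step]

img = np.zeros((224, 224, 3), dtype=np.uint8)
low = subsample(img, 0.5)  # -> shape (112, 112, 3)
```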

All data augmentation approaches are listed as follows:

Data Source: Image

  • flip: empty tuple ()
  • crop: nested tuple of float: ((minimum area ratio, maximum area ratio), (minimum aspect ratio, maximum aspect ratio)) of the cropped area, where aspect ratio is width/height
  • brightness: float, range between (0, 1]: increment/decrement factor on brightness
  • gamma_contrast: float, range between (0, 1]: expansion/shrinkage factor on pixel value domain
  • log_contrast: float, range between (0, 1]: expansion/shrinkage factor on pixel value domain

Data Source: Sequence

  • sampling: positive int, denoted as theta: sample an image every theta frames

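The nested ranges for crop can be read as follows (an illustrative reimplementation, not nebulae's actual code): sample an area ratio and an aspect ratio from their ranges, then derive the crop size from them.

```python
import math
import random

def crop_size(h, w, area_range, aspect_range):
    """Sample a crop (ch, cw) whose relative area and width/height
    ratio fall in the given ranges (illustrative only)."""
    area = random.uniform(*area_range) * h * w
    aspect = random.uniform(*aspect_range)  # aspect = width / height
    cw = int(round(math.sqrt(area * aspect)))
    ch = int(round(math.sqrt(area / aspect)))
    return min(ch, h), min(cw, w)

ch, cw = crop_size(224, 224, (0.5, 1.0), (0.75, 1.33))
```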
fd.load(name='test-img',
        batch_size=4,
        data_key='image',
        data_path='/home/image.hdf5',
        width=200, height=200,
        resol_ratio=0.5,
        spatial_aug='brightness,gamma_contrast',
        p_sa=(0.5, 0.5), theta_sa=(0.2, 1.2))
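rescale=True maps fetched pixel values into [-1, 1]; for uint8 images the mapping presumably amounts to the following (a NumPy sketch, not nebulae's code):

```python
import numpy as np

def rescale_to_unit(img):
    # Map uint8 pixels [0, 255] linearly onto [-1.0, 1.0].
    return img.astype(np.float32) / 127.5 - 1.0

batch = np.array([[0, 128, 255]], dtype=np.uint8)
scaled = rescale_to_unit(batch)  # 0 -> -1.0, 255 -> 1.0
```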

FuelDepot.modify(tank, config=None)

  • tank: [str] Specify the dataset to modify.

You can edit properties to change how batches are fetched and processed.

fd.modify(tank='test-img', name='test', batch_size=2)

Passing a dictionary of changed parameters is equivalent.

config = {'name':'test', 'batch_size':2}
fd.modify(tank='test-img', config=config)

FuelDepot.unload(tank='')

  • tank: [str] Specify the dataset to unmount. Defaults to '', in which case all datasets are unmounted.

Unmount datasets that are no longer needed.

FuelDepot.next(tank)

  • tank: [str] Specify the dataset from which data is fetched.

Return a dictionary containing a batch of data, labels and other information.

FuelDepot.epoch

Attribute: a dictionary containing current epoch of each dataset. Epoch starts from 1.

FuelDepot.MPE

Attribute: a dictionary containing how many iterations there are within an epoch for each dataset.

FuelDepot.volume

Attribute: a dictionary containing the number of data samples in each dataset.
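The three attributes are linked: with complete_last_batch=True, one epoch takes ceil(volume / batch_size) iterations, which is plausibly what MPE reports per dataset. A plain-Python sketch of the bookkeeping (illustrative, not nebulae's internals):

```python
import math

def iterations_per_epoch(volume, batch_size):
    """What FuelDepot.MPE would plausibly report for one dataset:
    the number of mini-batches needed to cover all samples once,
    with the last batch completed by padding."""
    return math.ceil(volume / batch_size)

mpe = iterations_per_epoch(volume=1000, batch_size=4)  # -> 250
```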


Astrobase

Component()

Build a component house in which users can make use of a variety of components and create new ones, either by packing some of them together or from scratch.

OffTheShelf()

Set up a framework within which users can build modules using the core backend. It is especially convenient when you want to port open-source code into nebulae, or when a desired function is difficult to implement otherwise.

import nebulae
import torch
# designate pytorch as core backend
nebulae.Law.CORE = 'pytorch'
# set up a framework
OTS = nebulae.astrobase.OffTheShelf()
# create your own component
class DecisionLayer(OTS):
    def __init__(self, feat_dim, nclass, **kwargs):
        super(DecisionLayer, self).__init__(**kwargs)
        self.feat_dim = feat_dim
        self.linear = torch.nn.Linear(feat_dim, nclass)

    def run(self, x):
        x = x.reshape(-1, self.feat_dim)
        y = self.linear(x)
        return y

COMP = nebulae.astrobase.Component()
# add DecisionLayer to component house
COMP.new('dsl', DecisionLayer, 'x', out_shape=(-1, 128))

N.B. Make sure that '_' is not the first or last character of your argument names.
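The underscore rule above can be checked mechanically; a hypothetical validator (not part of nebulae):

```python
def valid_arg_name(name):
    # Reject names whose first or last character is '_'.
    return bool(name) and not name.startswith("_") and not name.endswith("_")

valid_arg_name("feat_dim")  # True
valid_arg_name("_x")        # False
```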

SpaceDock()

