A novel and simple framework based on prevalent DL frameworks and other image processing libs. v0.4.18: add prep_fn argument in Tank.
Nebulae Brochure
A novel and simple framework built on current mainstream deep learning frameworks and other image processing libraries. Almost every module can be deployed independently.
Modules Overview
Fuel: easily manage and read the datasets you need at any time
Toolkit: includes many utilities for better support of nebulae
Fuel
FuelGenerator()
Build a FuelGenerator to store data in a space-efficient way.
- config: [dict] A dictionary containing all parameters.
- file_dir: [str] The directory where your raw data is stored.
- file_list: [str] A csv file listing the file names and labels of all raw data.
- dtype: [list of str] A list of data types of all columns but the first one in file_list. Valid data types are 'uint8', 'uint16', 'uint32', 'int8', 'int16', 'int32', 'int64', 'float16', 'float32', 'float64', 'str'. In addition, if you prefix a type with 'v', e.g. 'vuint8', the data in each row of that column may be saved with variable length.
- is_seq: [bool] Whether the data is sequential, e.g. video frames. Defaults to False.
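The dtype strings described above can be checked before a generator is built. The following is a minimal sketch in plain Python; `parse_dtype` and `VALID_DTYPES` are hypothetical names introduced here for illustration and are not part of nebulae.

```python
# Hypothetical helper, not part of nebulae: validates dtype strings
# against the list of valid types given in this section.
VALID_DTYPES = {
    'uint8', 'uint16', 'uint32',
    'int8', 'int16', 'int32', 'int64',
    'float16', 'float32', 'float64', 'str',
}


def parse_dtype(spec):
    """Return (base_type, is_variable_length) for a dtype string."""
    if spec in VALID_DTYPES:
        return spec, False
    # a leading 'v' marks a variable-length column, e.g. 'vuint8'
    if spec.startswith('v') and spec[1:] in VALID_DTYPES:
        return spec[1:], True
    raise ValueError('unsupported dtype: %r' % spec)


parsed = [parse_dtype(s) for s in ['vuint8', 'int8']]
print(parsed)
```

Running such a check first surfaces a mistyped dtype before any data file is written.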
An example of file_list.csv is shown below. 'image' and 'label' are the key names of the data and the labels respectively. Note that each image name is a path relative to file_dir.
image | label |
---|---|
img_1.jpg | 2 |
img_2.jpg | 0 |
... | ... |
img_100.jpg | 5 |
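A file with this layout can be produced with Python's standard csv module. This is only a sketch: the comma delimiter and the temporary output directory are assumptions, and the rows simply repeat the illustrative values from the table above.

```python
import csv
import os
import tempfile

# Illustrative rows: the first column is the image path relative to file_dir,
# the second column is the label.
rows = [('img_1.jpg', 2), ('img_2.jpg', 0), ('img_100.jpg', 5)]

# Written to a temporary directory here; in practice use your own location.
csv_path = os.path.join(tempfile.mkdtemp(), 'file_list.csv')
with open(csv_path, 'w', newline='') as f:
    writer = csv.writer(f)  # assumption: comma-delimited file
    writer.writerow(['image', 'label'])  # header row holds the key names
    writer.writerows(rows)

# Read the file back to confirm its layout.
with open(csv_path, newline='') as f:
    content = list(csv.reader(f))
print(content)
```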
FuelGenerator.generate(dst_path, height, width, channel=3, encode='JPEG', shards=1, keep_exif=True)
- dst_path: [str] A hdf5/npz file where you want to save the data.
- height: [int] range between (0, +∞). The height of image data.
- width: [int] range between (0, +∞). The width of image data.
- channel: [int] The number of channels of image data. Defaults to 3.
- encode: [str] The method by which image data is encoded. Valid encoders are 'jpeg' and 'png'. 'PNG' is lossless. Defaults to 'JPEG'.
- shards: [int] How many files you need to split the data into. Defaults to 1.
- keep_exif: [bool] Whether to keep EXIF information of photos. Defaults to true.
import nebulae
# create a data generator
fg = nebulae.fuel.Generator(file_dir='/home/file_dir',
                            file_list='file_list.csv',
                            dtype=['vuint8', 'int8'])
# generate compressed data file
fg.generate(dst_path='/home/data/fuel.hdf5',
            channel=3,
            height=224,
            width=224)
FuelGenerator.modify(config=None)
You can edit the properties again to generate another file.
fg.modify(height=200, width=200)
Passing a dictionary of changed parameters is equivalent.
config = {'height': 200, 'width': 200}
fg.modify(config=config)
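The equivalence between keyword arguments and a config dictionary can be pictured with a small stand-in class. `ParamHolder` below is hypothetical and is not nebulae's implementation; it only illustrates how both call styles can end up updating the same settings.

```python
# Stand-in class for illustration only; it is not nebulae's FuelGenerator.
class ParamHolder:
    def __init__(self, **params):
        self.params = dict(params)

    def modify(self, config=None, **kwargs):
        # Keyword arguments and a config dictionary update the same settings.
        updates = dict(config or {})
        updates.update(kwargs)
        unknown = set(updates) - set(self.params)
        if unknown:
            raise KeyError('unknown parameters: %s' % sorted(unknown))
        self.params.update(updates)


a = ParamHolder(height=224, width=224, channel=3)
b = ParamHolder(height=224, width=224, channel=3)
a.modify(height=200, width=200)
b.modify(config={'height': 200, 'width': 200})
print(a.params == b.params)
```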
FuelDepot()
Build a Fuel Depot that allows you to deposit datasets.
import nebulae
# create a data depot
fd = nebulae.fuel.FuelDepot()
FuelDepot.load(config, name, batch_size, data_path, data_key, height=0, width=0, channel=3, frame=0, is_encoded=True, if_shuffle=True, rescale=True, resol_ratio=1, complete_last_batch=True, spatial_aug='', p_sa=(0), theta_sa=(0), temporal_aug='', p_ta=(0), theta_ta=(0))
Mount dataset on your FuelDepot.
- name: [str] Name of your dataset.
- batch_size: [int] The size of a mini-batch.
- data_path: [str] The full path of your data file. It must be a hdf5/npz file.
- data_key: [str] The key name of data.
- if_shuffle: [bool] Whether to shuffle data samples every epoch. Defaults to True.
- is_encoded: [bool] If the stored data has been compressed. Defaults to True.
- channel: [int] The number of channels of image data. Defaults to 3.
- height: [int] range between (0, +∞). Height of image data. Defaults to 0.
- width: [int] range between (0, +∞). Width of image data. Defaults to 0.
- frame: [int] range between [-1, +∞). The unified number of frames for sequential data. Defaults to 0.
- rescale: [bool] Whether to rescale values of fetched data to [-1, 1]. Defaults to True.
- resol_ratio: [float] range between (0, 1]. The subsampling coefficient used to lower the resolution of image data. Set it to 0.5 to carry out 1/2 subsampling. Defaults to 1.
- complete_last_batch: [bool] Whether to complete the last batch so that it has as many samples as the other batches. Defaults to True.
- spatial_aug: [comma-separated str] The spatial data augmentations you want, given as one string with commas as separators, e.g. 'flip,brightness'. Valid augmentations include 'flip', 'crop', 'brightness', 'gamma_contrast' and 'log_contrast'. Defaults to '' which means no augmentation.
- p_sa: [tuple of float] range between [0, 1]. The probabilities of taking spatial data augmentations according to the order in spatial_aug. Defaults to (0).
- theta_sa: [tuple] The parameters of spatial data augmentations according to the order in spatial_aug. Defaults to (0).
- temporal_aug: [comma-separated str] Put temporal data augmentations you want in a string with comma as separator. Valid augmentations include 'sample', e.g. 'sample'. Make sure to set is_seq as True if you want to enable temporal augmentation. Defaults to '' which means no augmentation.
- p_ta: [tuple of float] range between [0, 1]. The probabilities of taking temporal data augmentations according to the order in temporal_aug. Defaults to (0).
- theta_ta: [tuple] The parameters of temporal data augmentations according to the order in temporal_aug. Defaults to (0).
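To make the rescale and resol_ratio options concrete, here is a minimal pure-Python sketch. Both the linear mapping from 8-bit values to [-1, 1] and the stride-based subsampling are assumptions about what these options do; `rescale` and `subsample` are hypothetical helpers, not nebulae functions.

```python
def rescale(pixels):
    """Map 8-bit values in [0, 255] linearly onto [-1, 1] (assumed mapping)."""
    return [p / 127.5 - 1.0 for p in pixels]


def subsample(image, resol_ratio):
    """Keep every k-th row and column, where k = round(1 / resol_ratio)."""
    if not 0 < resol_ratio <= 1:
        raise ValueError('resol_ratio must lie in (0, 1]')
    step = round(1 / resol_ratio)
    return [row[::step] for row in image[::step]]


# A toy 4x4 single-channel image stored as nested lists.
image = [[0, 64, 128, 255],
         [10, 20, 30, 40],
         [50, 60, 70, 80],
         [90, 100, 110, 120]]
small = subsample(image, 0.5)  # 1/2 subsampling gives a 2x2 image
print(small)
print(rescale([0, 255]))
```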
All data augmentation approaches are listed as follows:
Data Source | Augmentation | Parameters |
---|---|---|
Image | flip | empty tuple: () |
Image | crop | nested tuple of float: ((minimum area ratio, maximum area ratio), (minimum aspect ratio, maximum aspect ratio)) of cropped area, where aspect ratio is width/height |
Image | brightness | float, range between (0, 1]: increment/decrement factor on brightness |
Image | gamma_contrast | float, range between (0, 1]: expansion/shrinkage factor on pixel value domain |
Image | log_contrast | float, range between (0, 1]: expansion/shrinkage factor on pixel value domain |
Sequence | sampling | positive int, denoted as theta: sample an image every theta frames |
fd.load(name='test-img',
        batch_size=4,
        data_key='image',
        data_path='/home/image.hdf5',
        width=200, height=200,
        resol_ratio=0.5,
        spatial_aug='brightness,gamma_contrast',
        p_sa=(0.5, 0.5), theta_sa=(0.2, 1.2))
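The augmentation string, the probabilities and the parameters are matched by position. The sketch below shows one way to check that pairing before loading a dataset; `pair_augmentations` is a hypothetical helper and not part of nebulae. It also wraps scalars, because in Python `(0)` is the integer 0 while `(0,)` is a one-element tuple.

```python
def pair_augmentations(aug, p, theta):
    """Pair each augmentation name with its probability and parameter."""
    names = [n.strip() for n in aug.split(',') if n.strip()]
    # (0) is an int, not a tuple, so scalars are wrapped into tuples here.
    p = p if isinstance(p, tuple) else (p,)
    theta = theta if isinstance(theta, tuple) else (theta,)
    if not names:
        return []  # an empty string means no augmentation
    if not (len(names) == len(p) == len(theta)):
        raise ValueError('aug, p and theta must have the same length')
    for prob in p:
        if not 0 <= prob <= 1:
            raise ValueError('probabilities must lie in [0, 1]')
    return list(zip(names, p, theta))


plan = pair_augmentations('brightness,gamma_contrast', (0.5, 0.5), (0.2, 1.2))
print(plan)
```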
FuelDepot.modify(tank, config=None)
- tank: [str] Specify the dataset to modify.
You can edit properties to change the way you fetch batch and process data.
fd.modify(tank='test-img', name='test', batch_size=2)
Passing a dictionary of changed parameters is equivalent.
config = {'name':'test', 'batch_size':2}
fd.modify(tank='test-img', config=config)
FuelDepot.unload(tank='')
- tank: [str] Specify the dataset to unmount. Defaults to '' in which case all datasets are going to get unmounted.
Unmount a dataset that is no longer needed.
FuelDepot.next(tank)
- tank: [str] Specify the dataset from which data is fetched.
Return a dictionary containing a batch of data, labels and other information.
FuelDepot.epoch
Attribute: a dictionary containing the current epoch of each dataset. Epochs start from 1.
FuelDepot.MPE
Attribute: a dictionary containing how many iterations there are within an epoch for each dataset.
FuelDepot.volume
Attribute: a dictionary containing the number of samples in each dataset.
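The relationship between volume, batch_size, MPE and complete_last_batch can be sketched as follows. This is an illustration under stated assumptions, not nebulae's code: it assumes MPE is the ceiling of volume divided by batch_size, and that an incomplete last batch is completed by wrapping around to the first samples. Both helper functions are hypothetical.

```python
import math


def iterations_per_epoch(volume, batch_size):
    """Batches needed to visit every sample once (assumed meaning of MPE)."""
    return math.ceil(volume / batch_size)


def batch_indices(volume, batch_size, complete_last_batch=True):
    """Yield the sample indices of each batch in one unshuffled epoch."""
    for start in range(0, volume, batch_size):
        idx = list(range(start, min(start + batch_size, volume)))
        if complete_last_batch and len(idx) < batch_size:
            # assumption: pad by wrapping around to the first samples
            idx += [i % volume for i in range(batch_size - len(idx))]
        yield idx


batches = list(batch_indices(volume=10, batch_size=4))
print(iterations_per_epoch(10, 4), batches[-1])
```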
Astrobase
Component()
Build a component house in which users can use a variety of components and create new ones, either by packing existing components together or from scratch.
OffTheShelf()
Set up a framework within which users can build modules using the core backend. It is especially convenient when you want to port open-source code into nebulae or when you find it difficult to implement a desired function.
import nebulae
import torch
# designate pytorch as core backend
nebulae.Law.CORE = 'pytorch'
# set up a framework
OTS = nebulae.astrobase.OffTheShelf()
# create your own component
class DecisionLayer(OTS):
    def __init__(self, feat_dim, nclass, **kwargs):
        super(DecisionLayer, self).__init__(**kwargs)
        self.feat_dim = feat_dim
        self.linear = torch.nn.Linear(feat_dim, nclass)

    def run(self, x):
        x = x.reshape(-1, self.feat_dim)
        y = self.linear(x)
        return y

COMP = nebulae.astrobase.Component()
# add DecisionLayer to component house
COMP.new('dsl', DecisionLayer, 'x', out_shape=(-1, 128))
N.B. Make sure that none of your argument names starts or ends with '_'.
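The naming rule can be checked automatically before a class is registered. The following sketch uses the standard inspect module; `check_argument_names` is a hypothetical helper and not part of nebulae.

```python
import inspect


def check_argument_names(func):
    """Raise ValueError if any argument name starts or ends with '_'."""
    for name in inspect.signature(func).parameters:
        if name == 'self':
            continue
        if name.startswith('_') or name.endswith('_'):
            raise ValueError('invalid argument name: %r' % name)
    return True


def good(feat_dim, nclass):
    pass


def bad(feat_dim_, nclass):
    pass


print(check_argument_names(good))
```

Calling `check_argument_names(bad)` raises a ValueError because `feat_dim_` ends with an underscore.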
SpaceDock()