
A novel and simple framework based on prevalent DL frameworks and other image processing libraries. v0.6.21: fixes a bug in the version checkers; adds a new argument to Timer for joining threads.


Nebulae Brochure

A deep learning framework based on PyTorch and concurrent image processing libraries. It aims to offer a set of useful tools and functions.


Installation

pip install nebulae

For a better development experience, building from the Dockerfile is also available. Modifying the library versions as needed and having nvidia-docker on your machine are recommended.

sudo docker build -t nebulae:std -f Dockerfile.std .
sudo docker run -it --gpus all --ipc=host --ulimit memlock=-1 nebulae:std

The latest version supports PyTorch 1.6 and above.


Modules Overview

Fuel: easily manage and read the datasets you need anytime

Kit: includes many utilities for better support of nebulae


Fuel

FuelGenerator(file_dir, file_list, dtype, is_seq)

Build a FuelGenerator to store data space-efficiently.

  • config: [dict] A dictionary containing all parameters. The other arguments and this are mutually exclusive.

  • file_dir: [str] Where your raw data is.

  • file_list: [str] A csv file in which all the raw datum file name and labels are listed.

  • dtype: [list of str] A list of data types of all columns but the first one in file_list. Valid data types are 'uint8', 'uint16', 'uint32', 'int8', 'int16', 'int32', 'int64', 'float16', 'float32', 'float64', 'str'. Plus, if you add a 'v' as the initial character, e.g. 'vuint8', the data in each row of this column is allowed to be saved with variable length.

  • is_seq: [bool] Whether the data is a sequence, e.g. video frames. Defaults to False.

An example of file_list.csv is as follows. 'image' and 'label' are the key names of data and labels respectively. Note that the image name is a path relative to file_dir.

image label
img_1.jpg 2
img_2.jpg 0
... ...
img_100.jpg 5

If is_seq is True, the csv file should look like the example below (when the separator is ',' and the quoting character is '"'):

image label
"vid_1_frame_1.jpg,vid_1_frame_2.jpg,...,vid_1_frame_24.jpg" 2
"vid_2_frame_1.jpg,vid_2_frame_2.jpg,...,vid_2_frame_15.jpg" 0
... ...
"vid_100_frame_1.jpg,vid_100_frame_2.jpg,...,vid_100_frame_39.jpg" 5

FuelGenerator.generate(dst_path, height, width, channel=3, encode='JPEG', shards=1, keep_exif=True)

  • dst_path: [str] An hdf5/npz file where you want to save the data.
  • height: [int] range between (0, +∞). The height of image data.
  • width: [int] range between (0, +∞). The width of image data.
  • channel: [int] The number of channels of image data. Defaults to 3.
  • encode: [str] The codec by which image data is encoded. Valid encoders are 'JPEG' and 'PNG'; 'PNG' is lossless. Defaults to 'JPEG'.
  • shards: [int] How many files you need to split the data into. Defaults to 1.
  • keep_exif: [bool] Whether to keep EXIF information of photos. Defaults to True.
import nebulae as neb
# create a data generator
fg = neb.fuel.Generator(file_dir='/home/file_dir',
                        file_list='file_list.csv',
                        dtype=['vuint8', 'int8'])
# generate compressed data file
fg.generate(dst_path='/home/data/fuel.hdf5', 
            channel=3,
            height=224,
            width=224)

FuelGenerator.modify(config=None)

You can edit properties again to generate other files.

fg.modify(height=200, width=200)

Passing a dictionary of changed parameters is equivalent.

config = {'height': 200, 'width': 200}
fg.modify(config=config)

Tank(data_path, data_specf, batch_size, shuffle, in_same_size, fetch_fn, prep_fn, collate_fn)

Build a Fuel Tank that allows you to deposit datasets.

  • data_path: [str] The full path of your data file. It must be a hdf5/npz file.
  • data_specf: [dict] A dictionary containing key-dtype pairs.
  • batch_size: [int] The size of a mini-batch.
  • shuffle: [bool] Whether to shuffle data samples every epoch. Defaults to True.
  • in_same_size: [bool] Whether to ensure the last batch has as many samples as the other batches. Defaults to True.
  • fetch_fn: [func] The function that fetches a single datum from the dataset.
  • prep_fn: [func] The function that preprocesses a fetched datum. Defaults to None.
  • collate_fn: [func] The function that concatenates data into a mini-batch. Defaults to None.

E.g.

from nebulae.fuel import depot
import numpy as np
# define data-reading functions
def fetcher(data, idx):
  ret = {}
  ret['image'] = data['image'][idx]
  ret['label'] = data['label'][idx].astype('int64')
  return ret

def prep(data):
  # convert to channel-first format
  data['image'] = np.transpose(data['image'], (2, 0, 1)).astype('float32')
  return data

# create a data depot
tk = depot.Tank("/home/dataset.hdf5",
                {'image': 'vuint8', 'label': 'int64'},
                batch_size=128, shuffle=True, 
                fetch_fn=fetcher, prep_fn=prep)

Tank.next()

Return a batch of data, labels and other information.

Tank.MPE

Attribute: how many iterations there are within an epoch for each dataset.

len(Tank)

Attribute: the number of data samples in this dataset.
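
A minimal sketch of consuming batches from a Tank follows; that next() returns the dictionary assembled by fetch_fn/prep_fn is an assumption based on the example above.

# a minimal sketch, assuming tk is the Tank created above
print(len(tk))               # number of samples in the dataset
for epoch in range(10):
    for mile in range(tk.MPE):
        batch = tk.next()    # assumed to be the dict built by fetch_fn/prep_fn
        images, labels = batch['image'], batch['label']
        # ... forward pass, loss and optimizer step go here ...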

Comburant()

Comburant is a container to pack up all preprocessing methods.

Data Source | Augmentation | Usage
Image       | flip         | flip matrix vertically or horizontally
            | crop         | crop matrix randomly with a given area and aspect ratio
            | rotate       | rotate matrix randomly within a given range
            | brighten     | adjust brightness given an increment/decrement factor
            | contrast     | adjust contrast given an expansion/shrinkage factor
            | saturate     | adjust saturation given an expansion/shrinkage factor
Sequence    | sampling     | positive int, denoted as theta: sample an image every theta frames
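
The constructors of these augmentation ops are not documented here, so the snippet below is only a hypothetical sketch of the packing pattern; Comburant's call signature and the op names Flip and Crop are assumptions derived from the table above.

# hypothetical sketch: Flip/Crop stand in for the real op constructors
cbr = Comburant(Flip(), Crop())            # pack several augmentations together

def prep(data):
    data['image'] = cbr(data['image'])     # apply the packed augmentations in order
    return data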

Aerolog

DashBoard(log_path='./aerolog', window=1, divisor=10, span=30, format=None)

  • log_path: [str] The place where logs will be stored.
  • window: [int] Window size for moving average.
  • divisor: [int] How many segments the Y axis is to be divided into.
  • span: [int] The length of X axis.
  • format: [dict of tuple] Specify the format in which results will be displayed every step. Four available format types are 'raw', 'percent', 'tailor' and 'inviz'.

DashBoard is a terminal-friendly tool for monitoring your training procedure. It prints a dynamically refreshed progress bar with some important metrics. Besides, it shows a curve that updates in real time to visualize how well the training and test phases go.

DashBoard.gauge(entry, mile, epoch, mpe, stage, interval=1, duration=None)

  • entry: [dict] K-V pairs to be displayed.
  • mile: [int] The current step.
  • epoch: [int] The current epoch.
  • mpe: [int] The number of steps within an epoch, i.e. miles per epoch.
  • stage: [str] The name of the current stage.
  • interval: [int] Display results every fixed number of steps.
  • duration: [float] The elapsed time since the last step or epoch.

Call this function after a step or epoch to monitor current states.

DashBoard.log(gauge, tachograph, record)

  • gauge: [bool] Whether to draw metrics as a picture.
  • tachograph: [bool] Whether to write down metrics as a csv file.
  • record: [str] The recording file to which the metrics logged this time are appended.

Write the intermediate results into files.
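
A hedged sketch of the whole DashBoard workflow follows, assuming DashBoard has been imported from the aerolog module; the metric names, the layout of the format tuples and the record file name are illustrative.

# illustrative sketch: metric names and format entries are assumptions
db = DashBoard(log_path='./aerolog', window=10, divisor=10, span=30,
               format={'Loss': ('raw',), 'Acc': ('percent',)})
mpe = 100                                          # steps per epoch (placeholder)
for epoch in range(2):
    for mile in range(mpe):
        loss, acc = 1.0 / (mile + 1), mile / mpe   # placeholder metrics
        db.gauge(entry={'Loss': loss, 'Acc': acc},
                 mile=mile, epoch=epoch, mpe=mpe,
                 stage='TRAIN', interval=10)
db.log(gauge=True, tachograph=True, record='record.txt')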


Cockpit

Engine(device, ngpus=1, least_mem=2048, available_gpus=())

  • device: [str] The type of device on which to run the model. You can choose either 'CPU' or 'GPU'.
  • ngpus: [int] The number of GPUs to be taken. This argument doesn't work when device is CPU.
  • least_mem: [int] Only the GPUs with at least this amount of memory left are regarded as available.
  • available_gpus: [tuple of int] Set the available GPUs explicitly. This argument doesn't work when device is CPU. In default case, it is set to empty tuple and Nebulae will detect available GPUs automatically.

An Engine takes care of the devices to be used, especially the GPU environment. Usually you need to gear it into your Craft.
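
For instance, on a single-GPU machine:

from nebulae.cockpit import Engine
# pick one GPU that has at least 2 GiB of free memory
ng = Engine(device='GPU', ngpus=1, least_mem=2048)
# later, attach it to your Craft: net.gear(ng)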

TimeMachine(ckpt_path, save_path, max_anchors=-1)

  • ckpt_path: [str] The place where checkpoints will be read from.
  • save_path: [str] The place where checkpoints will be stored.
  • max_anchors: [int] The max number of checkpoints to be kept.

Manage checkpoint saving and reading by creating a TimeMachine.

TimeMachine.to(craft, file='', ckpt_scope=None, frozen=False)

  • craft: [Craft] NN model.
  • file: [str] The file name of checkpoint.
  • ckpt_scope: [str] Only the parameters inside the scope can be loaded.
  • frozen: [bool] A frozen model means omitting unmatched parts is not allowed.

TimeMachine.drop(craft, file='', save_scope=None)

  • craft: [Craft] NN model.
  • file: [str] The file name of checkpoint.
  • save_scope: [str] Only the parameters inside the scope can be saved.
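
For example, assuming net is a Craft instance (the checkpoint file names are illustrative):

from nebulae.cockpit import TimeMachine
tm = TimeMachine(ckpt_path='./ckpt', save_path='./ckpt', max_anchors=3)
tm.to(net, file='net_best.pth')        # load weights into net
tm.drop(net, file='net_latest.pth')    # save net's current weights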

Kit

GPUtil()

Create a tool for monitoring GPU status. In fact, it is leveraged implicitly when instantiating an Engine.

GPUtil.available(ngpus, least_mem)

  • ngpus: [int] The number of GPUs to be selected.
  • least_mem: [int] The least free memory (MiB) a valid GPU should have.

Returns a list of available GPU information including serial number, name and memory. If there are not enough available GPUs, it contains as many devices as the system can offer.

GPUtil.monitor(sec)

  • sec: [int] The monitor logs every given number of seconds. Defaults to 5.

Start monitoring GPUs. Note that setting sec to a small number might cause abnormal statistics. Increase it if there are many GPUs on your machine.

GPUtil.status()

Stop monitoring and return a summary of GPU status.
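
A short sketch of the monitoring workflow (the import path nebulae.kit is an assumption):

from nebulae.kit import GPUtil   # import path is an assumption
gu = GPUtil()
print(gu.available(ngpus=2, least_mem=4096))   # pick up to two GPUs with 4 GiB free
gu.monitor(sec=5)                              # log status every 5 seconds
# ... run your workload ...
print(gu.status())                             # stop monitoring and get a summary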


Astro

Craft(scope)

  • scope: [str] Name of this craft. It will be the base name of the saved model file by default.

Craft is a base class for neural networks that is compatible with the backend framework. It is especially convenient when you want to port open-source code into nebulae or when you find it difficult to implement a desired function.

from nebulae.astro import dock
import nebulae
import torch
# designate pytorch as core backend
nebulae.Law.CORE = 'pytorch'
# create a network using nebulae
class NetNeb(dock.Craft):
    def __init__(self, nclass, scope='NEB'):
        super(NetNeb, self).__init__(scope)
        self.flat = dock.Reshape()
        self.fc = dock.Dense(512, nclass)

    def run(self, x):
        x = self.flat(x, (-1, 512))
        y = self.fc(x)
        return y
# you can create the same network using pytorch functions
class NetTorch(dock.Craft):
    def __init__(self, nclass, scope='TORCH'):
        super(NetTorch, self).__init__(scope)
        self.fc = torch.nn.Linear(512, nclass)

    def run(self, x):
        x = x.view(-1, 512)
        y = self.fc(x)
        return y

Rudder()

Rudder is a context in which all gradients will be computed and backpropagated through variables accordingly.

Prober()

Prober is a context in which all gradients will be computed but not backpropagated.

Nozzle()

Nozzle is a context in which all variables are fixed and no gradient is going to be computed.
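
A minimal sketch of how the three contexts are usually told apart, assuming they are exposed by the dock module and used as Python context managers; net, x and z are placeholders:

# assumption: Rudder/Prober/Nozzle are used as context managers
with dock.Rudder():      # training step: gradients are computed and backpropagated
    loss = net(x, z)
with dock.Prober():      # inspection: gradients are computed but not backpropagated
    loss = net(x, z)
with dock.Nozzle():      # inference: no gradients are computed
    pred = net(x)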

coat(datum, as_const)

  • datum: [int/float/array/tensor] input datum.
  • as_const: [bool] whether to make the datum an untrainable tensor. Defaults to True.

The input will be put onto the currently used device. Any legal input is first converted to a tensor.

shell(datum, as_np)

  • datum: [tensor] input datum.
  • as_np: [bool] whether to return the datum as a regular numpy array. Defaults to True.

The input tensor will be fetched from the currently used device.
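
For example, moving a NumPy array onto the active device and back, assuming coat and shell have already been imported (their exact module path is not shown here):

import numpy as np
x = np.random.rand(1, 3, 224, 224).astype('float32')
t = coat(x, as_const=True)    # converted to a tensor on the current device
y = shell(t, as_np=True)      # back to a plain numpy array on the host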

autoPad(in_size, kernel, stride, dilation)

  • in_size: [tuple] input size e.g. (h, w) for 2d tensor
  • kernel: [tuple] kernel size
  • stride: [tuple/int] convolution stride
  • dilation: [tuple/int] stride in atrous convolution

Return the padding tuple that keeps the output the same size as the input when stride is 1, e.g. (left, right, top, bottom, front, back) for a 3d tensor.
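
For instance, to keep a 224x224 feature map the same size under a 3x3 convolution with stride 1:

pad = autoPad((224, 224), kernel=(3, 3), stride=1, dilation=1)
# pad is a tuple like (left, right, top, bottom) for a 2d tensor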

Multiverse(universe, nworld=1)

  • universe: [Craft] NN model.
  • nworld: [int] The number of devices on which model would be distributed.

Multiverse is a parallelism manager that makes it easier to deploy distributed training. The following example code shows the difference between regular training and distributed training.

import nebulae as neb
from nebulae.fuel import farm
from nebulae.astro import dock
from nebulae.cockpit import Engine, TimeMachine, GPU

#          ############
def launch(mv=None): ## <====================================[1]
    #      ############
    
    ############
    mv.init() ## <===========================================[2]
    ############
    
    # ----------------------- Aerolog ------------------------ #
    # define DashBoard

    # ----------------------- Cockpit ------------------------ #
    #                       ###################
    ng = Engine(device=GPU, ngpus=mv.nworld) ## <=============[3]
    #                       ###################
    tm = TimeMachine(save_path="./ckpt",
                     ckpt_path="./ckpt")

    # ------------------------ Fuel -------------------------- #
    # define dataset Tanks

    # ---------------------- Space Dock ---------------------- #
    class Net(dock.Craft):
        def __init__(self, nclass, scope):
            super(Net, self).__init__(scope)
            # define architecture

        def run(self, x):
            # define forward procedure
            pass

    class Train(dock.Craft):
        def __init__(self, net, scope='TRAIN'):
            super(Train, self).__init__(scope)
            self.net = net
            # initialize params

        def run(self, x, z):
            # define training step
            loss = self.net(x, z)
            #########################
            loss = mv.reduce(loss) ## <==================[4]
            #########################

    class Dev(dock.Craft):
        def __init__(self, net, scope='DEVELOP'):
            super(Dev, self).__init__(scope)
            self.net = net
            # initialize params

        def run(self, x, z, idx):
            # define testing step
            loss = self.net(x, z)
            #########################
            loss = mv.reduce(loss) ## <==================[4]
            #########################

    # ----------------------- Launcher ----------------------- #
    net = Net(10, 'cnn')
    net.gear(ng)
    #####################
    net = mv.sync(net) ## <====[5]
    #####################
    train = Train(net)
    dev = Dev(net)

    for epoch in range(10):
        for mile in range(mpe):
            # train and test
            pass

if __name__ == '__main__':
    # ------------------- Global Setting --------------------- #
    if is_distributed:
        #########################################
        mv = neb.cockpit.Multiverse(launch, 4) ##
        mv() # ________________________________## <===========[6]
        #########################################
    else:
        launch()

Six changes are marked above.

[1] leave an argument for receiving the Multiverse instance.

[2] initialize Multiverse instance.

[3] pass the number of worlds to Engine.

[4] post-process the returned values.

[5] synchronize models in different worlds.

[6] instantiate a Multiverse and call it.

Multiverse.init()

Initialize the multiverse manager.

Multiverse.scope()

Create a parallel scope.

Multiverse.sync(models, data)

  • models: [Craft] NN model.
  • data: [tuple of Tank] All datasets in use.

Synchronize the distributed models and data.

Multiverse.reduce(tensor, aggregate=False)

  • tensor: [tensor] Returned tensor to be reduced.
  • aggregate: [bool] Whether to sum all tensors up or average them. Defaults to False, which means taking the average.

Collect results returned by all distributed devices.

Multiverse.Executor

A decorator pointing out the main function, which is usually in charge of training and testing.
