DataManager

A small library that simplifies data handling in deep learning environments, with an API not dissimilar to PyTorch's DataLoader.

Usage

    from dataManager import Manager as DataManager

    # Function mapping a batch of indexes x to the data for that batch
    dataFunction = lambda x: (reduceImageSize(inputImages, x), inputLabels[x], ...)

    # Initialize the data provider
    dataManager = DataManager(
        data = dataFunction,
        bz = 32,
        stochasticSampling = True,
        indexingShape = [inputImages.shape[0]]  # number of samples
    )

    # Provide the data for the batch at index 10 (batch indexes start at 0)
    i = 10
    dataManager(i, stochastic = False).shape
    # > (32, ...)

API

__init__(data, indexingShape = None, bz = 32, stochasticSampling = True)

  • data: numpy.array or function: Data to be used during training. If a numpy array is provided, it is wrapped in a function. A user-supplied function should take a single argument: the indexes of the elements in the data batch.

  • indexingShape: array: Array of size 1 or 2, used to generate the indexes for each batch. With one dimension, each batch is a list of indexes; when stochasticSampling is false, the indexes run in order from 0 to (indexingShape - indexingShape % bz). With two dimensions, the indexes are provided from left to right and from top to bottom of the 2D matrix. In both cases, only as many indexes are provided as fit into full batches.

  • bz: integer: Batch size, the number of elements provided per batch.

  • stochasticSampling: bool: If true, the indexes are sampled uniformly at random, with equal probability for each element. Within a batch all indexes are distinct, but uniqueness is not guaranteed across batches.
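
The indexing rules described above can be sketched in plain Python. This is an illustration only, not the library's code: the function names `ordered_batch`, `ordered_batch_2d` and `stochastic_batch` are hypothetical, and the standard `random` module stands in for whatever sampler dataManager actually uses.

```python
import random

def ordered_batch(n, step, bz):
    """Indexes of batch `step` when walking 0..n-1 in order (full batches only)."""
    usable = n - n % bz              # the trailing partial batch is dropped
    start = step * bz
    if start + bz > usable:
        raise IndexError("step is past the last full batch")
    return list(range(start, start + bz))

def ordered_batch_2d(rows, cols, step, bz):
    """Same idea for a 2-D shape: (row, col) pairs, left to right, top to bottom."""
    flat = ordered_batch(rows * cols, step, bz)
    return [divmod(k, cols) for k in flat]

def stochastic_batch(n, bz):
    """bz distinct indexes drawn uniformly; separate calls may overlap."""
    return random.sample(range(n), bz)

print(ordered_batch(100, 2, 32)[:3])    # [64, 65, 66]
print(ordered_batch_2d(4, 5, 1, 6))     # starts at (1, 1)
print(len(set(stochastic_batch(50, 32))))  # 32
```

With 100 elements and a batch size of 32, only the first 96 indexes are ever served in ordered mode, which matches the (indexingShape - indexingShape % bz) bound given for indexingShape.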

__call__(i, stochastic = None)

  • i: integer: Index of the current batch, starting from 0.
  • stochastic: bool: Whether stochastic sampling is used for this call.
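
To make the call semantics concrete, here is a minimal stand-in class. It is a sketch under stated assumptions, not the dataManager implementation: `MiniManager` and its `n` parameter are hypothetical, only a 1-D index space is handled, and treating `stochastic=None` as "fall back to the constructor's stochasticSampling" is a guess based on the default value in the signature.

```python
import random

class MiniManager:
    """Hypothetical stand-in that mimics the documented call pattern."""

    def __init__(self, data, n, bz=32, stochasticSampling=True):
        # A plain sequence is wrapped in a function of the index list,
        # mirroring what the docs say happens to numpy arrays.
        if callable(data):
            self.data = data
        else:
            self.data = lambda idx: [data[k] for k in idx]
        self.n = n
        self.bz = bz
        self.stochasticSampling = stochasticSampling

    def __call__(self, i, stochastic=None):
        # Assumption: None means "use the constructor setting".
        use_random = self.stochasticSampling if stochastic is None else stochastic
        if use_random:
            idx = random.sample(range(self.n), self.bz)  # distinct within the batch
        else:
            idx = list(range(i * self.bz, (i + 1) * self.bz))  # ordered batch i
        return self.data(idx)

manager = MiniManager(data=list(range(100, 200)), n=100, bz=4)
print(manager(2, stochastic=False))  # [108, 109, 110, 111]
```

Passing `stochastic=False` on a single call yields the ordered batch even though the stand-in was built with stochasticSampling enabled, which is the behaviour the usage example relies on.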

Notes:

stochasticSampling overrides ordered indexing.

Undocumented Functions:

  • self.getBatch(self, step, bz): returns an ordered batch of indexes at a given step.
  • self.getStochasticBatch(self, shape, bz): returns a randomly picked set of elements, each chosen with equal probability. shape defaults to indexingShape if not given.


Files for dataManager, version 0.2.2

  • dataManager-0.2.2-py3-none-any.whl (3.5 kB): wheel, Python version py3
  • dataManager-0.2.2.tar.gz (3.3 kB): source distribution
