A helper package for hdf5 data handling

lose

lose, and in particular lose.LOSE(), is a helper class for handling data using the hdf5 file format and PyTables.

>>> from lose import LOSE
>>> l = LOSE()
>>> l
<lose hdf5 data handler, fname=None, atom=Float32Atom(shape=(), dflt=0.0)>
generator parameters: iterItems=None, iterOutput=None, batch_size=1, limit=None, loopforever=False, shuffle=False

installation

pip3 install -U lose

or

pip install -U lose

structure

vars

LOSE.fname is the path to the .h5 file including the name and extension, default is None.

LOSE.atom is the dtype the data is stored in; it is recommended to leave it at the default, tables.Float32Atom(), which results in arrays with dtype==np.float32.

LOSE.generator() related vars:

LOSE.batch_size batch size of data getting pulled from the .h5 file, default is 1.

LOSE.limit limits the amount of data loaded by the generator, default is None, if None all available data will be loaded.

LOSE.loopforever bool that allows infinite looping over the data, default is False.

LOSE.iterItems list of X group names and list of Y group names, default is None, required to be user defined for LOSE.generator() to work.

LOSE.iterOutput list of X output names and list of Y output names for LOSE.iterItems to be mapped to, default is None, required to be user defined for LOSE.generator() to work.

LOSE.shuffle bool that enables shuffling of the data, default is False, shuffling is affected by LOSE.limit and LOSE.batch_size.
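
The way these parameters interact can be pictured with a small pure-Python sketch. This is not the code LOSE uses internally: iter_batches and its seed argument are hypothetical names made up for illustration, and the assumption that shuffling happens once per pass over the limited data is ours.

```python
import random

def iter_batches(rows, batch_size=1, limit=None, loopforever=False,
                 shuffle=False, seed=None):
    """Yield lists of up to `batch_size` rows (illustrative sketch only)."""
    # `limit` caps how many rows the generator can see; None means all rows.
    visible = rows if limit is None else rows[:limit]
    order = list(range(len(visible)))
    if not order:
        return  # nothing to yield, also avoids an endless empty loop
    rng = random.Random(seed)
    while True:
        if shuffle:
            rng.shuffle(order)  # assumption: reshuffle once per pass
        for start in range(0, len(order), batch_size):
            yield [visible[i] for i in order[start:start + batch_size]]
        if not loopforever:
            break

batches = list(iter_batches(list(range(10)), batch_size=4, limit=6))
print(batches)  # [[0, 1, 2, 3], [4, 5]]
```

With loopforever=True the same call would never stop on its own, so it has to be consumed with next() or by a training loop that knows how many steps it wants.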

methods

Help on LOSE in module lose.dataHandler object:

class LOSE(builtins.object)
 |  Methods defined here:
 |  
 |  __init__(self, fname=None)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  __repr__(self)
 |      Return repr(self).
 |  
 |  __str__(self)
 |      Return str(self).
 |  
 |  generator(self)
 |  
 |  getShape(self, arrName)
 |  
 |  load(self, *args, batch_obj=':')
 |  
 |  makeGenerator(self, layerNames, limit=None, batch_size=1, shuffle=False, **kwards)
 |  
 |  newGroup(self, fmode='a', **kwards)
 |  
 |  removeGroup(self, *args)
 |  
 |  renameGroup(self, **kwards)
 |  
 |  save(self, **kwards)
 |  
 |  ----------------------------------------------------------------------

LOSE.newGroup(fmode='a', **groupNames) is used to append/write(depends on the fmode keyword argument, default is 'a') group(s) to a .h5 file.

LOSE.removeGroup(*groupNames) is used to remove group(s) from a file, given the name(s) of the group(s).

LOSE.renameGroup(**groupNames) is used to rename group(s) within a .h5 file, see examples below.

LOSE.save(**groupNamesAndShapes) is used to save data (in append mode only) to one or more groups in a .h5 file. Each sample has to have the same shape as group.shape[1:] of the group it is passed to; LOSE.getShape(groupName) can be used to get the group's shape.
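
The shape rule for saving can be checked before any data is written. The helper below is a hypothetical pre-check written for this README, not part of lose; it only compares tuples and assumes the first axis of a group is the one that grows.

```python
def rows_fit_group(group_shape, row_shapes):
    """Return True if every row shape equals group_shape[1:]."""
    expected = tuple(group_shape[1:])
    return all(tuple(shape) == expected for shape in row_shapes)

group_shape = (0, 20)  # e.g. a group created with x=(0, 20)
print(rows_fit_group(group_shape, [(20,), (20,)]))  # True
print(rows_fit_group(group_shape, [(20,), (3,)]))   # False
```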

LOSE.load(*groupNames, batch_obj=':') is used to load data from one or more groups, either a whole group or a slice. To load a slice, pass batch_obj as a string containing the desired slice (the signature above shows ':' as the default). Each group has to be present in the .h5 file.
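
To make the idea of a slice given as a string concrete, here is a minimal sketch of how such a string could be turned into a Python slice. How lose parses the string internally is not shown in this README, so parse_slice is a hypothetical helper and the accepted formats are assumptions.

```python
def parse_slice(text):
    """Turn a string such as ':', '[:]', '2:5' or '::2' into a slice object."""
    body = text.strip()
    if body.startswith('[') and body.endswith(']'):
        body = body[1:-1]  # accept both ':' and '[:]' spellings
    parts = body.split(':')
    if len(parts) == 1 or len(parts) > 3:
        raise ValueError(f'not a slice: {text!r}')
    # empty fields mean "no bound", exactly like in data[a:b:c]
    return slice(*(int(p) if p.strip() else None for p in parts))

data = list(range(10))
print(data[parse_slice('[2:5]')])  # [2, 3, 4]
print(data[parse_slice(':')] == data)  # True
```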

LOSE.getShape(groupName) is used to get the shape of a single group, group has to be present in the .h5 file.

LOSE.generator(): see the LOSE.generator() details section; LOSE.iterItems and LOSE.iterOutput have to be defined first.

LOSE.makeGenerator(layerNames, limit=None, batch_size=1, shuffle=False, **data): again, see the LOSE.generator() details section for more details.

example usage

creating/adding new group(s) to a file

import numpy as np
from lose import LOSE

l = LOSE()
l.fname = 'path/to/your/save/file.h5' # path to the .h5 file, has to be user defined before any methods can be used, default is None

exampleDataX = np.arange(20, dtype=np.float32)
exampleDataY = np.arange(3, dtype=np.float32)

l.newGroup(fmode='w', x=(0, *exampleDataX.shape), y=(0, *exampleDataY.shape)) # creates new, empty groups ready for data to be saved to; if fmode is 'w', all existing groups in the file are overwritten

saving data to a group(s)

import numpy as np
from lose import LOSE

l = LOSE()
l.fname = 'path/to/your/save/file.h5' # path to the .h5 file, has to be user defined before any methods can be used, default is None

exampleDataX = np.arange(20, dtype=np.float32)
exampleDataY = np.arange(3, dtype=np.float32)

l.save(x=[exampleDataX, exampleDataX], y=[exampleDataY, exampleDataY]) # saving data into groups defined in the previous example
l.save(y=[exampleDataY], x=[exampleDataX]) # the same thing

loading data from a group(s) within a file

for this example file has data from the previous example

import numpy as np
from lose import LOSE

l = LOSE()
l.fname = 'path/to/your/save/file.h5' # path to the .h5 file, has to be user defined before any methods can be used, default is None

x, y = l.load('x', 'y') # loading data from the .h5 file(has to be a real file) populated by previous examples
y2compare, x2compare = l.load('y', 'x') # the same thing

print (np.all(x == x2compare), np.all(y == y2compare)) # True True

getting the shape of a group

for this example file has data from previous examples

import numpy as np
from lose import LOSE

l = LOSE()
l.fname = 'path/to/your/save/file.h5' # path to the .h5 file(populated by previous examples), has to be user defined before any methods can be used, default is None

print (l.getShape('x')) # (3, 20)
print (l.getShape('y')) # (3, 3)

renaming group(s) in a file

for this example file has data from previous examples

import numpy as np
from lose import LOSE

l = LOSE('path/to/your/save/file.h5')
x2compare, y2compare = l.load('x', 'y')
print (l) # file structure before renaming any group(s)
l.renameGroup(y='z', x='lol')
lol, z = l.load('lol', 'z')
print (l) # file structure after renaming group(s)
print (np.all(x2compare == lol), np.all(y2compare == z)) # True True

removing group(s) from a file

for this example file has data from previous examples

from lose import LOSE

l = LOSE(fname='path/to/your/save/file.h5')

l.removeGroup('lol', 'z') # removing the group(s)

x = l.load('lol') # now this will result in an error because group 'lol' was removed from the file

LOSE.generator() details

LOSE.generator() is a Python generator used to access data from an hdf5 file in LOSE.batch_size pieces without loading the whole file/group into memory. It also works with a tf.keras model's fit_generator() and has to be used inside a with statement (see examples below).

LOSE.iterItems and LOSE.iterOutput have to be defined by user first.

LOSE.makeGenerator(layerNames, limit=None, batch_size=1, shuffle=False, **data) follows the same rules as LOSE.generator(), but the data has to be passed to it each time it is initialized and is only stored temporarily. The parameters are passed on initialization. layerNames acts like LOSE.iterOutput and LOSE.iterItems combined, and every name in it has to match the name of a data keyword argument (see examples below). If a file named temp.h5 exists, it will be overwritten and then deleted.
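
The "stored temporarily, overwritten, then deleted" behaviour described above is the usual shape of a context manager around a temporary file. The sketch below shows that pattern with the standard library only. It is not how lose is implemented: temp_batches is a hypothetical name, it writes JSON rather than hdf5, and unlike LOSE it reads the whole file back into memory when iterated.

```python
import json
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def temp_batches(rows, path, batch_size=1):
    """Store `rows` in a temporary file, hand out a batch generator, clean up on exit."""
    with open(path, 'w') as fh:  # mode 'w' overwrites an existing file
        json.dump(rows, fh)

    def gen():
        with open(path) as fh:  # simplification: loads everything at once
            stored = json.load(fh)
        for start in range(0, len(stored), batch_size):
            yield stored[start:start + batch_size]

    try:
        yield gen
    finally:
        os.remove(path)  # the temporary file never outlives the with block

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'temp.json')
    with temp_batches([1, 2, 3, 4, 5], path, batch_size=2) as gen:
        collected = list(gen())
    exists_after = os.path.exists(path)

print(collected)     # [[1, 2], [3, 4], [5]]
print(exists_after)  # False
```

Because the data lives in the file once the with block is entered, the caller is free to delete the original arrays inside the block to save memory.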

example LOSE.generator() usage

for this example, let's say that the file has the requested data in it and the model's input/output layer names are present.

import numpy as np
from lose import LOSE

l = LOSE(fname='path/to/your/file/with/data.h5')

l.iterItems = [['x1', 'x2'], ['y']] # names of X and Y groups, all group names need to have batch dim the same and be present in the .h5 file
l.iterOutput = [['input_1', 'input_2'], ['dense_5']] # names of model's layers the data will be cast on, group.shape[1:] needs to match the layer's input shape
l.loopforever = True
l.batch_size = 20 # some batch size, can be bigger than the dataset, but won't output more data, it will just loop over or stop the iteration if LOSE.loopforever is False

l.limit = 10000 # let's say that the file has more data, but you only want to train on the first 10000 samples

l.shuffle = True # enable data shuffling for the generator, costs memory and time

with l.generator() as gen:
	some_model.fit_generator(gen(), steps_per_epoch=50, epochs=1000, shuffle=False) # model.fit_generator() still can't shuffle the data, but LOSE.generator() can
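
The relationship between LOSE.iterItems and LOSE.iterOutput used above is a position-by-position pairing of group names with layer names. The sketch below shows that pairing on plain lists. map_names is a hypothetical helper, and the assumption that each batch ends up as a pair of dicts keyed by layer name (a form Keras accepts for models with named inputs and outputs) is ours, not something this README states about lose.

```python
def map_names(iter_items, iter_output, batch):
    """Pair group names with layer names and build (inputs, targets) dicts."""
    x_groups, y_groups = iter_items
    x_layers, y_layers = iter_output
    inputs = {layer: batch[group] for group, layer in zip(x_groups, x_layers)}
    targets = {layer: batch[group] for group, layer in zip(y_groups, y_layers)}
    return inputs, targets

# one batch as it might be read from the file, keyed by group name
batch = {'x1': [[0.0, 0.0, 0.0]], 'x2': [[1.0, 1.0]], 'y': [[2.0]]}
inputs, targets = map_names(
    [['x1', 'x2'], ['y']],
    [['input_1', 'input_2'], ['dense_5']],
    batch,
)
print(sorted(inputs))  # ['input_1', 'input_2']
print(list(targets))   # ['dense_5']
```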

example LOSE.makeGenerator(layerNames, limit=None, batch_size=1, shuffle=False, **data) usage

for this example, let's say the model's input/output layer names are present and the shapes match the data.

import numpy as np
from lose import LOSE

l = LOSE()

num_samples = 1000

x1 = np.zeros((num_samples, 200)) # example data for the model, x1.shape[1:] == model.get_layer('input_1').output_shape[1:]
x2 = np.zeros((num_samples, 150)) # example data for the model, x2.shape[1:] == model.get_layer('input_2').output_shape[1:]
y = np.zeros((num_samples, 800)) # example data for the model, y.shape[1:] == model.get_layer('dense_5').output_shape[1:]

with l.makeGenerator([['input_1', 'input_2'], ['dense_5']], batch_size=10, shuffle=True, input_2=x2, input_1=x1, dense_5=y) as gen:
	del x1 #remove from memory
	del x2 #remove from memory
	del y #remove from memory

	some_model.fit_generator(gen(), steps_per_epoch=100, epochs=10000, shuffle=False) # again data can't be shuffled by model.fit_generator(), shuffling should be done by the generator

bugs/problems/issues

report them.

change log
