Interface for reading HDF5 files into Tensorflow.
tftables allows convenient access to HDF5 files with Tensorflow. A class for reading batches of data out of arrays or tables is provided. A secondary class wraps both the primary reader and a Tensorflow FIFOQueue for straightforward streaming of data from HDF5 files into Tensorflow operations.
The library is backed by multitables for high-speed reading of HDF5 datasets. multitables is based on PyTables (tables), so this library can make use of any compression algorithms that PyTables supports.
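The reader-plus-queue arrangement described above can be pictured with a small standard-library sketch. This is only a conceptual illustration, not tftables' implementation: the names `batches` and `stream_batches` are hypothetical, a plain list stands in for an HDF5 dataset, and `queue.Queue` stands in for the Tensorflow FIFOQueue.

```python
import queue
import threading


def batches(data, batch_size):
    """Yield consecutive slices of `data`, each at most `batch_size` long."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]


def stream_batches(data, batch_size, capacity=2):
    """Read batches on a background thread and pass them through a bounded FIFO queue."""
    fifo = queue.Queue(maxsize=capacity)
    done = object()  # sentinel that marks the end of the data

    def reader():
        for batch in batches(data, batch_size):
            fifo.put(batch)  # blocks while the queue is full
        fifo.put(done)

    threading.Thread(target=reader, daemon=True).start()

    # The consumer side: dequeue batches in the order they were read.
    while True:
        item = fifo.get()
        if item is done:
            break
        yield item


rows = list(range(10))  # stand-in for a dataset on disk
received = list(stream_batches(rows, batch_size=4))
print(received)
```

The bounded queue lets reading run ahead of the consumer by a few batches without loading the whole dataset into memory, which is the general motivation for pairing a reader with a queue.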
This software is distributed under the MIT licence. See the LICENSE.txt file for details.
pip install tftables
Alternatively, to install from HEAD, run
pip install git+https://github.com/ghcollin/tftables.git
Or, from a downloaded and extracted copy of the source, run
python setup.py install
tftables depends on multitables, numpy and tensorflow. The package is compatible with the latest versions of Python 2 and 3.
An example of accessing a table in an HDF5 file:
import tftables
import tensorflow as tf

# num_labels, my_network and num_iterations are placeholders
# that must be defined by the user.

with tf.device('/cpu:0'):
    # This function preprocesses the batches before they
    # are loaded into the internal queue.
    # You can cast data, or do one-hot transforms.
    # If the dataset is a table, this function is required.
    def input_transform(tbl_batch):
        labels = tbl_batch['label']
        data = tbl_batch['data']

        truth = tf.to_float(tf.one_hot(labels, num_labels, 1, 0))
        data_float = tf.to_float(data)

        return truth, data_float

    # Open the HDF5 file and create a loader for a dataset.
    # The batch_size defines the length (in the outer dimension)
    # of the elements (batches) returned by the reader.
    # Takes a function as input that pre-processes the data.
    loader = tftables.load_dataset(filename='path/to/h5_file.h5',
                                   dataset_path='/internal/h5/path',
                                   input_transform=input_transform,
                                   batch_size=20)

# To get the data, we dequeue it from the loader.
# Tensorflow tensors are returned in the same order as
# they are returned by input_transform.
truth_batch, data_batch = loader.dequeue()

# These tensors can then be used in your network.
result = my_network(truth_batch, data_batch)

with tf.Session() as sess:
    # This context manager starts and stops the internal threads and
    # processes used to read the data from disk and store it in the queue.
    with loader.begin(sess):
        for _ in range(num_iterations):
            sess.run(result)
If the dataset is an array instead of a table, then input_transform can be omitted if no pre-processing is required. If only a single pass through the dataset is desired, pass cyclic=False to load_dataset.
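The difference between cyclic and single-pass reading can be illustrated with a plain-Python sketch. The function `read_batches` is hypothetical and does not use tftables. How the library itself treats a final partial batch is not stated here, so the wrap-around and short-last-batch behaviour below are assumptions made for the illustration.

```python
import itertools


def read_batches(data, batch_size, cyclic=True):
    """Yield batches from `data`: endlessly with wrap-around if cyclic, else one pass."""
    n = len(data)
    if n == 0:
        return
    start = 0
    while cyclic or start < n:
        if cyclic:
            # Assumed behaviour: wrap around the end so every batch is full.
            yield [data[(start + i) % n] for i in range(batch_size)]
            start = (start + batch_size) % n
        else:
            # Assumed behaviour: the last batch may be shorter than batch_size.
            yield data[start:start + batch_size]
            start += batch_size


rows = list(range(5))  # stand-in for a dataset on disk

single_pass = list(read_batches(rows, 2, cyclic=False))
# A cyclic reader never terminates, so only the first four batches are taken.
looped = list(itertools.islice(read_batches(rows, 2, cyclic=True), 4))

print(single_pass)
print(looped)
```

A cyclic reader suits training loops that run for a fixed number of iterations, while a single pass suits evaluation, where each row should be seen exactly once.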
See the unit tests for complete examples.