Package to streamline reading and writing data to tfrecord files

Project description

easy_tfrecords

This package is designed to make reading and writing tfrecord files intuitive while preserving the dtype and data structure of your arrays.

Purpose:

The tfrecord format is a fast and powerful way of feeding data to a tensorflow model; it can automatically batch, randomize and iterate your data across epochs without special instructions. The problem with using tfrecord files comes from orchestrating the madness of matching feature structures across the reader, writer and fetcher.

The easy_tfrecords module contains methods and classes that allow you to write to and read from tfrecord files in a straightforward, extensible manner.
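To illustrate why dtype and structure have to be tracked alongside the data, here is a minimal sketch using only NumPy. It is not the package's implementation, and the helper names `pack` and `unpack` are hypothetical. Serialized bytes carry no dtype or shape of their own, so the writer and the reader must agree on both.

```python
import numpy as np


def pack(arr):
    """Serialize an array to raw bytes plus the metadata needed to restore it."""
    return {"data": arr.tobytes(), "dtype": arr.dtype.str, "shape": arr.shape}


def unpack(rec):
    """Rebuild the array from raw bytes using the stored dtype and shape."""
    return np.frombuffer(rec["data"], dtype=rec["dtype"]).reshape(rec["shape"])


x = np.array([[0, 1, 2, 3], [4, 5, 6, 7]], np.int32)
restored = unpack(pack(x))

# Decoding the same bytes with the wrong dtype raises no error,
# but the values and the shape no longer match the original array.
wrong = np.frombuffer(pack(x)["data"], dtype=np.float32)
```

When the dtype and shape travel with the data, the round trip is lossless; when the reader has to guess them, mismatches go unnoticed until the values look wrong.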

Features:

  • create tfrecord files
  • read from single or multiple tfrecord files
  • selectively read data from tfrecord files
  • examine the data structure of tfrecord files

Usage:

Writing

  • Import data into Python however you normally would (excel, pandas, csv, matlab, etc.)
  • Reshape each of your feature arrays to shape=[N, x[, y[, z[, etc.]]]] where N is the number of examples (records) and the remaining dimensions are the shape of a single example.
  • Add multiple feature arrays to the file as key-value pairs (keyword arguments to create_tfrecords)
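The reshaping step above can be sketched with NumPy as follows. The raw arrays and the helper `check_leading_dim` are hypothetical, and the check assumes that every feature array should share the same leading dimension N; the package itself may or may not enforce this.

```python
import numpy as np

# Hypothetical raw data: 3 examples, each a flat row of 8 values.
raw_x = np.arange(24, dtype=np.int32).reshape(3, 8)
raw_y = np.array([0.25, 1.25, 2.25], dtype=np.float32)

# Give every array a leading dimension N (one entry per example).
train_x = raw_x.reshape(-1, 2, 4)   # shape [N, 2, 4]
train_y = raw_y.reshape(-1, 1)      # shape [N, 1]


def check_leading_dim(**features):
    """Return N if all arrays share the same leading dimension, else raise."""
    sizes = {key: arr.shape[0] for key, arr in features.items()}
    if len(set(sizes.values())) != 1:
        raise ValueError("mismatched leading dimensions: {}".format(sizes))
    return next(iter(sizes.values()))


n = check_leading_dim(x=train_x, y=train_y)

# The arrays could then be written as key-value pairs, e.g.
# create_tfrecords('tfr_1.tf', x=train_x, y=train_y)
```

Checking the leading dimension before writing catches a mismatch between features early, rather than when the file is read back.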

Reading

  • Create a reader class object, specifying your file list (can be length 1) and, optionally, the batch size and shuffle setting.
  • Pass a list of keys naming which inputs to read from the file

Example Code:

import numpy as np
import tensorflow as tf  # uses the TensorFlow 1.x Session API

from easy_tfrecords import create_tfrecords, easy_tfrecords as records


# CREATE SOME TEST DATA
# x: 3 examples, each an int32 array of shape [2, 4]
x      = np.array([[0, 0, 0, 0], [0, 0, 0, 0]], np.int32)
trainX = np.asarray( [x, x+1, x+2] )

# y: 3 examples, each a float32 array of shape [1]
y      = np.array([0.25], np.float32)
trainY = np.asarray( [y, y+1, y+2] )


# CREATE AND SAVE TO A FEW TFRECORDS FILES
create_tfrecords('tfr_1.tf', x=trainX, y=trainY)
create_tfrecords('tfr_2.tf', x=trainX+10, y=trainY+10)
create_tfrecords('tfr_3.tf', x=trainX+100, y=trainY+100, z=trainY+100)

# INSTANTIATE THE RECORDS OBJECT
# The file names must match the files written above.
# Only the keys 'x' and 'y' are read; 'z' in tfr_3.tf is skipped.
rec = records(files=['tfr_1.tf', 'tfr_3.tf', 'tfr_2.tf'],
  shuffle=False,
  batch_size=1,
  keys=['x', 'y'])

next_factory = rec.get_next_factory()

batch_x = next_factory['x']
batch_y = next_factory['y']

# The with-block closes the session on exit, so no explicit close is needed.
with tf.Session() as sess:

  sess.run(rec.get_initializer())

  for n in range(10):
    print('------------')
    print('n => {}\n'.format(n))

    x_eval, y_eval = sess.run( [batch_x, batch_y] )
    print('x_eval=\n{}\n'.format(x_eval))
    print('y_eval=\n{}'.format(y_eval))

Output :

------------
n => 0

x_eval=
[[[0 0 0 0]
  [0 0 0 0]]]

y_eval=
[[ 0.25]]
------------
n => 1

x_eval=
[[[1 1 1 1]
  [1 1 1 1]]]

y_eval=
[[ 1.25]]
------------
n => 2

x_eval=
[[[2 2 2 2]
  [2 2 2 2]]]

y_eval=
[[ 2.25]]
------------
n => 3

x_eval=
[[[100 100 100 100]
  [100 100 100 100]]]

y_eval=
[[ 100.25]]
------------
n => 4

x_eval=
[[[101 101 101 101]
  [101 101 101 101]]]

y_eval=
[[ 101.25]]
------------
n => 5

x_eval=
[[[102 102 102 102]
  [102 102 102 102]]]

y_eval=
[[ 102.25]]
------------
n => 6

x_eval=
[[[10 10 10 10]
  [10 10 10 10]]]

y_eval=
[[ 10.25]]
------------
n => 7

x_eval=
[[[11 11 11 11]
  [11 11 11 11]]]

y_eval=
[[ 11.25]]
------------
n => 8

x_eval=
[[[12 12 12 12]
  [12 12 12 12]]]

y_eval=
[[ 12.25]]
------------
n => 9

x_eval=
[[[0 0 0 0]
  [0 0 0 0]]]

y_eval=
[[ 0.25]]
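
The output above contains nine distinct records, and the tenth read (n => 9) returns the first record again, which suggests the reader repeats the data once it is exhausted. The following pure-Python sketch mimics that wrap-around with stand-in integers instead of the real tensors. It only illustrates the iteration pattern and is not the package's own code.

```python
from itertools import chain, cycle, islice

# Stand-in record values for the three files, in the order the output shows them.
file_1 = [0, 1, 2]
file_3 = [100, 101, 102]
file_2 = [10, 11, 12]

# Chain the files, repeat indefinitely, and take ten single-record reads.
reads = list(islice(cycle(chain(file_1, file_3, file_2)), 10))
print(reads)  # [0, 1, 2, 100, 101, 102, 10, 11, 12, 0]
```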

Download files

Source Distribution

easy_tfrecords-0.1.0.tar.gz (5.1 kB view details)

Uploaded Source

Built Distribution

easy_tfrecords-0.1.0-py3-none-any.whl (5.3 kB view details)

Uploaded Python 3

File details

Details for the file easy_tfrecords-0.1.0.tar.gz.

File metadata

  • Download URL: easy_tfrecords-0.1.0.tar.gz
  • Upload date:
  • Size: 5.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.20.0 setuptools/40.5.0 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.5.4

File hashes

Hashes for easy_tfrecords-0.1.0.tar.gz
Algorithm Hash digest
SHA256 86098faeee9a9d7214de25386b9fdfb1e589da7e9060fb2fffc21a9aa555f2d0
MD5 8c51ed8e7ee3c5cfff33100190466593
BLAKE2b-256 4dc3103ef91a3ca3c103ae9e0047724610349e385c814ba7ce7de70d5fd3a819

File details

Details for the file easy_tfrecords-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: easy_tfrecords-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 5.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.20.0 setuptools/40.5.0 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.5.4

File hashes

Hashes for easy_tfrecords-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 e38dc019e8e9c039c1b4b7e78a4b207d48591cb05919148c1a694559a7b9b7c7
MD5 3a3a78dee57b5e4cd5f1c4dfded0ed84
BLAKE2b-256 6d85fd2285fc471153b46d4bd655f2954432f90bb0f017f2207962f4520fb473
