A Tensorflow TFRecord Utility Package

tfrmaker

Utility package which helps to ease the manipulation of tfrecords.

Description

tfrmaker eases the manipulation of TFRecords for machine learning projects built with TensorFlow. It lets you create, extract, and load image datasets as TFRecords, so large image datasets can be converted to TFRecords and fed directly into TensorFlow models for training and testing. Key features of the package include:

  • dynamic resizing
  • splitting tfrecords into optimal shards
  • splitting tfrecords into training, validation, and testing sets
  • counting the number of images in tfrecords
  • asynchronous tfrecord creation
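
As a rough illustration of what sharding and train/validation/test splitting involve, here is a minimal sketch in plain Python. It is not tfrmaker's implementation or API: the helper names `split_items`, `shard_count`, and `assign_shards`, the 80/10/10 ratios, and the 100 MB target shard size are all assumptions made for this example.

```python
import math
import random


def split_items(items, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle items and cut them into train, validation, and test lists.

    Hypothetical helper: the default ratios are an assumption.
    """
    if not math.isclose(sum(ratios), 1.0):
        raise ValueError("ratios must sum to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducible splits
    n = len(shuffled)
    train_end = int(n * ratios[0])
    val_end = train_end + int(n * ratios[1])
    # Whatever is left after train and validation goes to test.
    return shuffled[:train_end], shuffled[train_end:val_end], shuffled[val_end:]


def shard_count(total_bytes, target_shard_bytes=100 * 1024 * 1024):
    """Number of shards needed so that no shard exceeds the target size on average."""
    return max(1, math.ceil(total_bytes / target_shard_bytes))


def assign_shards(items, num_shards):
    """Distribute items round-robin across num_shards lists."""
    return [items[i::num_shards] for i in range(num_shards)]
```

For instance, `shard_count(250 * 1024 * 1024)` suggests three shards for 250 MB of data, and `assign_shards(paths, 3)` would spread a list of file paths across them.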

Why TFRecords?

The TFRecord format stores data as a sequence of binary records, typically serialized with protocol buffers, a cross-platform, cross-language serialization library. Its advantages include:

  • Efficient storage: TFRecord data can take up less space than the original data; it can also be partitioned into multiple files.
  • Fast I/O: TFRecord format can be read with parallel I/O operations, which is useful for TPUs or multiple hosts.
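
To make "a sequence of binary records" concrete, the sketch below writes and reads the record framing described in TensorFlow's TFRecord documentation: an 8-byte little-endian length, a masked CRC-32C of that length, the payload, and a masked CRC-32C of the payload. It uses only the standard library and is independent of tfrmaker; the function names are chosen for this example. In real TFRecord files the payload is usually a serialized `tf.train.Example`, whereas arbitrary bytes are used here.

```python
import struct

_CRC32C_POLY = 0x82F63B78  # reflected Castagnoli polynomial


def _build_table():
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ _CRC32C_POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _build_table()


def crc32c(data):
    """Table-driven CRC-32C (Castagnoli) checksum of a bytes object."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc(data):
    """Apply the rotate-and-add mask that TFRecord uses on top of CRC-32C."""
    crc = crc32c(data)
    rotated = ((crc >> 15) | (crc << 17)) & 0xFFFFFFFF
    return (rotated + 0xA282EAD8) & 0xFFFFFFFF


def write_record(stream, payload):
    """Append one framed record to a binary stream."""
    header = struct.pack("<Q", len(payload))
    stream.write(header)
    stream.write(struct.pack("<I", masked_crc(header)))
    stream.write(payload)
    stream.write(struct.pack("<I", masked_crc(payload)))


def read_records(stream):
    """Yield each payload from a binary stream, verifying both checksums."""
    while True:
        header = stream.read(8)
        if not header:
            return  # clean end of stream
        if len(header) != 8:
            raise ValueError("truncated length header")
        (length,) = struct.unpack("<Q", header)
        (length_crc,) = struct.unpack("<I", stream.read(4))
        if length_crc != masked_crc(header):
            raise ValueError("corrupt length header")
        payload = stream.read(length)
        (payload_crc,) = struct.unpack("<I", stream.read(4))
        if payload_crc != masked_crc(payload):
            raise ValueError("corrupt payload")
        yield payload
```

Because every record carries its own length, a reader can skip from record to record without parsing payloads, which is also how a record count can be obtained: `sum(1 for _ in read_records(stream))`.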

Installation

Use the package manager pip to install tfrmaker.

pip install tfrmaker

Usage

A minimal usage of tfrmaker with image data organized as directories whose names are the class labels:

from tfrmaker import images, display

# map label names to integer encodings.
LABELS = {"bishop": 0, "knight": 1, "pawn": 2, "queen": 3, "rook": 4}

# specify the data and output directories.
DATA_DIR = "datasets/chess/"
OUTPUT_DIR = "tfrecords/chess/"

# create tfrecords from the images present in the given data directory.
images.create(DATA_DIR, LABELS, OUTPUT_DIR)

# load one or more tfrecords as an iterator object.
dataset = images.load(["tfrecords/chess/queen.tfrecord", "tfrecords/chess/bishop.tfrecord"], batch_size=32, repeat=True)

# iterate one batch and visualize it along with labels.
databatch = next(iter(dataset))
display.batch(databatch, LABELS)
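
Since the class labels are the directory names, a mapping like `LABELS` can also be derived from the folder layout instead of being typed by hand. The sketch below is one way to do that with the standard library. `labels_from_directories` is a hypothetical helper, not part of tfrmaker, and assigning integers in alphabetical order is an assumption that happens to match the mapping shown above.

```python
from pathlib import Path


def labels_from_directories(data_dir):
    """Map each immediate subdirectory name to an integer, in sorted order.

    Hypothetical helper: tfrmaker itself takes the mapping as an argument.
    """
    class_names = sorted(p.name for p in Path(data_dir).iterdir() if p.is_dir())
    return {name: index for index, name in enumerate(class_names)}
```

With the chess dataset laid out as `datasets/chess/bishop/`, `datasets/chess/knight/`, and so on, `labels_from_directories("datasets/chess/")` would return a dictionary that can be passed wherever `LABELS` is used.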

Refer to the examples folder for more advanced usage.

Support

"Your mental support by starring the repo is much appreciated."

Contribute

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

License

MIT

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tfrmaker-0.0.2.tar.gz (11.3 kB)

Uploaded Source

Built Distribution

tfrmaker-0.0.2-py3-none-any.whl (8.3 kB)

Uploaded Python 3
