
Simplifies the process of encoding/decoding data using the TFRecord framework.

Project description

Python 3.7

🤼 Triko


Getting Started

I was a bit overwhelmed after using the TFRecord framework for the first time. I didn't find its interface very appealing, so the idea was to encapsulate all the nitty-gritty in this library.

Note: I'm not an expert in TFRecord. I just found my approach very helpful in my workflow.

TrikoFeature

For each feature you want to serialize (images, numbers, strings, labels), use a separate TrikoFeature subclass. Each TrikoFeature subclass must be initialized with a unique key (see the __init__ method). These keys are used to serialize data in TFRecord.

TrikoFeature is a generic class: each subclass must specify three type parameters.

An abstract example:

class DemoFeature(TrikoFeature[RAW_TYPE, ENCODED_TYPE, DECODED_TYPE]):
	...

  • RAW_TYPE - the original type of the data you want to encode
  • ENCODED_TYPE - the type your data has after encoding (TFRecord supports only a few types: tf.train.Example features hold lists of byte strings, floats, or 64-bit integers)
  • DECODED_TYPE - the type your data has after decoding
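To make the three type parameters concrete, here is a minimal, self-contained sketch of the same pattern built on `typing.Generic`. `SketchFeature` and `LabelFeature` are hypothetical names; this is not Triko's source and only mirrors the `key`, `_encode_raw` and `_decode_value` names used in this README.

```python
from abc import ABC, abstractmethod
from typing import Generic, TypeVar

RAW_TYPE = TypeVar("RAW_TYPE")
ENCODED_TYPE = TypeVar("ENCODED_TYPE")
DECODED_TYPE = TypeVar("DECODED_TYPE")


class SketchFeature(ABC, Generic[RAW_TYPE, ENCODED_TYPE, DECODED_TYPE]):
    """Hypothetical stand-in for TrikoFeature (NOT the library's code)."""

    def __init__(self, key: str) -> None:
        # Unique key that identifies this feature in a serialized record
        self.key = key

    @abstractmethod
    def _encode_raw(self, raw_value: RAW_TYPE) -> ENCODED_TYPE:
        ...

    @abstractmethod
    def _decode_value(self, encoded_value: ENCODED_TYPE) -> DECODED_TYPE:
        ...


class LabelFeature(SketchFeature[str, bytes, str]):
    # RAW_TYPE = str, ENCODED_TYPE = bytes, DECODED_TYPE = str
    def _encode_raw(self, raw_value: str) -> bytes:
        return raw_value.encode("utf-8")

    def _decode_value(self, encoded_value: bytes) -> str:
        return encoded_value.decode("utf-8")


label_feature = LabelFeature(key="label")
restored = label_feature._decode_value(label_feature._encode_raw("cat"))  # "cat"
```

The type parameters are only hints for readers and type checkers; at runtime, each subclass's two methods do the actual conversion.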

A concrete example: let's say we want to encode an image. We read it, transform it the way we like, and then serialize it to a TFRecord dataset.

class DemoImageFeature(TrikoFeature[np.ndarray, bytes, np.ndarray]):
	...

  • np.ndarray (RAW_TYPE) - our image data is initially a NumPy array
  • bytes (ENCODED_TYPE) - TFRecord can't store raw NumPy arrays (and it wouldn't be a good idea anyway), so we convert them to bytes
  • np.ndarray (DECODED_TYPE) - when reading the TFRecord dataset, bytes are useless to us, so we decode them back to np.ndarray
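The conversion to bytes and back can be illustrated with plain NumPy. This minimal sketch uses `ndarray.tobytes` and `np.frombuffer`, one possible choice rather than what Triko does internally; it shows that raw bytes carry neither the shape nor the dtype, so the decoding side has to know both.

```python
import numpy as np

# A tiny 2x3 RGB "image" with one red pixel
image = np.zeros((2, 3, 3), dtype=np.uint8)
image[0, 0] = [255, 0, 0]

# Flat buffer of 2 * 3 * 3 = 18 bytes; shape and dtype are not stored
encoded: bytes = image.tobytes()

# Reading it back yields a 1-D array until the shape is supplied again
flat = np.frombuffer(encoded, dtype=np.uint8)
decoded = flat.reshape(image.shape)
```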

How does Triko encode/decode data?

You must tell it how by overriding the _encode_raw and _decode_value methods.

Continuing our example:

class DemoImageFeature(TrikoFeature[np.ndarray, bytes, np.ndarray]):
	def _encode_raw(self, raw_value: np.ndarray) -> bytes:
		# convert numpy array to bytes and return
		pass

	def _decode_value(self, encoded_value: bytes) -> np.ndarray:
		# read bytes and return numpy array
		pass
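One way to fill in the two method bodies, assuming you want the bytes to carry the array's shape and dtype, is NumPy's .npy format written to an in-memory buffer. The helper names `array_to_bytes` and `bytes_to_array` are hypothetical; `_encode_raw` could return `array_to_bytes(raw_value)` and `_decode_value` could return `bytes_to_array(encoded_value)`.

```python
import io

import numpy as np


def array_to_bytes(raw_value: np.ndarray) -> bytes:
    """Serialize an array in .npy format, whose header stores dtype and shape."""
    buffer = io.BytesIO()
    np.save(buffer, raw_value, allow_pickle=False)
    return buffer.getvalue()


def bytes_to_array(encoded_value: bytes) -> np.ndarray:
    """Inverse of array_to_bytes: parse the .npy header, then the data."""
    return np.load(io.BytesIO(encoded_value), allow_pickle=False)
```

The .npy header adds a small fixed overhead to every record; for large image datasets, a compressed format such as PNG or JPEG may be a better fit.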

Simple built-in raw data validation

Before encoding raw data, you can validate its value by overriding _validate_raw_value.
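The README does not show the signature or contract of `_validate_raw_value`, so the sketch below is a standalone, hypothetical `validate_image` function with checks an image feature might run. It assumes the convention of raising an exception on invalid input; check how Triko calls `_validate_raw_value` before adapting it.

```python
import numpy as np


def validate_image(raw_value: np.ndarray) -> None:
    """Hypothetical pre-encoding checks for an HxWxC uint8 image.

    Raising on failure is an assumption, not Triko's documented contract.
    """
    if not isinstance(raw_value, np.ndarray):
        raise TypeError(f"expected np.ndarray, got {type(raw_value).__name__}")
    if raw_value.ndim != 3 or raw_value.shape[-1] not in (1, 3):
        raise ValueError(f"expected HxWxC with 1 or 3 channels, got shape {raw_value.shape}")
    if raw_value.dtype != np.uint8:
        raise ValueError(f"expected dtype uint8, got {raw_value.dtype}")
```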

TrikoFeature in action

Encoding

Consider the following pseudocode:

from typing import Any, List

import numpy as np
import tensorflow as tf

# 'data.tfrecord' is an example output path
with tf.io.TFRecordWriter('data.tfrecord') as writer:
	# you read an image and perform transformations
	img_array: np.ndarray = ...
	# label for the image
	label: str = ...

	# instances of your TrikoFeature subclasses
	features: List[TrikoFeature] = ...

	def raw_value_getter(feature: TrikoFeature) -> Any:
		"""
		Maps a feature to its raw data
		"""

		# 'image' is the key you used for your TrikoFeature subclass
		# that represents an image
		if feature.key == 'image':
			return img_array

		return label

	serialized_features = TrikoFeature.encode_features_to_string(
		features=features, raw_value_getter=raw_value_getter,
	)
	writer.write(serialized_features)
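With more than two features, the if/else inside `raw_value_getter` grows quickly, and any unexpected key silently falls through to `label`. A dictionary lookup by `feature.key` avoids both problems. `make_raw_value_getter` below is a hypothetical helper, not part of Triko; it only assumes each feature object exposes a `key` attribute, as `feature.key` does in the encoding pseudocode.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict


def make_raw_value_getter(sample: Dict[str, Any]) -> Callable[[Any], Any]:
    """Build a raw_value_getter that looks each feature up by its key."""

    def raw_value_getter(feature: Any) -> Any:
        try:
            return sample[feature.key]
        except KeyError:
            # Fail loudly instead of returning a wrong value for an unknown key
            raise KeyError(f"no raw value for feature key {feature.key!r}") from None

    return raw_value_getter


# SimpleNamespace stands in for a feature object that has a `key`
getter = make_raw_value_getter({"image": [[0, 1], [2, 3]], "label": "cat"})
label_value = getter(SimpleNamespace(key="label"))  # "cat"
```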

Decoding

Consider the following pseudocode:

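from typing import List

import tensorflow as tf

# instances of your TrikoFeature subclasses
features: List[TrikoFeature] = ...

# 'data.tfrecord' is an example path to a TFRecord file
dataset = tf.data.TFRecordDataset('data.tfrecord').map(TrikoFeature.decoder(features=features))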

The lib is cool, but pseudocode is not

See a documented real-world example here.

Limitations

Currently, only FixedLenFeature is supported.
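For readers new to TFRecord, a FixedLenFeature is a parsing spec for a feature that has the same shape in every record. The sketch below uses plain TensorFlow 2 APIs (`tf.train.Example`, `tf.io.FixedLenFeature`, `tf.io.parse_single_example`, `tf.io.decode_raw`) rather than Triko, with the feature names `image` and `label` chosen for illustration; it shows the kind of fixed-length round trip this limitation refers to.

```python
import numpy as np
import tensorflow as tf

# Toy data: a 3x4 grayscale "image" and an integer label
image = np.arange(12, dtype=np.uint8).reshape(3, 4)
label = 7

# Encode: one tf.train.Example whose features are keyed by name
example = tf.train.Example(
    features=tf.train.Features(
        feature={
            "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[image.tobytes()])),
            "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
        }
    )
)
serialized = example.SerializeToString()

# Decode: every feature is declared with a fixed shape ([] means scalar)
feature_spec = {
    "image": tf.io.FixedLenFeature([], tf.string),
    "label": tf.io.FixedLenFeature([], tf.int64),
}
parsed = tf.io.parse_single_example(serialized, feature_spec)

# The image bytes are flat, so the shape is restored explicitly
decoded_image = tf.reshape(tf.io.decode_raw(parsed["image"], tf.uint8), image.shape)
decoded_label = int(parsed["label"])
```

Features whose length varies between records are usually parsed with `tf.io.VarLenFeature` instead, which falls outside this limitation.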

Prerequisites

```
python 3.7
tensorflow
numpy
```

Installing

```
pip install triko
```

Download files


Source Distribution

triko-0.0.1.tar.gz (1.5 MB)

Built Distribution

triko-0.0.1-py3-none-any.whl (6.4 kB, Python 3)
