🤼 Triko

Simplifies the process of encoding and decoding data in the TFRecord format.

Getting Started

I was a bit overwhelmed after using the TFRecord format for the first time. I don't find its interface very appealing, so the idea was to encapsulate all the nitty-gritty details in this library.

Note: I'm not an expert in TFRecord. I just found my approach very helpful in my workflow.

TrikoFeature

For each feature you want to serialize (images, numbers, strings, labels), use a separate TrikoFeature subclass. Each TrikoFeature subclass must be initialized with a unique key (see the __init__ method). These keys identify the features when the data is serialized to TFRecord.

TrikoFeature is a generic class. Each subclass must supply three type parameters.

An abstract example:

class DemoFeature(TrikoFeature[RAW_TYPE, ENCODED_TYPE, DECODED_TYPE]):
    ...

  • RAW_TYPE - the original type of the data you want to encode
  • ENCODED_TYPE - the type of your data after encoding (TFRecord supports only a few types)
  • DECODED_TYPE - the type of your data after decoding
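To make the three type parameters concrete, here is a minimal sketch of how a generic base class with three type variables behaves in plain Python. `SketchFeature`, `LabelFeature`, and the `encode`/`decode` method names are hypothetical stand-ins written for this illustration; they are not Triko's actual classes, and Triko's internals may differ.

```python
from typing import Generic, TypeVar

RawT = TypeVar("RawT")          # plays the role of RAW_TYPE
EncodedT = TypeVar("EncodedT")  # plays the role of ENCODED_TYPE
DecodedT = TypeVar("DecodedT")  # plays the role of DECODED_TYPE


class SketchFeature(Generic[RawT, EncodedT, DecodedT]):
    """Hypothetical stand-in for a feature class with three type parameters."""

    def __init__(self, key: str) -> None:
        self.key = key

    def encode(self, raw_value: RawT) -> EncodedT:
        raise NotImplementedError

    def decode(self, encoded_value: EncodedT) -> DecodedT:
        raise NotImplementedError


class LabelFeature(SketchFeature[str, bytes, str]):
    """A text label: str going in, bytes when stored, str coming back out."""

    def encode(self, raw_value: str) -> bytes:
        return raw_value.encode("utf-8")

    def decode(self, encoded_value: bytes) -> str:
        return encoded_value.decode("utf-8")


label = LabelFeature(key="label")
stored = label.encode("cat")
restored = label.decode(stored)
```

With the parameters filled in like this, a type checker can flag a subclass whose methods don't match the declared raw, encoded, and decoded types.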

A specific example: Let's say we want to encode an image. We read it, transform it the way we like, and then it's time to serialize it to a TFRecord dataset.

class DemoImageFeature(TrikoFeature[np.ndarray, bytes, np.ndarray]):
    ...

  • np.ndarray (RAW_TYPE) - our image data is initially a numpy array
  • bytes (ENCODED_TYPE) - we can't serialize raw numpy arrays using TFRecord (it wouldn't be a good idea anyway), so we convert them to bytes
  • np.ndarray (DECODED_TYPE) - when reading a TFRecord dataset, bytes are useless to us, so we decode them back to np.ndarray

How does Triko encode/decode data?

You tell it how by overriding the _encode_raw and _decode_value methods.

Continuing our example:

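import numpy as np
from triko import TrikoFeature  # import path assumed; adjust if it differs


class DemoImageFeature(TrikoFeature[np.ndarray, bytes, np.ndarray]):
    def _encode_raw(self, raw_value: np.ndarray) -> bytes:
        # convert the numpy array to bytes and return them
        raise NotImplementedError

    def _decode_value(self, encoded_value: bytes) -> np.ndarray:
        # read the bytes and return a numpy array
        raise NotImplementedError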
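One possible way to fill in those two method bodies, shown here as standalone functions so it runs without Triko, is `ndarray.tobytes()` paired with `np.frombuffer`. Raw bytes carry no shape or dtype, so this sketch assumes every image has the same known shape and dtype (`IMAGE_SHAPE` and `IMAGE_DTYPE` are hypothetical constants). It also assumes the decoder receives plain Python bytes; inside a tf.data pipeline it may receive a tensor instead.

```python
import numpy as np

# Hypothetical fixed image layout; raw bytes alone do not record it.
IMAGE_SHAPE = (4, 4, 3)
IMAGE_DTYPE = np.uint8


def encode_image(raw_value: np.ndarray) -> bytes:
    # Cast to the agreed dtype, then dump the buffer in C order.
    return np.ascontiguousarray(raw_value, dtype=IMAGE_DTYPE).tobytes()


def decode_image(encoded_value: bytes) -> np.ndarray:
    # Reinterpret the bytes with the agreed dtype and restore the shape.
    # The result is a read-only view; call .copy() if you need to modify it.
    return np.frombuffer(encoded_value, dtype=IMAGE_DTYPE).reshape(IMAGE_SHAPE)


image = np.arange(48, dtype=IMAGE_DTYPE).reshape(IMAGE_SHAPE)
encoded = encode_image(image)
restored = decode_image(encoded)
```

If your images vary in size, the shape has to travel with the data some other way, for example as a separate feature.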

A simple built-in raw data validation

Before encoding raw data, you can validate its value by overriding _validate_raw_value.
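The exact signature of _validate_raw_value, and how it is expected to report a failure, are not described here, so the following is only a sketch of the kind of check you might put inside it. It is written as a standalone function, assumes that raising an exception is an acceptable way to reject a value, and uses a made-up label set (`ALLOWED_LABELS`).

```python
ALLOWED_LABELS = {"cat", "dog"}  # hypothetical label set


def validate_label(raw_value: object) -> None:
    """Raise if the value is not a usable label; return None otherwise."""
    if not isinstance(raw_value, str):
        raise TypeError(f"label must be str, got {type(raw_value).__name__}")
    if raw_value not in ALLOWED_LABELS:
        raise ValueError(f"unknown label: {raw_value!r}")


validate_label("cat")  # a valid label passes silently
```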

TrikoFeature in action

Encoding

Consider the following pseudocode:

from typing import Any, List

import numpy as np
import tensorflow as tf
from triko import TrikoFeature  # import path assumed; adjust if it differs

# 'data.tfrecord' is a placeholder output path
with tf.io.TFRecordWriter('data.tfrecord') as writer:
    # you read an image and perform transformations
    img_array: np.ndarray = ...
    # label for the image
    label: str = ...

    # list of instances of your TrikoFeature subclasses
    features: List[TrikoFeature] = ...

    def raw_value_getter(feature: TrikoFeature) -> Any:
        """
        Maps a feature to its raw data
        """

        # 'image' is the key you used for the TrikoFeature subclass
        # that represents an image
        if feature.key == 'image':
            return img_array

        return label

    serialized_features = TrikoFeature.encode_features_to_string(
        features=features, raw_value_getter=raw_value_getter,
    )
    writer.write(serialized_features)
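Once you have more than two features, a chain of `if` checks in the getter gets unwieldy. One alternative is to build the getter from a dict keyed by feature key. This is a sketch, not part of Triko: `make_raw_value_getter` is a hypothetical helper, and `SimpleNamespace` objects stand in for real features, relying only on the `key` attribute used above.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict


def make_raw_value_getter(raw_values: Dict[str, Any]) -> Callable[[Any], Any]:
    """Return a getter that looks up raw data by the feature's key."""

    def raw_value_getter(feature: Any) -> Any:
        try:
            return raw_values[feature.key]
        except KeyError:
            # Fail loudly instead of silently returning the wrong value.
            raise KeyError(f"no raw value for feature key {feature.key!r}") from None

    return raw_value_getter


# Stand-ins for real features; only the `key` attribute matters here.
image_feature = SimpleNamespace(key="image")
label_feature = SimpleNamespace(key="label")

getter = make_raw_value_getter({"image": [[0, 1], [2, 3]], "label": "cat"})
image_value = getter(image_feature)
label_value = getter(label_feature)
```

Unlike the fall-through `return label` above, a lookup like this raises on an unknown key, which helps catch a misspelled key early.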

Decoding

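Consider the following pseudocode:

# list of instances of your TrikoFeature subclasses
features: List[TrikoFeature] = ...

# path(s) to the TFRecord file(s) you wrote earlier
filenames = ...

dataset = tf.data.TFRecordDataset(filenames).map(TrikoFeature.decoder(features=features))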

The lib is cool, but pseudocode is not

See documented real-world example here

Limitations

Only FixedLenFeature is currently supported.
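For context, FixedLenFeature is TensorFlow's parsing spec for features whose shape is known in advance and is the same in every record. The sketch below uses only plain TensorFlow calls (no Triko) to write one record by hand and parse it back; the key 'label' and its value are made up for the example, and eager execution (the TensorFlow 2 default) is assumed.

```python
import tensorflow as tf

# Build one record by hand with a single bytes feature.
example = tf.train.Example(
    features=tf.train.Features(
        feature={
            "label": tf.train.Feature(
                bytes_list=tf.train.BytesList(value=[b"cat"])
            ),
        }
    )
)
serialized = example.SerializeToString()

# shape=[] declares a scalar: exactly one value per record.
spec = {"label": tf.io.FixedLenFeature(shape=[], dtype=tf.string)}
parsed = tf.io.parse_single_example(serialized, spec)
label = parsed["label"].numpy()
```

Data whose length changes from record to record (for example a variable number of tags per image) does not fit this spec, which is what the limitation above refers to.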

Prerequisites

```
python 3.7
tensorflow
numpy
```

Installing

```
pip install triko
```
