🤼 Triko
Simplifies the process of encoding/decoding data using the TFRecord format.
Getting Started
I was a bit overwhelmed after using TFRecord for the first time. I don't find its interface very appealing, so the idea behind this library is to encapsulate all the nitty-gritty.
Note: I'm not an expert in TFRecord. I just found this approach very helpful in my workflow.
TrikoFeature
For each feature you want to serialize (images, numbers, strings, labels), you should use a separate `TrikoFeature` subclass. Each `TrikoFeature` subclass must be initialized with a unique key (see the `__init__` method). Those keys are used to serialize data in `TFRecord`.
`TrikoFeature` is a generic class. Each subclass must provide three types for itself.
An abstract example:
```python
class DemoFeature(TrikoFeature[RAW_TYPE, ENCODED_TYPE, DECODED_TYPE]): ...
```
- `RAW_TYPE` - the original type of the data you want to encode
- `ENCODED_TYPE` - the type your data will have after encoding (TFRecord supports only a few types)
- `DECODED_TYPE` - the type your data will have after decoding
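
To make the three type parameters more concrete, here is a minimal sketch of how a three-parameter generic base class can be declared and subclassed in plain Python. `SketchFeature`, `LabelFeature`, and the `encode`/`decode` method names are hypothetical stand-ins made up for this illustration. They are not Triko's actual classes or methods.

```python
from typing import Generic, TypeVar

RawT = TypeVar("RawT")          # type of the original data
EncodedT = TypeVar("EncodedT")  # type of the data after encoding
DecodedT = TypeVar("DecodedT")  # type of the data after decoding


class SketchFeature(Generic[RawT, EncodedT, DecodedT]):
    """Hypothetical stand-in for a three-parameter generic base class."""

    def __init__(self, key: str) -> None:
        self.key = key  # unique key identifying this feature

    def encode(self, raw_value: RawT) -> EncodedT:
        raise NotImplementedError

    def decode(self, encoded_value: EncodedT) -> DecodedT:
        raise NotImplementedError


class LabelFeature(SketchFeature[str, bytes, str]):
    """str (raw) -> bytes (encoded) -> str (decoded)."""

    def encode(self, raw_value: str) -> bytes:
        return raw_value.encode("utf-8")

    def decode(self, encoded_value: bytes) -> str:
        return encoded_value.decode("utf-8")


label_feature = LabelFeature(key="label")
encoded_label = label_feature.encode("cat")
decoded_label = label_feature.decode(encoded_label)
```

Binding the three types in the subclass declaration lets a type checker verify that the encode and decode methods agree with each other.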
A specific example:
Let's say we want to encode an image. We read it, transform it the way we like, and then it's time to serialize it to a `TFRecord` dataset.
```python
class DemoImageFeature(TrikoFeature[np.ndarray, bytes, np.ndarray]): ...
```
- `np.ndarray` (`RAW_TYPE`) - our image data is initially a `numpy` matrix
- `bytes` (`ENCODED_TYPE`) - we can't serialize raw `numpy` arrays using `TFRecord` (it wouldn't be a good idea anyway), so we convert them to `bytes`
- `np.ndarray` (`DECODED_TYPE`) - when reading a `TFRecord` dataset, `bytes` are useless to us, so we decode them back to `np.ndarray`
How does Triko encode/decode data?
You must tell it how by overriding the `_encode_raw` and/or `_decode_value` methods.
Continuing our example:
```python
class DemoImageFeature(TrikoFeature[np.ndarray, bytes, np.ndarray]):
    def _encode_raw(self, raw_value: np.ndarray) -> bytes:
        # convert numpy array to bytes and return
        pass

    def _decode_value(self, encoded_value: bytes) -> np.ndarray:
        # read bytes and return numpy array
        pass
```
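
One possible way to fill in those two method bodies is sketched below as a pair of standalone functions. This is an assumption about what a reasonable conversion looks like, not the conversion Triko itself uses. `array_to_bytes` and `bytes_to_array` are hypothetical helper names, and the `.npy` format is chosen here only because it keeps the shape and dtype together with the data.

```python
import io

import numpy as np


def array_to_bytes(array: np.ndarray) -> bytes:
    """Serialize an array, including shape and dtype, in the .npy format."""
    buffer = io.BytesIO()
    np.save(buffer, array, allow_pickle=False)
    return buffer.getvalue()


def bytes_to_array(data: bytes) -> np.ndarray:
    """Restore an array previously produced by array_to_bytes."""
    return np.load(io.BytesIO(data), allow_pickle=False)


# A tiny 2x2 RGB "image" used to check the round trip.
image = np.arange(12, dtype=np.uint8).reshape(2, 2, 3)
restored = bytes_to_array(array_to_bytes(image))
```

`ndarray.tobytes()` with `np.frombuffer` would be more compact, but it drops the shape and dtype, so those would have to be stored separately.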
A simple built-in raw data validation
Before encoding raw data, you can validate its value by overriding `_validate_raw_value`.
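
The exact signature and failure convention of the validation hook are not shown in this README, so the following is only a sketch of the kind of check such an override might perform, written as a standalone function. `validate_label` and `ALLOWED_LABELS` are hypothetical names, and raising an exception on invalid input is an assumption.

```python
ALLOWED_LABELS = {"cat", "dog"}  # hypothetical label set for this sketch


def validate_label(raw_value: object) -> None:
    """Raise if raw_value is not a string from the known label set."""
    if not isinstance(raw_value, str):
        raise TypeError(f"label must be a str, got {type(raw_value).__name__}")
    if raw_value not in ALLOWED_LABELS:
        raise ValueError(f"unknown label: {raw_value!r}")


validate_label("cat")  # passes silently
```

Rejecting bad values before encoding keeps malformed records out of the dataset, where they would be much harder to track down.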
TrikoFeature in action
Encoding
Consider the following pseudocode:
```python
from typing import Any, List

import numpy as np
import tensorflow as tf
# TrikoFeature comes from the triko package

# path of the TFRecord file to write
tfrecord_path: str = ...

with tf.io.TFRecordWriter(tfrecord_path) as writer:
    # you read an image and perform transformations
    img_array: np.ndarray = ...
    # label for the image
    label: str = ...
    # list of your TrikoFeature subclasses
    features: List[TrikoFeature] = ...

    def raw_value_getter(feature: TrikoFeature) -> Any:
        """
        Maps a feature to its raw data
        """
        # 'image' is a key you used for your TrikoFeature subclass
        # that represents an image
        if feature.key == 'image':
            return img_array
        return label

    serialized_features = TrikoFeature.encode_features_to_string(
        features=features, raw_value_getter=raw_value_getter,
    )
    writer.write(serialized_features)
```
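
With more than two features, an `if` chain in the getter gets unwieldy. A dictionary keyed by feature key is one alternative, sketched below. `KeyedFeature` is a hypothetical stand-in that only mimics the `key` attribute used in the pseudocode above, and `make_raw_value_getter` is a helper invented for this illustration. Neither is part of Triko.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class KeyedFeature:
    """Hypothetical stand-in: only the `key` attribute matters here."""
    key: str


def make_raw_value_getter(raw_values: Dict[str, Any]) -> Callable[[KeyedFeature], Any]:
    """Build a getter that looks up raw data by feature key."""
    def raw_value_getter(feature: KeyedFeature) -> Any:
        try:
            return raw_values[feature.key]
        except KeyError:
            # Fail loudly instead of silently falling back to another value.
            raise KeyError(f"no raw value for feature key {feature.key!r}") from None
    return raw_value_getter


getter = make_raw_value_getter({"image": [[0, 1], [2, 3]], "label": "cat"})
image_value = getter(KeyedFeature(key="image"))
label_value = getter(KeyedFeature(key="label"))
```

Unlike a getter that returns the label for every key other than `'image'`, this version raises an error for an unknown key.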
Decoding
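Consider the following pseudocode:
```python
# list of your TrikoFeature subclasses
features: List[TrikoFeature] = ...
# paths of the TFRecord files to read
filenames: List[str] = ...
dataset = tf.data.TFRecordDataset(filenames).map(TrikoFeature.decoder(features=features))
```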
The lib is cool, but pseudocode is not
See documented real-world example here
Limitations
Only `FixedLenFeature` is currently supported.
Prerequisites
```
python 3.7
tensorflow
numpy
```
Installing
```
pip install triko
```