Simplifies the process of encoding/decoding data using the TFRecord framework.
I was a bit overwhelmed after using the TFRecord framework for the first time. I don't find its interface very appealing, so the idea was to encapsulate all the nitty-gritty in this library.
Note: I'm not an expert in TFRecord. I just found my approach very helpful in my workflow.
For each feature you want to serialize (images, numbers, strings, labels), use a separate `TrikoFeature` subclass. Each `TrikoFeature` subclass must be initialized with a unique key (see the `__init__` method). Those keys identify each feature's data when it is serialized.
`TrikoFeature` is generic: each subclass must specify three type parameters.
An abstract example:
```python
class DemoFeature(TrikoFeature[RAW_TYPE, ENCODED_TYPE, DECODED_TYPE]):
    ...
```
- `RAW_TYPE`: the original type of the data you want to encode
- `ENCODED_TYPE`: the type your data has after encoding (TFRecord supports only a few types)
- `DECODED_TYPE`: the type your data has after decoding
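To show how the three type parameters fit together, here is a minimal, self-contained sketch of the pattern using Python's `typing.Generic`. `SketchFeature`, `AgeFeature`, `encode` and `decode` are hypothetical names for illustration only, not Triko's API; only `_encode_raw` and `_decode_value` mirror the method names this README uses in its image example.

```python
from typing import Generic, TypeVar

RAW_TYPE = TypeVar('RAW_TYPE')
ENCODED_TYPE = TypeVar('ENCODED_TYPE')
DECODED_TYPE = TypeVar('DECODED_TYPE')


class SketchFeature(Generic[RAW_TYPE, ENCODED_TYPE, DECODED_TYPE]):
    """Hypothetical stand-in for TrikoFeature, for illustration only."""

    def __init__(self, key: str):
        # unique key identifying this feature inside a record
        self.key = key

    def encode(self, raw_value: RAW_TYPE) -> ENCODED_TYPE:
        return self._encode_raw(raw_value)

    def decode(self, encoded_value: ENCODED_TYPE) -> DECODED_TYPE:
        return self._decode_value(encoded_value)

    def _encode_raw(self, raw_value: RAW_TYPE) -> ENCODED_TYPE:
        raise NotImplementedError

    def _decode_value(self, encoded_value: ENCODED_TYPE) -> DECODED_TYPE:
        raise NotImplementedError


class AgeFeature(SketchFeature[int, bytes, int]):
    # RAW_TYPE = int, ENCODED_TYPE = bytes, DECODED_TYPE = int
    def _encode_raw(self, raw_value: int) -> bytes:
        return str(raw_value).encode('utf-8')

    def _decode_value(self, encoded_value: bytes) -> int:
        return int(encoded_value.decode('utf-8'))


age = AgeFeature(key='age')
encoded = age.encode(42)       # b'42'
decoded = age.decode(encoded)  # 42
```

The type parameters only document intent (Python does not enforce them at runtime), but a type checker such as mypy can use them to flag a subclass whose methods disagree with its declared types.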
A specific example:
Let's say we want to encode an image. We read it, transform it the way we like, and then it's time to serialize it to a TFRecord file:
```python
class DemoImageFeature(TrikoFeature[np.ndarray, bytes, np.ndarray]):
    ...
```
- `np.ndarray` (`RAW_TYPE`): our image data is initially an `np.ndarray`.
- `bytes` (`ENCODED_TYPE`): we can't serialize a raw `np.ndarray` into TFRecord (it wouldn't be a good idea anyway), so we convert it to `bytes`.
- `np.ndarray` (`DECODED_TYPE`): when reading, `bytes` are useless to us, so we decode them back to an `np.ndarray`.
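As a minimal illustration of that round trip with NumPy alone: `ndarray.tobytes()` produces the `bytes`, and `np.frombuffer` turns them back into an array. The bytes carry neither the shape nor the dtype, so the decoder has to know both; in this sketch they are simply hard-coded, which is an assumption.

```python
import numpy as np

# a tiny stand-in for an image: 2x3 pixels, 3 channels, uint8
image = np.arange(18, dtype=np.uint8).reshape(2, 3, 3)

# RAW_TYPE -> ENCODED_TYPE: the raw buffer only, no shape/dtype metadata
encoded = image.tobytes()

# ENCODED_TYPE -> DECODED_TYPE: dtype and shape must be supplied by the decoder
decoded = np.frombuffer(encoded, dtype=np.uint8).reshape(2, 3, 3)
```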
How does Triko encode/decode data?
You must tell it how by overriding `_encode_raw` (raw value to encoded value) and `_decode_value` (encoded value to decoded value).
Continuing our example:
```python
import numpy as np
from triko import TrikoFeature  # assumed import path


class DemoImageFeature(TrikoFeature[np.ndarray, bytes, np.ndarray]):
    def _encode_raw(self, raw_value: np.ndarray) -> bytes:
        # convert the numpy array to bytes and return them
        pass

    def _decode_value(self, encoded_value: bytes) -> np.ndarray:
        # read the bytes and return a numpy array
        pass
```
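One possible way to fill in the two method bodies, sketched as plain functions so it runs without Triko: `np.save` writes the array in NumPy's `.npy` format, which stores shape and dtype alongside the data, and `np.load` restores it. `encode_image` and `decode_image` are hypothetical names; you would call them from `_encode_raw` and `_decode_value` respectively.

```python
import io

import numpy as np


def encode_image(raw_value: np.ndarray) -> bytes:
    # serialize to the .npy format (header with shape/dtype + raw data)
    buffer = io.BytesIO()
    np.save(buffer, raw_value, allow_pickle=False)
    return buffer.getvalue()


def decode_image(encoded_value: bytes) -> np.ndarray:
    # parse the .npy header and data back into an array
    return np.load(io.BytesIO(encoded_value), allow_pickle=False)


# usage: the round trip preserves shape, dtype and values
image = np.zeros((4, 4, 3), dtype=np.uint8)
restored = decode_image(encode_image(image))
```

Compared with a bare `tobytes()`, the `.npy` payload is self-describing at the cost of a small header per record.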
A simple built-in raw data validation
Before encoding raw data, you can validate its value by overriding
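The name of the validation hook is not given in this section, so the following is only a hypothetical sketch of the kind of check it could perform for the image feature: reject anything that is not a 3-D `uint8` array before it is encoded. `validate_image` and the expected `(H, W, C)` layout are assumptions.

```python
import numpy as np


def validate_image(raw_value: object) -> None:
    """Hypothetical check: raw_value must be an (H, W, C) uint8 array."""
    if not isinstance(raw_value, np.ndarray):
        raise TypeError(f'expected np.ndarray, got {type(raw_value).__name__}')
    if raw_value.ndim != 3:
        raise ValueError(f'expected 3 dimensions (H, W, C), got {raw_value.ndim}')
    if raw_value.dtype != np.uint8:
        raise ValueError(f'expected dtype uint8, got {raw_value.dtype}')
```

Failing early with a clear message is cheaper than discovering a malformed record only when the dataset is read back.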
TrikoFeature in action
Consider the following pseudocode for writing records:
```python
from typing import Any, List

import numpy as np
import tensorflow as tf
from triko import TrikoFeature  # assumed import path

with tf.io.TFRecordWriter('train.tfrecord') as writer:  # hypothetical file name
    # you read an image and perform transformations
    img_array: np.ndarray = ...
    # label for the image
    label: str = ...
    # list of your TrikoFeature subclasses
    features: List[TrikoFeature] = ...

    def raw_value_getter(feature: TrikoFeature) -> Any:
        """Maps a feature to its raw data."""
        # 'image' is the key you used for the TrikoFeature subclass
        # that represents an image
        if feature.key == 'image':
            return img_array
        return label

    serialized_features = TrikoFeature.encode_features_to_string(
        features=features,
        raw_value_getter=raw_value_getter,
    )
    writer.write(serialized_features)
```
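For orientation, here is roughly what a hand-written equivalent looks like in plain TensorFlow, without Triko: each raw value is wrapped in a `tf.train.Feature`, the features are collected under their keys in a `tf.train.Example`, and the example is serialized to a string. This is a sketch, not Triko's actual implementation; the keys `'image'` and `'label'` and the function name `to_example_bytes` are assumptions.

```python
import numpy as np
import tensorflow as tf


def to_example_bytes(img_array: np.ndarray, label: str) -> bytes:
    # encode each raw value into one of the few types TFRecord supports
    image_feature = tf.train.Feature(
        bytes_list=tf.train.BytesList(value=[img_array.tobytes()]))
    label_feature = tf.train.Feature(
        bytes_list=tf.train.BytesList(value=[label.encode('utf-8')]))
    # collect the features under their keys and serialize the record
    example = tf.train.Example(features=tf.train.Features(feature={
        'image': image_feature,
        'label': label_feature,
    }))
    return example.SerializeToString()
```

Inside the `with` block you would then call `writer.write(to_example_bytes(img_array, label))`. Every new feature means another `tf.train.Feature` and another key to keep in sync with the reader, which is the bookkeeping Triko aims to hide.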
Consider the following pseudocode for reading records:
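```python
from typing import List

import tensorflow as tf
from triko import TrikoFeature  # assumed import path

# list of your TrikoFeature subclasses
features: List[TrikoFeature] = ...
dataset = tf.data.TFRecordDataset(['train.tfrecord']).map(  # hypothetical file name
    TrikoFeature.decoder(features=features)
)
```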
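The corresponding hand-written decoder in plain TensorFlow parses each record with a spec of `tf.io.FixedLenFeature` entries and turns the image bytes back into a tensor. This sketch assumes the image was stored as raw `uint8` bytes of a known shape; `decode_record`, the keys and the default `shape` are hypothetical.

```python
import tensorflow as tf

# FixedLenFeature spec: each key holds exactly one bytes value per record
FEATURE_SPEC = {
    'image': tf.io.FixedLenFeature([], tf.string),
    'label': tf.io.FixedLenFeature([], tf.string),
}


def decode_record(serialized: tf.Tensor, shape=(2, 2)) -> tuple:
    # parse one serialized tf.train.Example according to the spec
    parsed = tf.io.parse_single_example(serialized, FEATURE_SPEC)
    # reinterpret the raw bytes as uint8 and restore the known shape
    image = tf.reshape(tf.io.decode_raw(parsed['image'], tf.uint8), shape)
    return image, parsed['label']
```

It plugs into a pipeline the same way, e.g. `tf.data.TFRecordDataset(['train.tfrecord']).map(decode_record)`; the spec has to match what the writer produced, key for key.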
The lib is cool, but pseudocode is not
See a documented real-world example here.
Currently, only `FixedLenFeature` is supported.
Requirements:

```
python 3.7
tensorflow
numpy
```

Installation:

```
pip install triko
```