🤼 Triko
Simplifies the process of encoding/decoding data using the TFRecord framework.
Getting Started
I was a bit overwhelmed the first time I used the TFRecord framework. I didn't find its interface very appealing, so the idea was to encapsulate all the nitty-gritty in this library.
Note: I'm not an expert in TFRecord. I just found my approach very helpful in my workflow.
TrikoFeature
For each feature you want to serialize (images, numbers, strings, labels), you should use a separate `TrikoFeature` subclass. Each `TrikoFeature` subclass must be initialized with a unique key (see the `__init__` method). Those keys are used to serialize data in `TFRecord`.
`TrikoFeature` is generic: each subclass must specify three type parameters.
An abstract example:
```python
class DemoFeature(TrikoFeature[RAW_TYPE, ENCODED_TYPE, DECODED_TYPE]):
    ...
```
- `RAW_TYPE`: the original type of the data you want to encode
- `ENCODED_TYPE`: the type your data will have after encoding (TFRecord supports only a few types)
- `DECODED_TYPE`: the type your data will have after decoding
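To make the role of the three type parameters concrete, here is a minimal sketch of how a generic base class of this shape can be declared with Python's `typing.Generic`. `FeatureSketch` and `LabelFeature` are hypothetical names used only for illustration; this is not triko's actual `TrikoFeature` source, whose internals may differ.

```python
from abc import ABC, abstractmethod
from typing import Generic, TypeVar

# Stand-ins for the three type parameters described above.
RAW_TYPE = TypeVar('RAW_TYPE')
ENCODED_TYPE = TypeVar('ENCODED_TYPE')
DECODED_TYPE = TypeVar('DECODED_TYPE')


class FeatureSketch(ABC, Generic[RAW_TYPE, ENCODED_TYPE, DECODED_TYPE]):
    """Hypothetical base class, NOT triko's TrikoFeature."""

    def __init__(self, key: str) -> None:
        # The key names this feature inside every serialized record.
        self.key = key

    @abstractmethod
    def _encode_raw(self, raw_value: RAW_TYPE) -> ENCODED_TYPE:
        ...

    @abstractmethod
    def _decode_value(self, encoded_value: ENCODED_TYPE) -> DECODED_TYPE:
        ...


class LabelFeature(FeatureSketch[str, bytes, str]):
    """A str label is encoded to UTF-8 bytes and decoded back to str."""

    def _encode_raw(self, raw_value: str) -> bytes:
        return raw_value.encode('utf-8')

    def _decode_value(self, encoded_value: bytes) -> str:
        return encoded_value.decode('utf-8')


label_feature = LabelFeature(key='label')
encoded = label_feature._encode_raw('cat')
decoded = label_feature._decode_value(encoded)
```

The subscripted base `FeatureSketch[str, bytes, str]` is checked only by static type checkers such as mypy; Python itself does not enforce the three types at runtime.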
A specific example:
Let's say we want to encode an image. We read it, transform it the way we like, and then it's time to serialize it to a `TFRecord` dataset.
```python
class DemoImageFeature(TrikoFeature[np.ndarray, bytes, np.ndarray]):
    ...
```
- `np.ndarray` (`RAW_TYPE`): our image data is initially a `numpy` matrix
- `bytes` (`ENCODED_TYPE`): we can't serialize raw `numpy` arrays using `TFRecord` (it wouldn't be a good idea anyway), so we convert them to `bytes`
- `np.ndarray` (`DECODED_TYPE`): when reading the `TFRecord` dataset, `bytes` are useless to us, so we decode them back to `np.ndarray`
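The `np.ndarray` to `bytes` to `np.ndarray` round trip described above can be sketched with plain `numpy`, independently of triko. One thing the sketch makes visible: `tobytes()` keeps only the raw buffer, so the shape and dtype have to be known at decoding time (for example, fixed for the whole dataset or stored as a separate feature; that choice is an assumption here, not something triko prescribes).

```python
import numpy as np

# RAW_TYPE: a tiny 2x3 "RGB image" as a uint8 numpy array.
image = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3)

# ENCODED_TYPE: bytes. tobytes() drops the shape and dtype information.
encoded: bytes = image.tobytes()

# DECODED_TYPE: np.ndarray again. The dtype and shape must be supplied
# explicitly to rebuild the original array from the raw buffer.
decoded = np.frombuffer(encoded, dtype=np.uint8).reshape(2, 3, 3)
```

Note that `np.frombuffer` returns a read-only view over the bytes; call `.copy()` on the result if you need to modify the decoded image.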
How does Triko encode/decode data?
You tell it how by overriding the `_encode_raw` method (for encoding) and the `_decode_value` method (for decoding).
Continuing our example:
```python
import numpy as np


class DemoImageFeature(TrikoFeature[np.ndarray, bytes, np.ndarray]):
    def _encode_raw(self, raw_value: np.ndarray) -> bytes:
        # convert the numpy array to bytes and return them
        pass

    def _decode_value(self, encoded_value: bytes) -> np.ndarray:
        # read the bytes and return a numpy array
        pass
```
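One possible way to fill in the two method bodies is to use NumPy's `.npy` format, which stores the shape and dtype next to the data, so decoding needs no extra bookkeeping. `DemoImageCodec` below is a hypothetical stand-in with the same method names as `DemoImageFeature`; it does not subclass `TrikoFeature`, so it runs without triko installed. It also assumes `_decode_value` receives plain Python `bytes`; if the library calls it inside `tf.data.Dataset.map`, it would likely receive a `tf.Tensor` instead, and TensorFlow ops such as `tf.io.decode_raw` would be needed rather than `np.load`.

```python
import io

import numpy as np


class DemoImageCodec:
    """Hypothetical stand-in mirroring DemoImageFeature's method names."""

    def _encode_raw(self, raw_value: np.ndarray) -> bytes:
        # np.save writes the .npy format: a small header with shape and
        # dtype, followed by the array data.
        buffer = io.BytesIO()
        np.save(buffer, raw_value, allow_pickle=False)
        return buffer.getvalue()

    def _decode_value(self, encoded_value: bytes) -> np.ndarray:
        # np.load parses the .npy header, so shape and dtype are restored.
        return np.load(io.BytesIO(encoded_value), allow_pickle=False)


codec = DemoImageCodec()
image = np.zeros((4, 4, 3), dtype=np.uint8)
encoded = codec._encode_raw(image)
restored = codec._decode_value(encoded)
```

The trade-off is a small per-record header and a NumPy-specific encoding; the raw `tobytes()` approach is leaner but requires tracking the shape and dtype elsewhere.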
A simple built-in raw data validation
Before encoding raw data, you can validate its value by overriding `_validate_raw_value`.
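The validation hook follows the usual template-method pattern: a base class calls an overridable check before encoding. The sketch below is hypothetical (`ValidatingFeatureSketch`, `encode`, and `PetLabelFeature` are illustrative names, and it assumes the hook signals bad input by raising); the real signature and behaviour of triko's `_validate_raw_value` may differ.

```python
from typing import Any


class ValidatingFeatureSketch:
    """Hypothetical base class showing where a validation hook could run."""

    def encode(self, raw_value: Any) -> bytes:
        # Validate first, so invalid data never reaches the encoder.
        self._validate_raw_value(raw_value)
        return self._encode_raw(raw_value)

    def _validate_raw_value(self, raw_value: Any) -> None:
        # Default: accept every value.
        pass

    def _encode_raw(self, raw_value: Any) -> bytes:
        raise NotImplementedError


class PetLabelFeature(ValidatingFeatureSketch):
    ALLOWED_LABELS = {'cat', 'dog'}

    def _validate_raw_value(self, raw_value: Any) -> None:
        # Assumption: invalid input is reported by raising ValueError.
        if raw_value not in self.ALLOWED_LABELS:
            raise ValueError(f'unknown label: {raw_value!r}')

    def _encode_raw(self, raw_value: str) -> bytes:
        return raw_value.encode('utf-8')


feature = PetLabelFeature()
encoded_cat = feature.encode('cat')
```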
TrikoFeature in action
Encoding
Consider the following pseudocode:
```python
from typing import Any, List

import numpy as np
import tensorflow as tf

# output_path: where the TFRecord file will be written
with tf.io.TFRecordWriter(output_path) as writer:
    # you read an image and perform transformations
    img_array: np.ndarray = ...
    # label for the image
    label: str = ...
    # list of instances of your TrikoFeature subclasses
    features: List[TrikoFeature] = ...

    def raw_value_getter(feature: TrikoFeature) -> Any:
        """
        Maps a feature to its raw data
        """
        # 'image' is the key you used for your TrikoFeature subclass
        # that represents an image
        if feature.key == 'image':
            return img_array
        return label

    serialized_features = TrikoFeature.encode_features_to_string(
        features=features, raw_value_getter=raw_value_getter,
    )
    writer.write(serialized_features)
```
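Under the hood, a serialized TFRecord entry is typically a `tf.train.Example` protocol buffer: a map from string keys to `tf.train.Feature` values. The sketch below shows, with plain TensorFlow, roughly what a helper like `TrikoFeature.encode_features_to_string` has to produce. `encode_to_example_bytes` and its `encoders` argument are hypothetical, and for simplicity the getter receives a key rather than a feature object; this is not triko's implementation.

```python
from typing import Any, Callable, Dict, List

import numpy as np
import tensorflow as tf


def encode_to_example_bytes(
    keys: List[str],
    raw_value_getter: Callable[[str], Any],
    encoders: Dict[str, Callable[[Any], bytes]],
) -> bytes:
    """Hypothetical helper: one bytes feature per key in a tf.train.Example."""
    feature_map = {}
    for key in keys:
        # Fetch the raw value for this key, then encode it to bytes.
        encoded = encoders[key](raw_value_getter(key))
        feature_map[key] = tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[encoded])
        )
    example = tf.train.Example(features=tf.train.Features(feature=feature_map))
    return example.SerializeToString()


img_array = np.zeros((2, 2), dtype=np.uint8)
label = 'cat'

serialized = encode_to_example_bytes(
    keys=['image', 'label'],
    raw_value_getter=lambda key: img_array if key == 'image' else label,
    encoders={'image': lambda a: a.tobytes(), 'label': lambda s: s.encode('utf-8')},
)
# `serialized` is what you would pass to writer.write(...).
```

This also shows why each key must be unique: the keys become the entries of a single feature map, so a duplicate key would overwrite another feature.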
Decoding
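Consider the following pseudocode:
```python
from typing import List

import tensorflow as tf

# list of instances of your TrikoFeature subclasses
features: List[TrikoFeature] = ...
# filenames: path(s) of the TFRecord file(s) to read
dataset = tf.data.TFRecordDataset(filenames).map(TrikoFeature.decoder(features=features))
```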
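A decoder passed to `dataset.map` receives each serialized record as a string tensor and has to parse it against a feature spec. The self-contained sketch below writes one record to a temporary file and reads it back with plain TensorFlow; it illustrates the kind of work a function returned by `TrikoFeature.decoder` presumably does, but it is not triko's code. The keys `'image'` and `'label'` and the fixed `[2, 2]` image shape are assumptions for the example.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Write one record so the example does not depend on external files.
path = os.path.join(tempfile.mkdtemp(), 'demo.tfrecord')
image = np.arange(4, dtype=np.uint8).reshape(2, 2)
example = tf.train.Example(features=tf.train.Features(feature={
    'image': tf.train.Feature(bytes_list=tf.train.BytesList(value=[image.tobytes()])),
    'label': tf.train.Feature(bytes_list=tf.train.BytesList(value=[b'cat'])),
}))
with tf.io.TFRecordWriter(path) as writer:
    writer.write(example.SerializeToString())

# One FixedLenFeature per key: a scalar string holding the encoded bytes.
feature_spec = {
    'image': tf.io.FixedLenFeature([], tf.string),
    'label': tf.io.FixedLenFeature([], tf.string),
}


def decode(serialized: tf.Tensor):
    parsed = tf.io.parse_single_example(serialized, feature_spec)
    # Inside dataset.map the values are tensors, so use TF ops, not numpy.
    pixels = tf.reshape(tf.io.decode_raw(parsed['image'], tf.uint8), [2, 2])
    return pixels, parsed['label']


dataset = tf.data.TFRecordDataset([path]).map(decode)
```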
The lib is cool, but pseudocode is not
See a documented real-world example here
Limitations
Currently, only `FixedLenFeature` is supported.
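To see what this limitation means in practice, the plain-TensorFlow sketch below (independent of triko) contrasts `tf.io.FixedLenFeature`, which requires every record to hold exactly the declared number of values, with `tf.io.VarLenFeature`, which accepts a varying count and returns a `tf.SparseTensor`. The key `'ids'` and the values are made up for the example. A possible workaround for variable-length data, assuming you control the encoding, is to pack it into a single `bytes` value and parse it with `FixedLenFeature([], tf.string)`.

```python
import tensorflow as tf


def make_example(values):
    # Serialize a record with a single int64 list feature named 'ids'.
    return tf.train.Example(features=tf.train.Features(feature={
        'ids': tf.train.Feature(int64_list=tf.train.Int64List(value=values)),
    })).SerializeToString()


two_ids = make_example([1, 2])
three_ids = make_example([3, 4, 5])

# FixedLenFeature: every record must contain exactly 2 values here.
fixed_spec = {'ids': tf.io.FixedLenFeature([2], tf.int64)}
fixed = tf.io.parse_single_example(two_ids, fixed_spec)['ids']

# VarLenFeature: the count may vary; the result is a SparseTensor.
var_spec = {'ids': tf.io.VarLenFeature(tf.int64)}
sparse = tf.io.parse_single_example(three_ids, var_spec)['ids']
dense = tf.sparse.to_dense(sparse)

# Parsing the 3-value record with the fixed [2] spec fails at runtime.
try:
    tf.io.parse_single_example(three_ids, fixed_spec)
    fixed_failed = False
except tf.errors.InvalidArgumentError:
    fixed_failed = True
```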
Prerequisites
```
python 3.7
tensorflow
numpy
```
Installing
```
pip install triko
```
Download files
File details
Details for the file `triko-0.0.1.tar.gz`.
File metadata
- Download URL: triko-0.0.1.tar.gz
- Upload date:
- Size: 1.5 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.4.0 requests-toolbelt/0.8.0 tqdm/4.46.1 CPython/3.7.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f043c9cfb34c8f21a693452cc9650aabf96fc88f69b9b1052f24233d564a2788 |
| MD5 | 3ac317060f6465d14ba4148349834e77 |
| BLAKE2b-256 | bb4ed82e4298856ab7be3e88900f19a1d38c41fea04df9e2f3efbf7050780071 |
File details
Details for the file `triko-0.0.1-py3-none-any.whl`.
File metadata
- Download URL: triko-0.0.1-py3-none-any.whl
- Upload date:
- Size: 6.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.4.0 requests-toolbelt/0.8.0 tqdm/4.46.1 CPython/3.7.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1b8cd335f39ae0f1ee038229e53a054c402e759034afdb8169b8c3a43638ecac |
| MD5 | b7f4d5500c54c3bd6370dc9a131de88b |
| BLAKE2b-256 | 04b94ea85e07ceb002328f9b130409baae9551adb5345def1815cf45f5f44aaf |