🤼 Triko
Simplifies the process of encoding and decoding data using the TFRecord framework.
Getting Started
I was a bit overwhelmed after using the TFRecord framework for the first time. I don't find its interface very appealing, so the idea was to encapsulate all the nitty-gritty in this library.
Note: I'm not an expert in TFRecord. I just found my approach very helpful in my workflow.
TrikoFeature
For each feature you want to serialize (images, numbers, strings, labels), use a separate TrikoFeature subclass. Each TrikoFeature subclass must be initialized with a unique key (see the __init__ method). Those keys identify the features when data is serialized to TFRecord.
TrikoFeature is a generic class. Each subclass must provide three type parameters.
An abstract example:
class DemoFeature(TrikoFeature[RAW_TYPE, ENCODED_TYPE, DECODED_TYPE]):
    ...
- RAW_TYPE - the original type of the data you want to encode
- ENCODED_TYPE - the type your data will have after encoding (TFRecord supports only a few types)
- DECODED_TYPE - the type your data will have after decoding
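To make the three type parameters more concrete, here is a minimal stand-in written with typing.Generic. It is not Triko's actual source: the classes SketchFeature and IntAsTextFeature are hypothetical, and only the method names _encode_raw and _decode_value are taken from this README.

```python
from typing import Generic, TypeVar

RAW = TypeVar("RAW")          # type of the value before encoding
ENCODED = TypeVar("ENCODED")  # type stored in the record
DECODED = TypeVar("DECODED")  # type returned after decoding


class SketchFeature(Generic[RAW, ENCODED, DECODED]):
    """Hypothetical stand-in for a generic feature base class."""

    def __init__(self, key: str) -> None:
        self.key = key  # unique name of the feature in the record

    def _encode_raw(self, raw_value: RAW) -> ENCODED:
        raise NotImplementedError

    def _decode_value(self, encoded_value: ENCODED) -> DECODED:
        raise NotImplementedError


class IntAsTextFeature(SketchFeature[int, bytes, int]):
    """Stores an int as UTF-8 text and reads it back as an int."""

    def _encode_raw(self, raw_value: int) -> bytes:
        return str(raw_value).encode("utf-8")

    def _decode_value(self, encoded_value: bytes) -> int:
        return int(encoded_value.decode("utf-8"))


feature = IntAsTextFeature(key="age")
encoded = feature._encode_raw(42)
decoded = feature._decode_value(encoded)
```

With the types spelled out this way, a static type checker can flag a subclass whose _encode_raw returns something other than its declared ENCODED type.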
A specific example:
Let's say we want to encode an image. We read it, transform it the way we like, and then it's time to serialize it to a TFRecord dataset.
class DemoImageFeature(TrikoFeature[np.ndarray, bytes, np.ndarray]):
    ...
- np.ndarray (RAW_TYPE) - our image data is initially a numpy matrix
- bytes (ENCODED_TYPE) - we can't serialize raw numpy arrays using TFRecord (it wouldn't be a good idea anyway), so we convert them to bytes
- np.ndarray (DECODED_TYPE) - when reading a TFRecord dataset, bytes are useless to us, so we decode them back to np.ndarray
How does Triko encode/decode data?
You must tell it how by overriding _encode_raw, _decode_value, or both.
Continuing our example:
class DemoImageFeature(TrikoFeature[np.ndarray, bytes, np.ndarray]):
    def _encode_raw(self, raw_value: np.ndarray) -> bytes:
        # convert numpy array to bytes and return
        pass

    def _decode_value(self, encoded_value: bytes) -> np.ndarray:
        # read bytes and return numpy array
        pass
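One possible way to fill in the two method bodies is to go through NumPy's own .npy format, which stores the shape and dtype alongside the pixel data. This is a sketch under that assumption, not necessarily what Triko's real-world example does; the helper names array_to_bytes and bytes_to_array are made up for illustration.

```python
import io

import numpy as np


def array_to_bytes(array: np.ndarray) -> bytes:
    """Serialize an array, including its shape and dtype, to bytes."""
    buffer = io.BytesIO()
    np.save(buffer, array, allow_pickle=False)
    return buffer.getvalue()


def bytes_to_array(data: bytes) -> np.ndarray:
    """Rebuild the array from bytes produced by array_to_bytes."""
    return np.load(io.BytesIO(data), allow_pickle=False)


image = np.arange(24, dtype=np.uint8).reshape(2, 4, 3)  # tiny fake image
restored = bytes_to_array(array_to_bytes(image))
```

Calling raw_value.tobytes() directly would be shorter, but it drops the shape and dtype, so the decoder would have to know them in advance.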
A simple built-in raw data validation
Before encoding raw data, you can validate its value by overriding _validate_raw_value.
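The README does not show the signature of _validate_raw_value, so the sketch below assumes it receives the raw value and raises an exception when the value should not be encoded. ImageFeatureSketch and its encode method are hypothetical stand-ins that only illustrate the validate-then-encode order.

```python
import numpy as np


class ImageFeatureSketch:
    """Hypothetical stand-in; not Triko's real base class."""

    def _validate_raw_value(self, raw_value: np.ndarray) -> None:
        # Assumed contract: raise if the value must not be encoded.
        if raw_value.ndim != 3:
            raise ValueError(f"expected H x W x C image, got shape {raw_value.shape}")
        if raw_value.dtype != np.uint8:
            raise ValueError(f"expected uint8 pixels, got {raw_value.dtype}")

    def _encode_raw(self, raw_value: np.ndarray) -> bytes:
        return raw_value.tobytes()

    def encode(self, raw_value: np.ndarray) -> bytes:
        # Validation runs first, so bad data never reaches the encoder.
        self._validate_raw_value(raw_value)
        return self._encode_raw(raw_value)


feature = ImageFeatureSketch()
good = feature.encode(np.zeros((2, 2, 3), dtype=np.uint8))

try:
    feature.encode(np.zeros((2, 2), dtype=np.uint8))  # missing channel axis
    rejected = False
except ValueError:
    rejected = True
```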
TrikoFeature in action
Encoding
Consider the following pseudocode:
# path of the TFRecord file to write
path: str = ...
with tf.io.TFRecordWriter(path) as writer:
    # you read an image and perform transformations
    img_array: np.ndarray = ...
    # label for the image
    label: str = ...
    # list of instances of your TrikoFeature subclasses
    features: List[TrikoFeature] = ...

    def raw_value_getter(feature: TrikoFeature) -> Any:
        """
        Maps a feature to its raw data
        """
        # 'image' is the key you used for your TrikoFeature subclass
        # that represents an image
        if feature.key == 'image':
            return img_array
        return label

    serialized_features = TrikoFeature.encode_features_to_string(
        features=features, raw_value_getter=raw_value_getter,
    )
    writer.write(serialized_features)
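As the number of features grows, the if chain inside raw_value_getter can be swapped for a dictionary lookup keyed by feature.key. The sketch below uses a hypothetical KeyedFeature dataclass in place of real TrikoFeature instances, and make_raw_value_getter is an illustrative helper, not part of Triko.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class KeyedFeature:
    """Hypothetical stand-in exposing only the `key` attribute."""
    key: str


def make_raw_value_getter(raw_values: Dict[str, Any]) -> Callable[[KeyedFeature], Any]:
    """Build a getter that maps a feature to its raw value by key."""

    def raw_value_getter(feature: KeyedFeature) -> Any:
        try:
            return raw_values[feature.key]
        except KeyError:
            # Fail loudly instead of silently falling back to another value.
            raise KeyError(f"no raw value provided for feature {feature.key!r}") from None

    return raw_value_getter


getter = make_raw_value_getter({"image": [[0, 1], [2, 3]], "label": "cat"})
image_value = getter(KeyedFeature("image"))
label_value = getter(KeyedFeature("label"))
```

Unlike a final fallback return, the lookup raises for an unknown key, which helps catch a feature that was added to the list without a matching raw value.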
Decoding
Consider the following pseudocode:
# list of instances of your TrikoFeature subclasses
features: List[TrikoFeature] = ...
# paths of the TFRecord files to read
filenames: List[str] = ...
dataset = tf.data.TFRecordDataset(filenames).map(TrikoFeature.decoder(features=features))
The lib is cool, but pseudocode is not
See documented real-world example here
Limitations
Only FixedLenFeature is currently supported.
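For context, FixedLenFeature is TensorFlow's parsing spec for features whose shape is known up front. The sketch below uses plain TensorFlow calls, without Triko, to show a round trip through such a spec. It assumes TensorFlow 2.x with eager execution; the keys image and label and the byte values are placeholders.

```python
import tensorflow as tf

# Build one record holding two scalar bytes features.
example = tf.train.Example(
    features=tf.train.Features(
        feature={
            "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[b"\x00\x01\x02"])),
            "label": tf.train.Feature(bytes_list=tf.train.BytesList(value=[b"cat"])),
        }
    )
)
serialized = example.SerializeToString()

# Each key gets a fixed shape ([] means scalar) and a dtype.
spec = {
    "image": tf.io.FixedLenFeature([], tf.string),
    "label": tf.io.FixedLenFeature([], tf.string),
}
parsed = tf.io.parse_single_example(serialized, spec)
image = parsed["image"].numpy()
label = parsed["label"].numpy()
```

Variable-length data, such as a list of tags whose size differs per record, would need tf.io.VarLenFeature, which falls under the limitation stated above.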
Prerequisites
```
python 3.7
tensorflow
numpy
```
Installing
```
pip install triko
```