A local dataset loader based on the tf.data input pipeline
tf-datachain
tf-datachain is a local dataset loader built on the tf.data input pipeline. It reads and encodes data directly from your disk and simplifies processing by providing several predefined methods.
Object Detection
from tf_datachain import ObjectDetection as od
Before using the ObjectDetection functions, you have to define some basic information, such as the image folder path and the list of class names.
od.imageFolder = "data/images"
# hard-code class names
od.classNames = ["class1", "class2", "class3"]
# or read them from a CSV file
import pandas as pd
od.classNames = pd.read_csv("class.csv", header=None).iloc[:,0].values.tolist()
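If pandas is not otherwise needed, the class list can also be read with the standard library. The sketch below is roughly equivalent to the pandas line above for plain string names. `read_class_names` and `build_class_index` are hypothetical helpers, not part of tf-datachain, and the name-to-index mapping assumes that a class id is simply the position of the name in the list, which this page does not state.

```python
import csv


def read_class_names(csv_path):
    """Return the first column of a header-less CSV file as a list of names."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        # Skip blank rows so a trailing empty line does not add an empty class.
        return [row[0].strip() for row in csv.reader(handle) if row and row[0].strip()]


def build_class_index(class_names):
    """Map each class name to its position in the list (assumed id scheme)."""
    return {name: index for index, name in enumerate(class_names)}
```

The resulting list can then be assigned in the same way as before, for example `od.classNames = read_class_names("class.csv")`.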
Then, a ready-to-use tf.data input pipeline can be built in three steps:
- Preparation: prepare the list of items to process without reading their content.
- Data Loading: load data from the prepared list via tf.data.
- Augmentation: shuffle, batch, and resize.
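To make the preparation step more concrete, here is a minimal sketch in plain Python of listing annotation files by path only and dividing that list by ratio, in the spirit of the 6:2:2 split used in the Pascal VOC example further down. `list_annotation_files` and `split_by_ratio` are hypothetical stand-ins; how tf-datachain's own `prepareAnnotation` and `split` behave internally (ordering, shuffling, rounding) is not documented here and may differ.

```python
from pathlib import Path


def list_annotation_files(folder, extension):
    """Collect matching file paths as strings; file contents are never opened."""
    return sorted(
        str(path)
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() == extension.lower()
    )


def split_by_ratio(items, *ratios):
    """Cut a list into consecutive parts whose sizes follow the given ratios."""
    total = sum(ratios)
    parts, start, cumulative = [], 0, 0
    for ratio in ratios:
        cumulative += ratio
        # Integer boundary from the cumulative ratio; the last one equals len(items).
        end = len(items) * cumulative // total
        parts.append(items[start:end])
        start = end
    return parts
```

Because only path strings are handled at this stage, the lists stay small and can be passed to `tf.data.Dataset.from_tensor_slices` before any image or annotation is actually read.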
The recommended way to load a dataset in each supported format is shown below.
Pascal VOC XML Format
import tensorflow as tf
from tf_datachain.utils import split
BATCH_SIZE = 4
# read the .xml files in the data/annotations folder
# then split them with a ratio of 6:2:2
trainDataset, validationDataset, testDataset = split(od.prepareAnnotation("data/annotations", ".xml"), 6, 2, 2)
trainDataset = tf.data.Dataset.from_tensor_slices(trainDataset)
trainDataset = trainDataset.map(lambda data: od.loadData(data, "Pascal VOC XML", "xyxy"), num_parallel_calls=tf.data.AUTOTUNE)
# shuffle, ragged batch, and jittered resize
trainDataset = od.datasetProcessing(trainDataset, BATCH_SIZE, "Jittered Resize", (960, 960), "xyxy")
trainDataset = trainDataset.prefetch(tf.data.AUTOTUNE)
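For readers unfamiliar with the format, the sketch below shows which fields of a Pascal VOC XML annotation end up as "xyxy" boxes and class ids. It uses only the standard library and is an illustration of the file format, not the code that `od.loadData` runs. `parse_voc_annotation` is a hypothetical helper, and the use of the list position as the class id is an assumption.

```python
import xml.etree.ElementTree as ET


def parse_voc_annotation(xml_text, class_names):
    """Extract the image file name, xyxy boxes, and class ids from one annotation."""
    root = ET.fromstring(xml_text)
    boxes, class_ids = [], []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        # "xyxy" order: left, top, right, bottom, in pixels.
        boxes.append([float(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")])
        # Raises ValueError if the annotation uses a name missing from class_names.
        class_ids.append(class_names.index(obj.findtext("name")))
    return root.findtext("filename"), boxes, class_ids


SAMPLE = """<annotation>
  <filename>example.jpg</filename>
  <object>
    <name>class2</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
  </object>
</annotation>"""

print(parse_voc_annotation(SAMPLE, ["class1", "class2", "class3"]))
```

Each image can contain a different number of objects, which is why the pipeline above batches the boxes as ragged tensors.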
Visualize Dataset
# visualize a single sample
for data in dataset.take(1):
    visualizeData(data, "xyxy")
# visualize the dataset in a 2x2 grid
visualizeDataset(dataset, "xyxy", rows=2, cols=2)
Download files
File details
Details for the file tf_datachain-0.1.0.tar.gz.
File metadata
- Download URL: tf_datachain-0.1.0.tar.gz
- Upload date:
- Size: 4.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.17
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 18de6aaedfecc578677eb1ccdeb1984149382a3790841a2b7c7a31a5ff8f1ad6 |
| MD5 | 5a704c886a26990d71512ee3a911a702 |
| BLAKE2b-256 | ba3bd774653b77d41b17101e463429ae40711b2acfc468db93eba32d16e9986c |
File details
Details for the file tf_datachain-0.1.0-py3-none-any.whl.
File metadata
- Download URL: tf_datachain-0.1.0-py3-none-any.whl
- Upload date:
- Size: 5.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.17
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 32f0accfd2328391056614359367192af876f37e00e90c4778d3f1283eed38bd |
| MD5 | efcf09127d95c55916d4c7cb7419b4af |
| BLAKE2b-256 | e124c884bd9fcef437d73eeb82d85f2136edc3b9cc3d5fe26f76f4ab3ba0ef3c |
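A downloaded file can be checked against the SHA256 digests listed above. The following is a minimal sketch using Python's `hashlib`; `sha256_of_file` and `matches_published_hash` are hypothetical helper names, and the local path passed in is whatever location you saved the file to.

```python
import hashlib

# SHA256 digests copied from the file details above.
EXPECTED_SHA256 = {
    "tf_datachain-0.1.0.tar.gz": "18de6aaedfecc578677eb1ccdeb1984149382a3790841a2b7c7a31a5ff8f1ad6",
    "tf_datachain-0.1.0-py3-none-any.whl": "32f0accfd2328391056614359367192af876f37e00e90c4778d3f1283eed38bd",
}


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so it never has to fit in memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, file_name):
    """Compare a local file's digest with the one listed for file_name."""
    return sha256_of_file(path) == EXPECTED_SHA256[file_name]
```

For example, `matches_published_hash("tf_datachain-0.1.0.tar.gz", "tf_datachain-0.1.0.tar.gz")` should return `True` for an intact copy of the source distribution saved in the current directory.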