A local dataset loader based on tf.data input pipeline

Project description

tf-datachain

tf-datachain is a local dataset loader built on the tf.data input pipeline. It handles reading and encoding data directly from your disk and simplifies preprocessing by providing several predefined methods.

Object Detection

from tf_datachain import ObjectDetection as od

Before using ObjectDetection functions, you must define some basic information, such as the image folder path and the class name list.

od.imageFolder = "data/images"

# hard-code class names
od.classNames = ["class1", "class2", "class3"]
# or read them from csv file
import pandas as pd
od.classNames = pd.read_csv("class.csv", header=None).iloc[:,0].values.tolist()
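If you prefer to avoid the pandas dependency, the same one-column CSV can be read with the standard library. This is a minimal sketch (the helper name `read_class_names` is ours, not part of tf-datachain):

```python
import csv

def read_class_names(path):
    """Read one class name per row from a header-less CSV file,
    taking the first column only (mirrors the pandas one-liner)."""
    with open(path, newline="") as f:
        return [row[0] for row in csv.reader(f) if row]
```

Then assign the result the same way: `od.classNames = read_class_names("class.csv")`.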

Then, a ready-to-use tf.data input pipeline can be built within 3 steps:

  • Preparation: prepare the list of files to process without reading their contents.
  • Data Loading: load data from the prepared list via tf.data.
  • Augmentation: shuffle, batch, and resize.
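The split used in the Preparation step can be pictured with a plain-Python sketch. This is a hypothetical illustration of splitting a file list by ratio, not tf-datachain's actual `split` implementation:

```python
def ratio_split(items, *ratios):
    """Split a list into consecutive chunks sized by the given ratios,
    e.g. ratios 6, 2, 2 yield a 60%/20%/20% split. Illustrative only."""
    total = sum(ratios)
    n = len(items)
    chunks, start = [], 0
    for i, r in enumerate(ratios):
        # the last chunk takes the remainder so no item is dropped
        end = n if i == len(ratios) - 1 else start + n * r // total
        chunks.append(items[start:end])
        start = end
    return chunks
```

With ten items, `ratio_split(files, 6, 2, 2)` yields chunks of six, two, and two files, matching the 6:2:2 example below.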

Best practices for loading datasets in different formats are shown below.

Pascal VOC XML Format

import tensorflow as tf
from tf_datachain.utils import split

BATCH_SIZE = 4
# read .xml files within the data/annotations folder,
# then split them with a ratio of 6:2:2
trainDataset, validationDataset, testDataset = split(od.prepareAnnotation("data/annotations", ".xml"), 6, 2, 2)

trainDataset = tf.data.Dataset.from_tensor_slices(trainDataset)
trainDataset = trainDataset.map(lambda data: od.loadData(data, "Pascal VOC XML", "xyxy"), num_parallel_calls=tf.data.AUTOTUNE)
# shuffle, ragged batch, and jittered resize
trainDataset = od.datasetProcessing(trainDataset, BATCH_SIZE, "Jittered Resize", (960, 960), "xyxy")
trainDataset = trainDataset.prefetch(tf.data.AUTOTUNE)
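For readers unfamiliar with the Pascal VOC format that `loadData` consumes here, the annotation is an XML file naming the image and listing each object with a corner-coordinate (`xyxy`) bounding box. A rough sketch of parsing one such file with the standard library (the helper name is ours; tf-datachain's internal parsing may differ):

```python
import xml.etree.ElementTree as ET

def parse_voc_xml(xml_string):
    """Parse a Pascal VOC annotation into
    (filename, [(class_name, xmin, ymin, xmax, ymax), ...])."""
    root = ET.fromstring(xml_string)
    filename = root.findtext("filename")
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        boxes.append((
            obj.findtext("name"),
            int(float(bb.findtext("xmin"))),
            int(float(bb.findtext("ymin"))),
            int(float(bb.findtext("xmax"))),
            int(float(bb.findtext("ymax"))),
        ))
    return filename, boxes
```

The `"xyxy"` argument passed to `loadData` and `datasetProcessing` above tells the library that boxes use this (xmin, ymin, xmax, ymax) corner convention.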

Visualize Dataset

# visualize a single sample
for data in dataset.take(1):
  visualizeData(data, "xyxy")

# visualize the dataset in a 2x2 grid
visualizeDataset(dataset, "xyxy", rows=2, cols=2)

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tf_datachain-0.1.0.tar.gz (4.5 kB)

Uploaded Source

Built Distribution


tf_datachain-0.1.0-py3-none-any.whl (5.2 kB)

Uploaded Python 3

File details

Details for the file tf_datachain-0.1.0.tar.gz.

File metadata

  • Download URL: tf_datachain-0.1.0.tar.gz
  • Upload date:
  • Size: 4.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.17

File hashes

Hashes for tf_datachain-0.1.0.tar.gz

  • SHA256: 18de6aaedfecc578677eb1ccdeb1984149382a3790841a2b7c7a31a5ff8f1ad6
  • MD5: 5a704c886a26990d71512ee3a911a702
  • BLAKE2b-256: ba3bd774653b77d41b17101e463429ae40711b2acfc468db93eba32d16e9986c


File details

Details for the file tf_datachain-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: tf_datachain-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 5.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.17

File hashes

Hashes for tf_datachain-0.1.0-py3-none-any.whl

  • SHA256: 32f0accfd2328391056614359367192af876f37e00e90c4778d3f1283eed38bd
  • MD5: efcf09127d95c55916d4c7cb7419b4af
  • BLAKE2b-256: e124c884bd9fcef437d73eeb82d85f2136edc3b9cc3d5fe26f76f4ab3ba0ef3c

