A local dataset loader based on tf.data input pipeline

Project description

tf-datachain

tf-datachain is a local dataset loader built on the tf.data input pipeline. It handles reading and encoding data directly from your disk and simplifies preprocessing by providing several predefined methods.

Object Detection

from tf_datachain import ObjectDetection as od

Before using ObjectDetection functions, you must define some basic information, such as the image folder path and the class name list.

od.imageFolder = "data/images"

# hard-code class names
od.classNames = ["class1", "class2", "class3"]
# or read them from csv file
import pandas as pd
od.classNames = pd.read_csv("class.csv", header=None).iloc[:,0].values.tolist()
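If you prefer to avoid the pandas dependency, the same one-column CSV can be read with the standard library. This is a minimal sketch (the helper name `read_class_names` is ours, not part of tf-datachain):

```python
import csv

def read_class_names(path):
    """Read one class name per row from a header-less CSV file,
    taking the first column only (mirrors the pandas one-liner)."""
    with open(path, newline="") as f:
        return [row[0] for row in csv.reader(f) if row]
```

Then assign the result the same way: `od.classNames = read_class_names("class.csv")`.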

Then, a ready-to-use tf.data input pipeline can be built within 3 steps:

  • Preparation: prepare the list of files to process without reading their contents.
  • Data Loading: load data from the prepared list via tf.data.
  • Augmentation: shuffle, batch, and resize.
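The split used in the Preparation step can be pictured with a plain-Python sketch. This is a hypothetical illustration of splitting a file list by ratio, not tf-datachain's actual `split` implementation:

```python
def ratio_split(items, *ratios):
    """Split a list into consecutive chunks sized by the given ratios,
    e.g. ratios 6, 2, 2 yield a 60%/20%/20% split. Illustrative only."""
    total = sum(ratios)
    n = len(items)
    chunks, start = [], 0
    for i, r in enumerate(ratios):
        # the last chunk takes the remainder so no item is dropped
        end = n if i == len(ratios) - 1 else start + n * r // total
        chunks.append(items[start:end])
        start = end
    return chunks
```

With ten items, `ratio_split(files, 6, 2, 2)` yields chunks of six, two, and two files, matching the 6:2:2 example below.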

Best practices for loading datasets in different formats are shown below.

Pascal VOC XML Format

import tensorflow as tf
from tf_datachain.utils import split

BATCH_SIZE = 4
# read .xml files within the data/annotations folder,
# then split them with a ratio of 6:2:2
trainDataset, validationDataset, testDataset = split(od.prepareAnnotation("data/annotations", ".xml"), 6, 2, 2)

trainDataset = tf.data.Dataset.from_tensor_slices(trainDataset)
trainDataset = trainDataset.map(lambda data: od.loadData(data, "Pascal VOC XML", "xyxy"), num_parallel_calls=tf.data.AUTOTUNE)
# shuffle, ragged batch, and jittered resize
trainDataset = od.datasetProcessing(trainDataset, BATCH_SIZE, "Jittered Resize", (960, 960), "xyxy")
trainDataset = trainDataset.prefetch(tf.data.AUTOTUNE)
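For readers unfamiliar with the Pascal VOC format that `loadData` consumes here, the annotation is an XML file naming the image and listing each object with a corner-coordinate (`xyxy`) bounding box. A rough sketch of parsing one such file with the standard library (the helper name is ours; tf-datachain's internal parsing may differ):

```python
import xml.etree.ElementTree as ET

def parse_voc_xml(xml_string):
    """Parse a Pascal VOC annotation into
    (filename, [(class_name, xmin, ymin, xmax, ymax), ...])."""
    root = ET.fromstring(xml_string)
    filename = root.findtext("filename")
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        boxes.append((
            obj.findtext("name"),
            int(float(bb.findtext("xmin"))),
            int(float(bb.findtext("ymin"))),
            int(float(bb.findtext("xmax"))),
            int(float(bb.findtext("ymax"))),
        ))
    return filename, boxes
```

The `"xyxy"` argument passed to `loadData` and `datasetProcessing` above tells the library that boxes use this (xmin, ymin, xmax, ymax) corner convention.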

Visualize Dataset

# visualize a single sample
for data in dataset.take(1):
  visualizeData(data, "xyxy")

# visualize the dataset in a 2x2 grid
visualizeDataset(dataset, "xyxy", rows=2, cols=2)

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tf_datachain-0.1.0.tar.gz (4.5 kB)

Uploaded Source

Built Distribution


tf_datachain-0.1.0-py3-none-any.whl (5.2 kB)

Uploaded Python 3

File details

Details for the file tf_datachain-0.1.0.tar.gz.

File metadata

  • Download URL: tf_datachain-0.1.0.tar.gz
  • Upload date:
  • Size: 4.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.17

File hashes

Hashes for tf_datachain-0.1.0.tar.gz

  • SHA256: 18de6aaedfecc578677eb1ccdeb1984149382a3790841a2b7c7a31a5ff8f1ad6
  • MD5: 5a704c886a26990d71512ee3a911a702
  • BLAKE2b-256: ba3bd774653b77d41b17101e463429ae40711b2acfc468db93eba32d16e9986c


File details

Details for the file tf_datachain-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: tf_datachain-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 5.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.17

File hashes

Hashes for tf_datachain-0.1.0-py3-none-any.whl

  • SHA256: 32f0accfd2328391056614359367192af876f37e00e90c4778d3f1283eed38bd
  • MD5: efcf09127d95c55916d4c7cb7419b4af
  • BLAKE2b-256: e124c884bd9fcef437d73eeb82d85f2136edc3b9cc3d5fe26f76f4ab3ba0ef3c

