Skip to main content

!Alpha Version! - This repository contains code to make datasets stored on the corpora network drive of the chair compatible with the [tensorflow dataset api](https://www.tensorflow.org/api_docs/python/tf/data/Dataset)

Project description

Description

This repository contains code to make datasets stored on th corpora network drive of the chair compatible with the tensorflow dataset api .

Currently available Datasets

Dataset Status Url
audioset https://research.google.com/audioset/
ckplus http://www.iainm.com/publications/Lucey2010-The-Extended/paper.pdf
faces https://faces.mpdl.mpg.de/imeji/
is2021_ess -
librispeech https://www.openslr.org/12
nova_dynamic https://github.com/hcmlab/nova

Example Usage

import os
import tensorflow as tf
import tensorflow_datasets as tfds
import hcai_datasets
from matplotlib import pyplot as plt

# Preprocessing function
def preprocess(x, y):
  img = x.numpy()
  return img, y

# Creating a dataset
ds, ds_info = tfds.load(
  'hcai_example_dataset',
  split='train',
  with_info=True,
  as_supervised=True,
  builder_kwargs={'dataset_dir': os.path.join('path', 'to', 'directory')}
)

# Input output mapping
ds = ds.map(lambda x, y: (tf.py_function(func=preprocess, inp=[x, y], Tout=[tf.float32, tf.int64])))

# Manually iterate over dataset
img, label = next(ds.as_numpy_iterator())

# Visualize
plt.imshow(img / 255.)
plt.show()

Example Usage Nova Dynamic Data

import os
import hcai_datasets
import tensorflow_datasets as tfds
from sklearn.svm import LinearSVC
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
import warnings
warnings.simplefilter("ignore")

## Load Data
ds, ds_info = tfds.load(
  'hcai_nova_dynamic',
  split='dynamic_split',
  with_info=True,
  as_supervised=True,
  data_dir='.',
  read_config=tfds.ReadConfig(
    shuffle_seed=1337
  ),
  builder_kwargs={
    # Database Config
    'db_config_path': 'nova_db.cfg',
    'db_config_dict': None,

    # Dataset Config
    'dataset': '<dataset_name>',
    'nova_data_dir': os.path.join('C:', 'Nova', 'Data'),
    'sessions': ['<session_name>'],
    'roles': ['<role_one>', '<role_two>'],
    'schemes': ['<label_scheme_one'],
    'annotator': '<annotator_id>',
    'data_streams': ['<stream_name>'],

    # Sample Config
    'frame_step': 1,
    'left_context': 0,
    'right_context': 0,
    'start': None,
    'end': None,
    'flatten_samples': False, 
    'supervised_keys': ['<role_one>.<stream_name>', '<scheme_two>'],

    # Additional Config
    'clear_cache' : True
  }
)

data_it = ds.as_numpy_iterator()
data_list = list(data_it)
data_list.sort(key=lambda x: int(x['frame'].decode('utf-8').split('_')[0]))
x = [v['<stream_name>'] for v in data_list]
y = [v['<scheme_two'] for v in data_list]

x_np = np.ma.concatenate( x, axis=0 )
y_np = np.array( y )

linear_svc = LinearSVC()
model = CalibratedClassifierCV(linear_svc,
                               method='sigmoid',
                               cv=3)
print('train_x shape: {} | train_x[0] shape: {}'.format(x_np.shape, x_np[0].shape))
model.fit(x_np, y_np)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

hcai-datasets-0.0.9.tar.gz (21.4 kB view details)

Uploaded Source

Built Distribution

hcai_datasets-0.0.9-py3-none-any.whl (34.3 kB view details)

Uploaded Python 3

File details

Details for the file hcai-datasets-0.0.9.tar.gz.

File metadata

  • Download URL: hcai-datasets-0.0.9.tar.gz
  • Upload date:
  • Size: 21.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.0 CPython/3.8.7

File hashes

Hashes for hcai-datasets-0.0.9.tar.gz
Algorithm Hash digest
SHA256 700b09601a0181d78ba78fdd1146516c0111b10c711cd76bef50baf2610d6076
MD5 552beddea1beca7f612ef535074b82bd
BLAKE2b-256 157aeb683d62410f43ec1ead371089d1d19a19e731022cfec5f45e80e18c8b4a

See more details on using hashes here.

File details

Details for the file hcai_datasets-0.0.9-py3-none-any.whl.

File metadata

  • Download URL: hcai_datasets-0.0.9-py3-none-any.whl
  • Upload date:
  • Size: 34.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.0 CPython/3.8.7

File hashes

Hashes for hcai_datasets-0.0.9-py3-none-any.whl
Algorithm Hash digest
SHA256 d4a9d1e745b86c971d79abda700d4b54a7870c59429fdb0678d3a3f06c031ec7
MD5 ee115d9603a566608eaa7cc8e1af3614
BLAKE2b-256 1bb58f8c2c9cecc93187a47352834dd44b8da6b342671e933e16c1fc63a96c19

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page