!Alpha Version! - This repository contains code to make datasets stored on the corpora network drive of the chair compatible with the [tensorflow dataset api](https://www.tensorflow.org/api_docs/python/tf/data/Dataset)
Project description
Description
This repository contains code to make datasets stored on the corpora network drive of the chair compatible with the [tensorflow dataset api](https://www.tensorflow.org/api_docs/python/tf/data/Dataset).
Currently available Datasets
Dataset | Status | URL |
---|---|---|
audioset | ❌ | https://research.google.com/audioset/ |
ckplus | ✅ | http://www.iainm.com/publications/Lucey2010-The-Extended/paper.pdf |
faces | ✅ | https://faces.mpdl.mpg.de/imeji/ |
is2021_ess | ❌ | - |
librispeech | ❌ | https://www.openslr.org/12 |
nova_dynamic | ✅ | https://github.com/hcmlab/nova |
Example Usage

```python
import os
import tensorflow as tf
import tensorflow_datasets as tfds
import hcai_datasets
from matplotlib import pyplot as plt

# Preprocessing function
def preprocess(x, y):
    img = x.numpy()
    return img, y

# Creating a dataset
ds, ds_info = tfds.load(
    'hcai_example_dataset',
    split='train',
    with_info=True,
    as_supervised=True,
    builder_kwargs={'dataset_dir': os.path.join('path', 'to', 'directory')}
)

# Input-output mapping
ds = ds.map(lambda x, y: tf.py_function(func=preprocess, inp=[x, y], Tout=[tf.float32, tf.int64]))

# Manually iterate over the dataset
img, label = next(ds.as_numpy_iterator())

# Visualize
plt.imshow(img / 255.)
plt.show()
```
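For training, the mapped dataset can then be batched and prefetched like any other `tf.data` pipeline. The sketch below uses a small in-memory stand-in for the tfds-loaded dataset; the tensor shapes and labels are illustrative only, not part of this package.

```python
import tensorflow as tf

# Stand-in data: 8 RGB images of size 32x32 with binary labels,
# substituting for the (image, label) pairs returned by tfds.load above.
images = tf.random.uniform((8, 32, 32, 3), maxval=255.0)
labels = tf.constant([0, 1, 0, 1, 0, 1, 0, 1], dtype=tf.int64)
ds = tf.data.Dataset.from_tensor_slices((images, labels))

# Typical training pipeline: shuffle, batch, prefetch.
ds = ds.shuffle(buffer_size=8).batch(4).prefetch(tf.data.AUTOTUNE)

for batch_x, batch_y in ds:
    print(batch_x.shape, batch_y.shape)  # (4, 32, 32, 3) (4,)
```

`prefetch(tf.data.AUTOTUNE)` lets the pipeline prepare the next batch while the current one is being consumed.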
Example Usage Nova Dynamic Data

```python
import os
import warnings

import hcai_datasets
import numpy as np
import tensorflow_datasets as tfds
from sklearn.calibration import CalibratedClassifierCV
from sklearn.svm import LinearSVC

warnings.simplefilter("ignore")

## Load Data
ds, ds_info = tfds.load(
    'hcai_nova_dynamic',
    split='dynamic_split',
    with_info=True,
    as_supervised=True,
    data_dir='.',
    read_config=tfds.ReadConfig(
        shuffle_seed=1337
    ),
    builder_kwargs={
        # Database Config
        'db_config_path': 'nova_db.cfg',
        'db_config_dict': None,

        # Dataset Config
        'dataset': '<dataset_name>',
        'nova_data_dir': os.path.join('C:', 'Nova', 'Data'),
        'sessions': ['<session_name>'],
        'roles': ['<role_one>', '<role_two>'],
        'schemes': ['<label_scheme_one>'],
        'annotator': '<annotator_id>',
        'data_streams': ['<stream_name>'],

        # Sample Config
        'frame_step': 1,
        'left_context': 0,
        'right_context': 0,
        'start': None,
        'end': None,
        'flatten_samples': False,
        'supervised_keys': ['<role_one>.<stream_name>', '<scheme_two>'],

        # Additional Config
        'clear_cache': True
    }
)

# Iterate over the dataset and sort the samples by frame number
data_it = ds.as_numpy_iterator()
data_list = list(data_it)
data_list.sort(key=lambda x: int(x['frame'].decode('utf-8').split('_')[0]))

# Split features and labels
x = [v['<stream_name>'] for v in data_list]
y = [v['<scheme_two>'] for v in data_list]

x_np = np.ma.concatenate(x, axis=0)
y_np = np.array(y)

# Train a calibrated linear SVM on the extracted features
linear_svc = LinearSVC()
model = CalibratedClassifierCV(linear_svc, method='sigmoid', cv=3)
print('train_x shape: {} | train_x[0] shape: {}'.format(x_np.shape, x_np[0].shape))
model.fit(x_np, y_np)
```
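Once fitted, the calibrated classifier exposes probability estimates via `predict_proba`. Since the Nova data itself is not available here, the sketch below runs the same `LinearSVC` + `CalibratedClassifierCV` combination on synthetic features; the feature dimension and class count are illustrative assumptions.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.svm import LinearSVC

# Synthetic stand-in for the concatenated Nova features and labels.
rng = np.random.default_rng(0)
x_np = rng.normal(size=(60, 8))
y_np = rng.integers(0, 2, size=60)

# Same model as above: a linear SVM wrapped in sigmoid calibration.
model = CalibratedClassifierCV(LinearSVC(), method='sigmoid', cv=3)
model.fit(x_np, y_np)

# Calibrated per-class probabilities, one row per sample.
proba = model.predict_proba(x_np[:5])
print(proba.shape)  # (5, 2)
```

Calibration is useful here because a plain `LinearSVC` only provides decision-function scores, not probabilities.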
```python
import os

import hcai_datasets
import tensorflow_datasets as tfds

## Load Data
ds, ds_info = tfds.load(
    'hcai_nova_dynamic',
    split='dynamic_split',
    with_info=True,
    as_supervised=True,
    builder_kwargs={
        # Database Config
        'db_config_path': 'db.cfg',
        'db_config_dict': None,

        # Dataset Config
        'dataset': '<dataset_name>',
        'nova_data_dir': os.path.join('C:', 'Nova', 'Data'),
        'sessions': ['<session_name>'],
        'roles': ['<role_one>', '<role_two>'],
        'schemes': ['<label_scheme_one>'],
        'annotator': '<annotator_id>',
        'data_streams': ['<stream_name>'],

        # Sample Config
        'frame_step': 1,
        'left_context': 0,
        'right_context': 0,
        'start': None,
        'end': None,
        #'flatten_samples': False,
        'supervised_keys': ['<role_one>.<stream_name>', '<scheme_two>'],

        # Additional Config
        'clear_cache': True
    }
)

# Inspect a single example
data_it = ds.as_numpy_iterator()
ex_data = next(data_it)
print(ex_data)
```