!Alpha Version! - This repository contains code to make datasets stored on the corpora network drive of the chair compatible with the [TensorFlow Dataset API](https://www.tensorflow.org/api_docs/python/tf/data/Dataset).
Description
This repository contains code to make datasets stored on the corpora network drive of the chair compatible with the TensorFlow Dataset API.
Currently available Datasets
Dataset | Status | URL |
---|---|---|
ckplus | ✅ | http://www.iainm.com/publications/Lucey2010-The-Extended/paper.pdf |
affectnet | ✅ | http://mohammadmahoor.com/affectnet/ |
faces | ✅ | https://faces.mpdl.mpg.de/imeji/ |
nova_dynamic | ✅ | https://github.com/hcmlab/nova |
audioset | ❌ | https://research.google.com/audioset/ |
is2021_ess | ❌ | - |
librispeech | ❌ | https://www.openslr.org/12 |
Example Usage
import os
import tensorflow as tf
import tensorflow_datasets as tfds
import hcai_datasets
from matplotlib import pyplot as plt
# Preprocessing function
def preprocess(x, y):
    img = x.numpy()
    return img, y
# Creating a dataset
ds, ds_info = tfds.load(
    'hcai_example_dataset',
    split='train',
    with_info=True,
    as_supervised=True,
    builder_kwargs={'dataset_dir': os.path.join('path', 'to', 'directory')}
)
# Input output mapping
ds = ds.map(lambda x, y: (tf.py_function(func=preprocess, inp=[x, y], Tout=[tf.float32, tf.int64])))
# Manually iterate over dataset
img, label = next(ds.as_numpy_iterator())
# Visualize
plt.imshow(img / 255.)
plt.show()
Example Usage Nova Dynamic Data
import os
import hcai_datasets
import tensorflow_datasets as tfds
from sklearn.svm import LinearSVC
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
import warnings
warnings.simplefilter("ignore")
## Load Data
ds, ds_info = tfds.load(
    'hcai_nova_dynamic',
    split='dynamic_split',
    with_info=True,
    as_supervised=True,
    data_dir='.',
    read_config=tfds.ReadConfig(
        shuffle_seed=1337
    ),
    builder_kwargs={
        # Database Config
        'db_config_path': 'nova_db.cfg',
        'db_config_dict': None,
        # Dataset Config
        'dataset': '<dataset_name>',
        # 'C:\\' includes the separator; a bare 'C:' would give the drive-relative path 'C:Nova\Data' on Windows
        'nova_data_dir': os.path.join('C:\\', 'Nova', 'Data'),
        'sessions': ['<session_name>'],
        'roles': ['<role_one>', '<role_two>'],
        'schemes': ['<label_scheme_one>'],
        'annotator': '<annotator_id>',
        'data_streams': ['<stream_name>'],
        # Sample Config
        'frame_step': 1,
        'left_context': 0,
        'right_context': 0,
        'start': None,
        'end': None,
        'flatten_samples': False,
        'supervised_keys': ['<role_one>.<stream_name>', '<scheme_two>'],
        # Additional Config
        'clear_cache': True
    }
)
data_it = ds.as_numpy_iterator()
data_list = list(data_it)
data_list.sort(key=lambda x: int(x['frame'].decode('utf-8').split('_')[0]))
x = [v['<stream_name>'] for v in data_list]
y = [v['<scheme_two>'] for v in data_list]
x_np = np.ma.concatenate(x, axis=0)
y_np = np.array(y)
linear_svc = LinearSVC()
model = CalibratedClassifierCV(linear_svc,
                               method='sigmoid',
                               cv=3)
print('train_x shape: {} | train_x[0] shape: {}'.format(x_np.shape, x_np[0].shape))
model.fit(x_np, y_np)
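The sort key in the example above parses the leading integer of each sample's `frame` id, so samples are ordered numerically rather than as byte strings. The following is a minimal sketch of that step in isolation. The helper name `frame_start`, the sample dictionaries, and the `<start>_<end>` layout of the frame ids are assumptions made for illustration; the example above only shows that the part before the first underscore is an integer.

```python
def frame_start(frame_id: bytes) -> int:
    """Return the integer before the first underscore of a frame id."""
    return int(frame_id.decode('utf-8').split('_')[0])


# Hypothetical samples standing in for the elements of ds.as_numpy_iterator().
samples = [
    {'frame': b'100_140', 'value': 'c'},
    {'frame': b'0_40', 'value': 'a'},
    {'frame': b'40_80', 'value': 'b'},
]

# Numeric ordering by frame start: 0, 40, 100.
ordered = sorted(samples, key=lambda s: frame_start(s['frame']))
ordered_values = [s['value'] for s in ordered]

# Sorting the raw byte strings compares them bytewise, so b'100_140'
# lands before b'40_80' and the samples end up out of temporal order.
bytewise = sorted(samples, key=lambda s: s['frame'])
bytewise_values = [s['value'] for s in bytewise]
```

With these samples, `ordered_values` follows the frame order while `bytewise_values` does not, which is why the example converts the prefix to `int` before sorting.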
Built Distribution
File details
Details for the file hcai_datasets_nightly-0.0.16.dev202109141636-py3-none-any.whl.
File metadata
- Download URL: hcai_datasets_nightly-0.0.16.dev202109141636-py3-none-any.whl
- Upload date:
- Size: 39.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.6.1 pkginfo/1.7.0 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.61.1 CPython/3.9.6
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | e51b64b4f5235d32ff12491fd7ec798a0bafe6b866aacb57bfcd7f10e854b79a |
MD5 | 7ec07baa08a662f1a9c68bc3008399a3 |
BLAKE2b-256 | 40c813d048759ca425b4c157289e35c1f07d19c776f8ad7100e992bb2e8b66d3 |