hcai-datasets-nightly

Alpha version: this repository contains the backend server for the NOVA annotation UI (https://github.com/hcmlab/nova)

Description

This repository contains code to make datasets stored on the corpora network drive of the chair accessible from Python. You can use this project to create or reuse a data loader that works with plain Python code as well as with TensorFlow or PyTorch. The code can also be used to dynamically create a data loader for a NOVA database, so that you can work with NOVA datasets directly in Python.

The data loaders are compatible with the TensorFlow Datasets API. The PyTorch Dataset is also supported.

Installation Information

For efficient data loading we rely on the decord library. Decord is not available as a prebuilt binary for non-x86 architectures. If you want to install the project on another architecture, you will need to compile decord yourself.
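As a rough pre-installation check, the following sketch reports whether the current machine looks like an x86 system and whether decord can already be imported. The helper names `is_x86` and `decord_hint` and the list of x86 machine names are assumptions made for this illustration; they are not part of hcai_datasets or decord.

```python
import importlib.util
import platform

# Machine names commonly reported for x86 systems (assumed, not exhaustive).
X86_NAMES = {"x86_64", "amd64", "i386", "i686", "x86"}


def is_x86(machine=None):
    """Return True if the given (or current) machine name looks like x86."""
    name = (machine or platform.machine()).lower()
    return name in X86_NAMES


def decord_hint(machine=None):
    """Return a short, human-readable hint about installing decord."""
    if importlib.util.find_spec("decord") is not None:
        return "decord is already installed"
    if is_x86(machine):
        return "x86 machine: a prebuilt decord package should be installable"
    return "non-x86 machine: decord likely has to be compiled from source"


print(decord_hint())
```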

Currently available Datasets

Dataset	Status	Url
ckplus	✅	http://www.iainm.com/publications/Lucey2010-The-Extended/paper.pdf
affectnet	✅	http://mohammadmahoor.com/affectnet/
faces	✅	https://faces.mpdl.mpg.de/imeji/
nova_dynamic	✅	https://github.com/hcmlab/nova
audioset	❌	https://research.google.com/audioset/
is2021_ess	❌	-
librispeech	❌	https://www.openslr.org/12

Architecture

[Figure: UML diagram of the architecture]

Dataset implementations are split into two parts.

Data access is handled by a generic python iterable, implemented by the DatasetIterable interface.
The access class is then extended by an API class, which implements tfds.core.GeneratorBasedBuilder. This makes the dataset available through the Tensorflow Datasets API and enables features such as local caching.

The iterables themselves can also be used as-is, either with PyTorch-native DataLoaders by wrapping them in the utility class BridgePyTorch, or as tensorflow-native Datasets by passing them to BridgeTensorflow.

The benefits of this setup are that a pytorch application can be served without installing or loading tensorflow, and vice versa, since the stack up to the adapters does not involve tf or pytorch. Also, when using tf, caching can be used or discarded by using tfds or the plain tensorflow Dataset provided by the bridge.
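The layering described above can be illustrated with a minimal, framework-free sketch. The names `SketchIterable`, `ToyDataset` and `as_generator_factory` are hypothetical and only mimic the idea; they are not the actual DatasetIterable, BridgePyTorch or BridgeTensorflow implementations.

```python
from abc import ABC, abstractmethod


class SketchIterable(ABC):
    """Hypothetical stand-in for the framework-agnostic access layer."""

    @abstractmethod
    def __iter__(self):
        """Yield one sample (here: a plain dict) at a time."""


class ToyDataset(SketchIterable):
    """Toy data access class that needs neither tensorflow nor pytorch."""

    def __init__(self, n_samples):
        self.n_samples = n_samples

    def __iter__(self):
        for i in range(self.n_samples):
            yield {"index": i, "label": i % 2}


def as_generator_factory(iterable):
    """Adapter sketch: wrap an iterable in a zero-argument callable.

    Generator-based APIs such as tf.data.Dataset.from_generator expect a
    callable of this shape; the real bridges may work differently.
    """

    def generator():
        yield from iterable

    return generator


dataset = ToyDataset(3)
factory = as_generator_factory(dataset)
for sample in factory():
    print(sample)
```

Because the access layer is plain Python, only the thin adapter at the end needs to know about a specific framework.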

Dynamic Dataset usage with Nova Example

To use the hcai_datasets repository with Nova, you can use the HcaiNovaDynamicIterable class from the hcai_datasets.hcai_nova_dynamic.hcai_nova_dynamic_iterable module to create an iterator for a specific data configuration. This readme assumes that you are already familiar with the terminology and the general concept of the NOVA annotation tool / database. The constructor of the class takes the following arguments as input:

db_config_path: string path to a configfile with the nova database config. the config file looks like this:

[DB]
ip = 127.0.0.1
port = 37317
user = my_user
password = my_password

db_config_dict: string dictionary with the nova database config. can be used instead of db_config_path. if both are specified db_config_dict is used.

dataset: string the name of the dataset. Same as the entry in the Nova db.

nova_data_dir: string the directory to look for data. same as the directory specified in the nova gui.

sessions: list list of sessions that should be loaded. must match the session names in nova.

annotator: string the name of the annotator that labeled the session. must match annotator names in nova.

schemes: list list of the annotation schemes to fetch.

roles: list list of roles for which the annotation should be loaded.

data_streams: list list of data streams for which the annotation should be loaded. must match stream names in nova.

start: string | int | float start time_ms. use if only a specific chunk of a session should be retrieved. can be passed as String (e.g. '1s' or '1ms'), Int (interpreted as milliseconds) or Float (interpreted as seconds).

end: string | int | float optional end time_ms. use if only a specific chunk of a session should be retrieved. can be passed as String (e.g. '1s' or '1ms'), Int (interpreted as milliseconds) or Float (interpreted as seconds).

left_context: string | int | float additional data to pass to the classifier on the left side of the frame. can be passed as String (e.g. '1s' or '1ms'), Int (interpreted as milliseconds) or Float (interpreted as seconds).

right_context: string | int | float additional data to pass to the classifier on the right side of the frame. can be passed as String (e.g. '1s' or '1ms'), Int (interpreted as milliseconds) or Float (interpreted as seconds).

frame_size: string | int | float the framesize to look at. the matching annotation will be calculated as majority vote from all annotations that are overlapping with the timeframe. can be passed as String (e.g. '1s' or '1ms'), Int (interpreted as milliseconds) or Float (interpreted as seconds).

stride: string | int | float how much a frame is moved to calculate the next sample. equals framesize by default. can be passed as String (e.g. '1s' or '1ms'), Int (interpreted as milliseconds) or Float (interpreted as seconds).

flatten_samples: bool if set to True, samples with the same annotation scheme but from different roles will be treated as separate samples. this is only used for the keys.

add_rest_class: bool when set to True, an additional rest class will be added to the end of the label list.
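The time-valued arguments and the framing parameters listed above can be illustrated with a small sketch. The helpers `to_ms`, `frame_windows` and `majority_label` are hypothetical and are not taken from hcai_datasets. In particular, it is an assumption that the context simply widens each window, and that the majority vote is weighted by overlap duration.

```python
from collections import Counter


def to_ms(value):
    """Convert a time spec to integer milliseconds (illustrative helper)."""
    if isinstance(value, bool):
        raise TypeError("bool is not a valid time spec")
    if isinstance(value, int):
        return value  # int: interpreted as milliseconds
    if isinstance(value, float):
        return int(round(value * 1000))  # float: interpreted as seconds
    if isinstance(value, str):
        text = value.strip().lower()
        if text.endswith("ms"):
            return int(round(float(text[:-2])))
        if text.endswith("s"):
            return int(round(float(text[:-1]) * 1000))
    raise ValueError(f"unsupported time spec: {value!r}")


def frame_windows(start, end, frame_size, stride=None,
                  left_context=0, right_context=0):
    """Yield (window_start_ms, window_end_ms) for each frame in [start, end]."""
    start_ms, end_ms, frame_ms = to_ms(start), to_ms(end), to_ms(frame_size)
    stride_ms = frame_ms if stride is None else to_ms(stride)
    if frame_ms <= 0 or stride_ms <= 0:
        raise ValueError("frame_size and stride must be positive")
    left_ms, right_ms = to_ms(left_context), to_ms(right_context)
    pos = start_ms
    while pos + frame_ms <= end_ms:
        # Assumption: context only extends the window on either side.
        yield (pos - left_ms, pos + frame_ms + right_ms)
        pos += stride_ms


def majority_label(annotations, win_start, win_end):
    """Pick the label with the largest total overlap with the window.

    annotations: iterable of (from_ms, to_ms, label) tuples.
    Returns None if no annotation overlaps the window.
    """
    votes = Counter()
    for ann_start, ann_end, label in annotations:
        overlap = min(ann_end, win_end) - max(ann_start, win_start)
        if overlap > 0:
            votes[label] += overlap
    return votes.most_common(1)[0][0] if votes else None


windows = list(frame_windows("0s", "3000ms", 0.04))
print(len(windows), windows[0], windows[-1])
```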

from pathlib import Path
from hcai_datasets.hcai_nova_dynamic.hcai_nova_dynamic_iterable import HcaiNovaDynamicIterable

ds_iter = HcaiNovaDynamicIterable(
    db_config_path="./nova_db.cfg",
    db_config_dict=None,
    dataset="affect-net",
    nova_data_dir=Path("./nova/data"),
    sessions=[f"{i}_man_eval" for i in range(8)],
    roles=["session"],
    schemes=["emotion_categorical"],
    annotator="gold",
    data_streams=["video"],
    frame_size=0.04,
    left_context=0,
    right_context=0,
    start="0s",
    end="3000ms",
    flatten_samples=False,
)

for sample in ds_iter:
    print(sample)
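The config file passed as db_config_path uses INI syntax, so it can be inspected with Python's standard configparser module. The sketch below parses the example config shown earlier into a plain dictionary. The helper name `read_db_config` is hypothetical, and it is only an assumption that db_config_dict expects the same keys as the file and that the port should be an integer.

```python
import configparser

# Same content as the example config file shown above.
EXAMPLE_CFG = """
[DB]
ip = 127.0.0.1
port = 37317
user = my_user
password = my_password
"""


def read_db_config(text):
    """Parse the [DB] section of a nova database config into a dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    db = parser["DB"]
    return {
        "ip": db["ip"],
        "port": db.getint("port"),  # assumption: port as int
        "user": db["user"],
        "password": db["password"],
    }


config = read_db_config(EXAMPLE_CFG)
print(config["ip"], config["port"])
```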

Pytorch Example

The BridgePyTorch module can be used to create a Pytorch DataLoader directly from the Dataset iterable.

from torch.utils.data import DataLoader
from hcai_dataset_utils.bridge_pytorch import BridgePyTorch
from hcai_datasets.hcai_affectnet.hcai_affectnet_iterable import HcaiAffectnetIterable

ds_iter = HcaiAffectnetIterable(
    dataset_dir="path/to/data_sets/AffectNet",
    split="test"
)
dataloader = DataLoader(BridgePyTorch(ds_iter))

for sample in dataloader:
    print(sample)

Tensorflow Example

The BridgeTensorflow module can be used to create a Tensorflow Dataset directly from the Dataset iterable.

from hcai_dataset_utils.bridge_tf import BridgeTensorflow
from hcai_datasets.hcai_affectnet.hcai_affectnet_iterable import HcaiAffectnetIterable

ds_iter = HcaiAffectnetIterable(
    dataset_dir="path/to/data_sets/AffectNet",
    split="test"
)

tf_dataset = BridgeTensorflow.make(ds_iter)

for sample in tf_dataset:
    print(sample)

Tensorflow Dataset API (DEPRECATED)

Example Usage

import os
import tensorflow as tf
import tensorflow_datasets as tfds
import hcai_datasets
from matplotlib import pyplot as plt

# Preprocessing function
def preprocess(x, y):
  img = x.numpy()
  return img, y

# Creating a dataset
ds, ds_info = tfds.load(
  'hcai_example_dataset',
  split='train',
  with_info=True,
  as_supervised=True,
  builder_kwargs={'dataset_dir': os.path.join('path', 'to', 'directory')}
)

# Input output mapping
ds = ds.map(lambda x, y: (tf.py_function(func=preprocess, inp=[x, y], Tout=[tf.float32, tf.int64])))

# Manually iterate over dataset
img, label = next(ds.as_numpy_iterator())

# Visualize
plt.imshow(img / 255.)
plt.show()
