
This repository contains utility functions and interfaces that can be used to interact with the DISCOVER framework.


DISCOVER-Utils


DISCOVER-Utils is a Python utility package for data handling, processing, and annotation of multimedia data. It is designed to work with the DISCOVER framework or as a stand-alone library.

Features

  • Data handling — Unified access to streams (audio, video, sensor data) and annotations (discrete, continuous) via file, MongoDB, or URL backends
  • Multiple video backends — Choose between decord, imageio, moviepy, or pyav for video decoding
  • Dataset management — Iterate over multi-session datasets with DatasetManager and DatasetIterator
  • Processing pipeline — Run DISCOVER server modules from the command line for feature extraction and prediction
  • SSI compatibility — Read and write SSI trainer files and XML configurations

Installation

pip install hcai-discover-utils

Optional video backends

# Fast video decoding with decord
pip install hcai-discover-utils[decord]

# PyAV (FFmpeg bindings)
pip install hcai-discover-utils[pyav]

# MoviePy
pip install hcai-discover-utils[pymovie]
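
After installing an optional extra, it can help to confirm which decoders are importable in the current environment. The snippet below is a minimal sketch that uses only the standard library. The module names are assumptions based on the usual import names of these packages (PyAV imports as av), and DISCOVER-Utils may detect its backends differently.

```python
from importlib.util import find_spec

# Backend label -> top-level module name.
# Assumption: these are the common import names, not taken from DISCOVER-Utils.
BACKEND_MODULES = {
    "decord": "decord",
    "imageio": "imageio",
    "moviepy": "moviepy",
    "pyav": "av",
}


def available_backends(modules=BACKEND_MODULES):
    """Return the labels whose module can be located without importing it."""
    return [label for label, module in modules.items() if find_spec(module) is not None]


print(available_backends())
```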

Getting Started

Command-line tools

Process data with DISCOVER server modules:

du-process \
  --dataset "my_dataset" \
  --db_host "127.0.0.1" --db_port "27017" \
  --db_user "user" --db_password "pass" \
  --trainer_file_path "path/to/trainer.trainer" \
  --sessions '["session1", "session2"]' \
  --data '[{"src": "db:anno", "scheme": "transcript", "annotator": "test", "role": "testrole"}]'

File mode (no database)

Read inputs and write outputs directly from/to disk, without a NOVA database. Use file: sources and supply a path via uri (static, single session) or uri_template (per-session paths via {dataset} and {session} placeholders):

du-process \
  --dataset "my_study" \
  --trainer_file_path "path/to/trainer.trainer" \
  --sessions '["session_a", "session_b"]' \
  --data '[
    {
      "id": "video",
      "type": "input",
      "src": "file:stream:video",
      "uri_template": "/data/{dataset}/{session}/video.mp4"
    },
    {
      "id": "valence",
      "type": "output",
      "src": "file:annotation:continuous",
      "uri_template": "/outputs/{dataset}/{session}/valence.annotation",
      "sample_rate": 30,
      "min_val": -1,
      "max_val": 1
    }
  ]'
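
Quoting a multi-line JSON array by hand is error-prone, so the same call can also be assembled from Python. The following is a sketch, not part of DISCOVER-Utils: build_command is a hypothetical helper, and it uses only the du-process flags shown above.

```python
import json
import shlex


def build_command(dataset, trainer_file, sessions, data):
    """Assemble a du-process argument list for file mode (hypothetical helper)."""
    return [
        "du-process",
        "--dataset", dataset,
        "--trainer_file_path", trainer_file,
        "--sessions", json.dumps(sessions),  # JSON array of session names
        "--data", json.dumps(data),          # JSON array of data descriptors
    ]


data = [
    {"id": "video", "type": "input", "src": "file:stream:video",
     "uri_template": "/data/{dataset}/{session}/video.mp4"},
    {"id": "valence", "type": "output", "src": "file:annotation:continuous",
     "uri_template": "/outputs/{dataset}/{session}/valence.annotation",
     "sample_rate": 30, "min_val": -1, "max_val": 1},
]

cmd = build_command("my_study", "path/to/trainer.trainer",
                    ["session_a", "session_b"], data)

# Shell-quoted form, suitable for pasting into a terminal.
print(shlex.join(cmd))
```

Passing cmd to subprocess.run(cmd, check=True) would execute it, provided du-process is installed and on the PATH.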

Each session resolves its own input and output paths. Output annotation descriptors may carry scheme metadata that is used when no annotation file exists yet:

  • file:annotation:continuous: sample_rate, min_val, max_val (defaults: 1, 0, 1).

  • file:annotation:discrete: classes as a map from class id to a dict of per-class XML attributes (typically name, optionally color, etc.). The outer key is the canonical id; the writer injects it into the XML automatically. For example:

    // fragment of a data description entry
    "classes": {
      "0": {"name": "neutral", "color": "#888"},
      "1": {"name": "happiness", "color": "#ffd700"}
    }
    

    Legacy {id: name} strings are also accepted and normalized internally to the canonical form.

Scheme metadata matters for modules that resample continuous outputs to the scheme's sample_rate: without explicit metadata, outputs default to 1 Hz.
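
The scheme metadata rules described above can be illustrated in a few lines of Python. This is a sketch of the documented behavior, not the library's code: continuous_scheme and normalize_classes are hypothetical names, and converting class ids to strings is an assumption.

```python
# Documented defaults for file:annotation:continuous descriptors.
CONTINUOUS_DEFAULTS = {"sample_rate": 1, "min_val": 0, "max_val": 1}


def continuous_scheme(entry):
    """Return the scheme fields of a descriptor, falling back to the defaults."""
    return {key: entry.get(key, default) for key, default in CONTINUOUS_DEFAULTS.items()}


def normalize_classes(classes):
    """Convert legacy {id: name} entries to the canonical {id: {"name": name}} form."""
    normalized = {}
    for class_id, value in classes.items():
        # A bare string is the legacy form; a dict is already canonical.
        attrs = {"name": value} if isinstance(value, str) else dict(value)
        normalized[str(class_id)] = attrs  # assumption: ids are kept as strings
    return normalized


print(continuous_scheme({"sample_rate": 30, "min_val": -1}))
print(normalize_classes({"0": "neutral", "1": {"name": "happiness", "color": "#ffd700"}}))
```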

Notes:

  • uri and uri_template are filesystem paths (absolute or relative to the working directory). There is no implicit base directory.
  • If uri_template references {dataset} or {session}, the corresponding value must be non-empty; otherwise resolve_file_uri raises ValueError.
  • uri_template takes precedence over uri when both are present.
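
The notes above can be summarized in a short Python sketch. It is a hypothetical stand-in and not the library's resolve_file_uri, whose signature is not shown here. Raising ValueError when neither key is present is an assumption made for this example.

```python
def resolve_path(entry, dataset="", session=""):
    """Resolve a descriptor to a filesystem path (illustrative stand-in)."""
    template = entry.get("uri_template")
    if template is None:
        # No template: fall back to the static uri.
        uri = entry.get("uri")
        if uri is None:
            raise ValueError("descriptor needs 'uri' or 'uri_template'")
        return uri

    # uri_template wins over uri; every placeholder it uses needs a value.
    values = {"dataset": dataset, "session": session}
    for name, value in values.items():
        if "{" + name + "}" in template and not value:
            raise ValueError(f"uri_template uses {{{name}}} but no value was given")

    # The result is used as-is: no base directory is prepended.
    return template.replace("{dataset}", dataset).replace("{session}", session)


entry = {"uri": "/data/static.mp4",
         "uri_template": "/data/{dataset}/{session}/video.mp4"}
print(resolve_path(entry, dataset="my_study", session="session_a"))
```

Plain string replacement is used instead of str.format so that any other braces in a path are left untouched.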

Python API

from discover_utils.data.provider.data_manager import DatasetManager

# Set up a dataset manager for your sessions
dm = DatasetManager(
    dataset="my_dataset",
    db_host="127.0.0.1",
    db_port=27017,
    db_user="user",
    db_password="pass",
    sessions=["session1"],
    data_description=[...],
)

Documentation

Full API documentation is available at hcmlab.github.io/discover-utils/docbuild/.

Citation

If you use DISCOVER or DISCOVER-Utils in your research, please cite:

@article{hallmen2025discover,
  title     = {DISCOVER: a Data-driven Interactive System for Comprehensive
               Observation, Visualization, and ExploRation of human behavior},
  author    = {Hallmen, Tobias and Schiller, Dominik and others},
  journal   = {Frontiers in Digital Health},
  volume    = {7},
  pages     = {1638539},
  year      = {2025},
  publisher = {Frontiers}
}

License

This project is licensed under the GNU General Public License v3.0.
