Utility functions and interfaces for interacting with the DISCOVER framework.

DISCOVER-Utils

DISCOVER-Utils is a Python utility package for data handling, processing, and annotation of multimedia data. It is designed to work with the DISCOVER framework or as a stand-alone library.

Features

  • Data handling — Unified access to streams (audio, video, sensor data) and annotations (discrete, continuous) via file, MongoDB, or URL backends
  • Multiple video backends — Choose between decord, imageio, moviepy, or pyav for video decoding
  • Dataset management — Iterate over multi-session datasets with DatasetManager and DatasetIterator
  • Processing pipeline — Run DISCOVER server modules from the command line for feature extraction and prediction
  • SSI compatibility — Read and write SSI trainer files and XML configurations

Installation

pip install hcai-discover-utils

Optional video backends

# Fast video decoding with decord
pip install hcai-discover-utils[decord]

# PyAV (FFmpeg bindings)
pip install hcai-discover-utils[pyav]

# MoviePy
pip install hcai-discover-utils[pymovie]

Getting Started

Command-line tools

Process data with DISCOVER server modules:

du-process \
  --dataset "my_dataset" \
  --db_host "127.0.0.1" --db_port "27017" \
  --db_user "user" --db_password "pass" \
  --trainer_file_path "path/to/trainer.trainer" \
  --sessions '["session1", "session2"]' \
  --data '[{"src": "db:anno", "scheme": "transcript", "annotator": "test", "role": "testrole"}]'

File mode (no database)

File mode reads inputs from disk and writes outputs to disk, with no NOVA database involved. Use file: sources and give each entry a path through either uri (one static path, suited to a single session) or uri_template (per-session paths built from the {dataset} and {session} placeholders):

du-process \
  --dataset "my_study" \
  --trainer_file_path "path/to/trainer.trainer" \
  --sessions '["session_a", "session_b"]' \
  --data '[
    {
      "id": "video",
      "type": "input",
      "src": "file:stream:video",
      "uri_template": "/data/{dataset}/{session}/video.mp4"
    },
    {
      "id": "valence",
      "type": "output",
      "src": "file:annotation:continuous",
      "uri_template": "/outputs/{dataset}/{session}/valence.annotation",
      "sample_rate": 30,
      "min_val": -1,
      "max_val": 1
    }
  ]'
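Because the --data value is plain JSON, quoting it by hand in a shell gets error-prone as the description grows. The following is a minimal sketch of building the same file-mode call from Python. The helper name build_du_process_args is hypothetical and not part of DISCOVER-Utils; it uses only the flags shown in the command above.

```python
import json
import shlex


def build_du_process_args(dataset, trainer_file_path, sessions, data):
    """Hypothetical helper: assemble a du-process argument list.

    Only the flags from the file-mode example are used; the JSON values
    are serialized with json.dumps so no manual shell quoting is needed.
    """
    return [
        "du-process",
        "--dataset", dataset,
        "--trainer_file_path", trainer_file_path,
        "--sessions", json.dumps(sessions),
        "--data", json.dumps(data),
    ]


# Same data description as in the shell example above.
data = [
    {
        "id": "video",
        "type": "input",
        "src": "file:stream:video",
        "uri_template": "/data/{dataset}/{session}/video.mp4",
    },
    {
        "id": "valence",
        "type": "output",
        "src": "file:annotation:continuous",
        "uri_template": "/outputs/{dataset}/{session}/valence.annotation",
        "sample_rate": 30,
        "min_val": -1,
        "max_val": 1,
    },
]

args = build_du_process_args(
    "my_study", "path/to/trainer.trainer", ["session_a", "session_b"], data
)

# Print a copy-pasteable, shell-quoted version of the command.
print(shlex.join(args))
```

With the package installed, the list can be handed to subprocess.run(args, check=True) as-is; passing a list rather than a single string avoids a second round of shell quoting.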

Each session resolves its own input and output paths. Output annotation descriptors may carry scheme metadata that is used when no annotation file exists yet:

  • file:annotation:continuous: sample_rate, min_val, max_val (defaults: 1, 0, 1).

  • file:annotation:discrete: classes as a map from class id to a dict of per-class XML attributes (typically name, optionally color, etc.). The outer key is the canonical id; the writer injects it into the XML automatically. For example:

    // fragment of a data description entry
    "classes": {
      "0": {"name": "neutral", "color": "#888"},
      "1": {"name": "happiness", "color": "#ffd700"}
    }
    

    Legacy {id: name} strings are also accepted and normalized internally to the canonical form.

Setting this metadata explicitly matters for modules that resample continuous outputs to the scheme's sample_rate: without it, outputs default to 1 Hz.
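The defaults and the legacy class form described above can be illustrated with a short sketch. This is not the library's internal code: the names CONTINUOUS_DEFAULTS, normalize_classes, and with_scheme_metadata are hypothetical, and the logic only mirrors the behavior stated in this section (defaults of 1, 0, 1 for continuous schemes, and {id: name} strings expanded to {id: {"name": name}}).

```python
# Defaults stated above for file:annotation:continuous outputs.
CONTINUOUS_DEFAULTS = {"sample_rate": 1, "min_val": 0, "max_val": 1}


def normalize_classes(classes):
    """Hypothetical helper: turn legacy {id: name} entries into
    the canonical {id: {"name": name, ...}} form."""
    normalized = {}
    for class_id, value in classes.items():
        if isinstance(value, str):
            # Legacy form: the value is just the class name.
            attrs = {"name": value}
        else:
            # Canonical form: copy the per-class attribute dict.
            attrs = dict(value)
        normalized[str(class_id)] = attrs
    return normalized


def with_scheme_metadata(entry):
    """Hypothetical helper: return a copy of a data description entry
    with scheme metadata filled in or normalized."""
    result = dict(entry)
    src = result.get("src", "")
    if src == "file:annotation:continuous":
        for key, default in CONTINUOUS_DEFAULTS.items():
            result.setdefault(key, default)
    elif src == "file:annotation:discrete" and "classes" in result:
        result["classes"] = normalize_classes(result["classes"])
    return result


legacy = {"src": "file:annotation:discrete",
          "classes": {"0": "neutral", "1": "happiness"}}
print(with_scheme_metadata(legacy)["classes"])

bare = {"src": "file:annotation:continuous"}
print(with_scheme_metadata(bare))
```

The second call shows why an output entry without explicit metadata ends up at 1 Hz: nothing overrides the default sample_rate.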

Notes:

  • uri and uri_template are filesystem paths (absolute or relative to the working directory). There is no implicit base directory.
  • If a uri_template contains {dataset} or {session}, the corresponding value must be non-empty; otherwise resolve_file_uri raises ValueError.
  • uri_template takes precedence over uri when both are present.
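The rules in these notes can be summarized in a small sketch. The function resolve_uri_sketch below is hypothetical and written only for illustration; it is not the library's resolve_file_uri, whose actual signature is not shown in this document. It assumes an entry is a plain dict with optional uri and uri_template keys.

```python
def resolve_uri_sketch(entry, dataset, session):
    """Hypothetical illustration of the path rules described above.

    - uri_template wins over uri when both are present.
    - A placeholder that appears in the template needs a non-empty value.
    - Paths are returned as given; no base directory is prepended.
    """
    template = entry.get("uri_template")
    if template:
        resolved = template
        for name, value in (("dataset", dataset), ("session", session)):
            placeholder = "{" + name + "}"
            if placeholder in resolved:
                if not value:
                    raise ValueError(
                        f"uri_template uses {placeholder} but no value was given"
                    )
                resolved = resolved.replace(placeholder, value)
        return resolved

    uri = entry.get("uri")
    if not uri:
        raise ValueError("entry has neither uri nor uri_template")
    return uri


entry = {
    "uri": "/data/fallback/video.mp4",
    "uri_template": "/data/{dataset}/{session}/video.mp4",
}

# One resolved path per session; the static uri is ignored here.
for session in ["session_a", "session_b"]:
    print(resolve_uri_sketch(entry, "my_study", session))
```

Plain string replacement is used instead of str.format so that any other braces in a path are left untouched.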

Python API

from discover_utils.data.provider.data_manager import DatasetManager

# Set up a dataset manager for your sessions
dm = DatasetManager(
    dataset="my_dataset",
    db_host="127.0.0.1",
    db_port=27017,
    db_user="user",
    db_password="pass",
    sessions=["session1"],
    data_description=[...],
)

Documentation

Full API documentation is available at hcmlab.github.io/discover-utils/docbuild/.

Citation

If you use DISCOVER or DISCOVER-Utils in your research, please cite:

@article{hallmen2025discover,
  title     = {DISCOVER: a Data-driven Interactive System for Comprehensive
               Observation, Visualization, and ExploRation of human behavior},
  author    = {Hallmen, Tobias and Schiller, Dominik and others},
  journal   = {Frontiers in Digital Health},
  volume    = {7},
  pages     = {1638539},
  year      = {2025},
  publisher = {Frontiers}
}

License

This project is licensed under the GNU General Public License v3.0.
