This repository contains utility functions and interfaces that can be used to interact with the DISCOVER framework.
DISCOVER-Utils
DISCOVER-Utils is a Python utility package for data handling, processing, and annotation of multimedia data. It is designed to work with the DISCOVER framework or as a stand-alone library.
Features
- Data handling: Unified access to streams (audio, video, sensor data) and annotations (discrete, continuous) via file, MongoDB, or URL backends
- Multiple video backends: Choose between decord, imageio, moviepy, or pyav for video decoding
- Dataset management: Iterate over multi-session datasets with DatasetManager and DatasetIterator
- Processing pipeline: Run DISCOVER server modules from the command line for feature extraction and prediction
- SSI compatibility: Read and write SSI trainer files and XML configurations
Installation
pip install hcai-discover-utils
Optional video backends
# Fast video decoding with decord
pip install hcai-discover-utils[decord]
# PyAV (FFmpeg bindings)
pip install hcai-discover-utils[pyav]
# MoviePy
pip install hcai-discover-utils[pymovie]
Getting Started
Command-line tools
Process data with DISCOVER server modules:
du-process \
--dataset "my_dataset" \
--db_host "127.0.0.1" --db_port "27017" \
--db_user "user" --db_password "pass" \
--trainer_file_path "path/to/trainer.trainer" \
--sessions '["session1", "session2"]' \
--data '[{"src": "db:anno", "scheme": "transcript", "annotator": "test", "role": "testrole"}]'
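Nested JSON inside a shell command is easy to misquote. As a minimal sketch, assuming du-process is on the PATH and accepts the flags shown above, the --sessions and --data values can be generated with json.dumps and passed as an argument list. The helper name build_du_process_args is hypothetical and not part of DISCOVER-Utils.

```python
import json
import shlex


def build_du_process_args(dataset, trainer_file_path, sessions, data, **db_options):
    """Assemble an argument list for du-process (hypothetical helper).

    Only flags that appear in the example above are used. db_options maps
    flag names without the leading dashes (e.g. db_host) to their values.
    """
    args = ["du-process", "--dataset", dataset]
    for name, value in db_options.items():
        args += [f"--{name}", str(value)]
    args += [
        "--trainer_file_path", trainer_file_path,
        "--sessions", json.dumps(sessions),  # JSON list of session names
        "--data", json.dumps(data),          # JSON list of data descriptors
    ]
    return args


args = build_du_process_args(
    dataset="my_dataset",
    trainer_file_path="path/to/trainer.trainer",
    sessions=["session1", "session2"],
    data=[{"src": "db:anno", "scheme": "transcript",
           "annotator": "test", "role": "testrole"}],
    db_host="127.0.0.1",
    db_port="27017",
    db_user="user",
    db_password="pass",
)

# Print a copy-pasteable shell command; shlex.join quotes the JSON safely.
print(shlex.join(args))
```

Passing the list to subprocess.run(args, check=True) would run the command without going through a shell, so no manual quoting is needed.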
File mode (no database)
Read inputs and write outputs directly from/to disk, without a NOVA database. Use file: sources and supply a path via uri (static, single session) or uri_template (per-session paths via {dataset} and {session} placeholders):
du-process \
--dataset "my_study" \
--trainer_file_path "path/to/trainer.trainer" \
--sessions '["session_a", "session_b"]' \
--data '[
{
"id": "video",
"type": "input",
"src": "file:stream:video",
"uri_template": "/data/{dataset}/{session}/video.mp4"
},
{
"id": "valence",
"type": "output",
"src": "file:annotation:continuous",
"uri_template": "/outputs/{dataset}/{session}/valence.annotation",
"sample_rate": 30,
"min_val": -1,
"max_val": 1
}
]'
Each session resolves its own input and output paths. Output annotation descriptors may carry scheme metadata that is used when no annotation file exists yet:
- file:annotation:continuous: sample_rate, min_val, max_val (defaults: 1, 0, 1).
- file:annotation:discrete: classes as a map from class id to a dict of per-class XML attributes (typically name, optionally color, etc.). The outer key is the canonical id; the writer injects it into the XML automatically. For example:

    // fragment of a data description entry
    "classes": {
      "0": {"name": "neutral", "color": "#888"},
      "1": {"name": "happiness", "color": "#ffd700"}
    }

  Legacy {id: name} strings are also accepted and normalized internally to the canonical form.
This matters for modules that resample continuous outputs to the scheme's sample_rate — without explicit metadata, outputs default to 1 Hz.
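The metadata rules above can be restated as a short sketch. This is illustrative only and is not the DISCOVER-Utils implementation: the function names continuous_scheme_metadata and normalize_classes are hypothetical, and only the defaults (1, 0, 1) and the legacy {id: name} form come from the description above.

```python
# Illustrative only: not the DISCOVER-Utils implementation.
CONTINUOUS_DEFAULTS = {"sample_rate": 1, "min_val": 0, "max_val": 1}


def continuous_scheme_metadata(descriptor):
    """Return sample_rate, min_val and max_val, using the documented defaults
    for any key the descriptor does not set."""
    return {key: descriptor.get(key, default)
            for key, default in CONTINUOUS_DEFAULTS.items()}


def normalize_classes(classes):
    """Convert legacy {id: name} entries to the canonical {id: {"name": name}} form."""
    normalized = {}
    for class_id, attrs in classes.items():
        if isinstance(attrs, str):
            attrs = {"name": attrs}  # legacy entry: a bare class name
        normalized[str(class_id)] = dict(attrs)
    return normalized


valence = {"src": "file:annotation:continuous", "sample_rate": 30}
print(continuous_scheme_metadata(valence))

mixed = {"0": "neutral", "1": {"name": "happiness", "color": "#ffd700"}}
print(normalize_classes(mixed))
```

In this sketch a descriptor that omits sample_rate falls back to 1, which corresponds to the 1 Hz default mentioned above.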
Notes:
- uri and uri_template are filesystem paths (absolute or relative to the working directory). There is no implicit base directory.
- uri_template placeholders that reference {dataset} or {session} must have non-empty values; otherwise resolve_file_uri raises ValueError.
- uri_template takes precedence over uri when both are present.
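The path rules in these notes can be sketched in a few lines. The function below is a hypothetical stand-in named resolve_file_uri_sketch, not the library's resolve_file_uri, whose actual signature is not shown here. Raising ValueError when a descriptor has neither key is an assumption of this sketch.

```python
def resolve_file_uri_sketch(descriptor, dataset, session):
    """Illustrative restatement of the notes above (not the library function)."""
    template = descriptor.get("uri_template")
    if template:
        # uri_template takes precedence over uri when both are present.
        values = {"dataset": dataset, "session": session}
        for name, value in values.items():
            if "{" + name + "}" in template and not value:
                raise ValueError(f"uri_template uses {{{name}}} but no value was given")
        return (template
                .replace("{dataset}", dataset or "")
                .replace("{session}", session or ""))
    if "uri" in descriptor:
        # Static path: the same file is used for every session.
        return descriptor["uri"]
    # Assumption of this sketch: a descriptor without either key is an error.
    raise ValueError("descriptor has neither uri nor uri_template")


video = {
    "src": "file:stream:video",
    "uri": "/data/fallback.mp4",
    "uri_template": "/data/{dataset}/{session}/video.mp4",
}
for session in ["session_a", "session_b"]:
    print(resolve_file_uri_sketch(video, "my_study", session))
```

Each session yields its own path, and the static uri is only used when no uri_template is given.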
Python API
from discover_utils.data.provider.data_manager import DatasetManager
# Set up a dataset manager for your sessions
dm = DatasetManager(
dataset="my_dataset",
db_host="127.0.0.1",
db_port=27017,
db_user="user",
db_password="pass",
sessions=["session1"],
data_description=[...],
)
Documentation
Full API documentation is available at hcmlab.github.io/discover-utils/docbuild/.
Citation
If you use DISCOVER or DISCOVER-Utils in your research, please cite:
@article{hallmen2025discover,
title = {DISCOVER: a Data-driven Interactive System for Comprehensive
Observation, Visualization, and ExploRation of human behavior},
author = {Hallmen, Tobias and Schiller, Dominik and others},
journal = {Frontiers in Digital Health},
volume = {7},
pages = {1638539},
year = {2025},
publisher = {Frontiers}
}
License
This project is licensed under the GNU General Public License v3.0.