A lightweight interchange format for video annotations
mava-exchange
A Python library and standard package format for exchanging video annotation data between research tools.
Documentation
📚 Full Documentation - Complete user guide and API reference
Quick links:
- Getting Started - Installation and tutorial
- Format Specification - Technical specification
- Interactive Viewer - Try it in your browser
- API Reference - Python class documentation
Project Context
This work is part of the MAVA Project, which aims to improve interoperability and data exchange among three research tools — VideoScope, TIB-AV-A, and VIAN. By standardising data formats and developing tools to import, export, and validate annotation packages, the project enhances data sharing and analysis capabilities across linguistic, media studies, and audiovisual research.
This infrastructure will enable shared research workflows and ensure adherence to the FAIR principles (Findable, Accessible, Interoperable, Reusable), improving the accessibility and reusability of research data.
The Challenge
AI-based video analysis is computationally expensive and produces large datasets: continuous observations such as emotion scores, scene analysis, and audio features accumulate quickly across a corpus. It is therefore desirable to share these results among video-processing tools rather than recompute them in each tool.
What is needed is a common data exchange format and tools to write, read, and validate packages in that format. Packages should be:
- Interoperable and self-describing
- Efficient to write and compact in file size
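To give a sense of scale, here is a back-of-the-envelope count of the values a single regularly sampled track produces. Every figure in it (a one-hour video, one sample every 0.5 s, seven dimensions, a corpus of 500 videos) is an assumption chosen for illustration, not a measurement from the project.

```python
# Back-of-the-envelope estimate of how many scalar values one regularly
# sampled observation track produces. All inputs below are illustrative assumptions.


def values_per_track(duration_s: float, sampling_interval_s: float, n_dimensions: int) -> int:
    """Number of scalar values in one track: samples x dimensions."""
    n_samples = int(duration_s // sampling_interval_s)
    return n_samples * n_dimensions


# Assumed: a one-hour video, one sample every 0.5 s, 7 emotion dimensions.
per_video = values_per_track(duration_s=3600, sampling_interval_s=0.5, n_dimensions=7)
# Assumed: a corpus of 500 such videos, one track each.
per_corpus = per_video * 500

print(per_video)   # 50400
print(per_corpus)  # 25200000
```

Multiplying by further tracks per video (scenes, audio features) grows this linearly, which is why a compact columnar format matters.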
mava-exchange
mava-exchange addresses this with a standard package format for video annotation corpora called .mediapkg. The library has two goals:
- Format definition: mava-exchange defines the .mediapkg standard, a ZIP archive containing annotation data as Parquet files alongside a manifest that maps columns to the MAVA ontology via JSON-LD.
- Tooling: mava-exchange provides a Python library and CLI tools to write, read, inspect, and validate .mediapkg packages.
Design Choices
- ZIP + Parquet: a .mediapkg is a ZIP archive containing one Parquet file per annotation track. Parquet offers compact storage, efficient reads, and columnar access. ZIP provides a universal container that any tool can open for inspection.
- JSON-LD manifest: each package contains a manifest.json with a JSON-LD @context that maps Parquet column names to terms in the MAVA ontology. This is the semantic layer: it describes what each column means without being part of the data itself.
- MAVA ontology: the ontology defines a shared vocabulary for annotation tracks, time coordinates, and observation dimensions. SHACL shapes are included for formal validation.
- Python: the library is implemented in Python, as all participating tools use Python. Support for other languages may be added in future releases.
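Because a .mediapkg is an ordinary ZIP archive, Python's standard zipfile module is enough to look inside one. The sketch below builds a toy archive in memory and lists its members. The member paths, the manifest keys, and the namespace IRI are hypothetical placeholders (the actual layout is set out in the Format Specification), and the .parquet member holds dummy bytes rather than real Parquet data.

```python
from __future__ import annotations

import io
import json
import zipfile


def build_toy_package() -> bytes:
    """Build a tiny ZIP shaped like a .mediapkg (hypothetical layout)."""
    manifest = {
        "@context": {"mava": "https://example.org/mava#"},  # hypothetical namespace
        "videos": ["video_001"],                             # hypothetical key
    }
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("manifest.json", json.dumps(manifest, indent=2))
        # Dummy bytes standing in for a real Parquet file.
        zf.writestr("tracks/video_001/emotions.parquet", b"placeholder")
    return buf.getvalue()


def inspect_package(data: bytes) -> tuple[list[str], dict]:
    """Return the member names and the parsed manifest of a ZIP package."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = zf.namelist()
        manifest = json.loads(zf.read("manifest.json"))
    return names, manifest


names, manifest = inspect_package(build_toy_package())
print(names)             # ['manifest.json', 'tracks/video_001/emotions.parquet']
print(sorted(manifest))  # ['@context', 'videos']
```

With a real package, zipfile.ZipFile("corpus.mediapkg") opens it the same way; decoding the Parquet members themselves requires a Parquet reader such as pyarrow.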
Inspiration
This format is inspired by GeoJSON and GeoParquet. GeoParquet embeds spatial
metadata inside Parquet files to describe geometry columns. mava-exchange
applies the same principle to temporal data: where GeoParquet uses spatial
coordinates, mava-exchange uses time coordinates on a video timeline. Where
GeoParquet metadata is purely operational, mava-exchange adds a semantic layer
via JSON-LD to link columns to a shared ontology.
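A JSON-LD @context is what turns a bare column name into an ontology term. The sketch below resolves column names to IRIs using a context with one prefix, one simple term definition, and one expanded term definition. The namespace and term names are hypothetical, not taken from the MAVA ontology, and the resolver handles only these two cases; real applications should use a full JSON-LD processor such as PyLD.

```python
from typing import Optional

# Hypothetical @context: the namespace IRI and term names are illustrative only.
CONTEXT = {
    "mava": "https://example.org/mava#",       # prefix for the ontology namespace
    "start": "mava:startTime",                  # simple term definition
    "angry": {"@id": "mava:angerProbability"},  # expanded term definition
}


def expand_iri(value: str, context: dict) -> str:
    """Expand a compact IRI such as 'mava:startTime' using a prefix in the context."""
    prefix, sep, suffix = value.partition(":")
    prefix_iri = context.get(prefix)
    if sep and isinstance(prefix_iri, str):
        return prefix_iri + suffix
    return value  # absolute IRI or unknown prefix: leave unchanged


def column_iri(column: str, context: dict) -> Optional[str]:
    """Return the IRI a Parquet column name maps to, or None if it is unmapped."""
    definition = context.get(column)
    if isinstance(definition, dict):
        definition = definition.get("@id")
    if not isinstance(definition, str):
        return None
    return expand_iri(definition, context)


print(column_iri("start", CONTEXT))    # https://example.org/mava#startTime
print(column_iri("angry", CONTEXT))    # https://example.org/mava#angerProbability
print(column_iri("neutral", CONTEXT))  # None
```

Keeping this mapping in the manifest rather than in the Parquet files means the data columns stay plain, while any tool that reads the context can recover what each column means.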
Using the library
Install from PyPI:
```shell
pip install mava-exchange
```
Define a track, write it to a .mediapkg package, and read it back:
```python
from mava_exchange import (
    MediaPackageWriter, MediaPackageReader,
    ObservationSeries, AnnotationSeries, DimensionSpec,
)

# Define what your tracks mean
emotions = ObservationSeries(
    name="emotions",
    description="Face emotion scores from DeepFace",
    sampling_interval=0.5,
    dimensions=[
        DimensionSpec("angry", "Anger probability", "[0,1]"),
        DimensionSpec("neutral", "Neutral expression", "[0,1]"),
    ],
)

# Write a package
# emotions_df is assumed to be a DataFrame holding the track's data;
# see the full tutorial for the expected columns.
with MediaPackageWriter("corpus.mediapkg") as writer:
    writer.add_video("video_001", "https://example.org/talk.mp4")
    writer.add_track("video_001", emotions, emotions_df)

# Read it back
with MediaPackageReader("corpus.mediapkg") as reader:
    df = reader.read_track("video_001", "emotions")
```
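The emotions track above declares sampling_interval=0.5. Assuming the interval is in seconds and that sampling starts at t = 0 (both assumptions; the Format Specification is authoritative), sample i sits at t_i = i × 0.5 s. The helper functions below are hypothetical and not part of mava-exchange; they sketch how such a regular grid can be generated and how a timestamp maps back to its sample index, which is useful when aligning tracks sampled at different rates.

```python
def sample_times(duration: float, interval: float) -> list:
    """Timestamps of a regular grid starting at 0, strictly before `duration`."""
    n = int(duration // interval)
    return [i * interval for i in range(n)]


def sample_index_at(t: float, interval: float) -> int:
    """Index i of the sample whose span [i*interval, (i+1)*interval) contains t."""
    return int(t // interval)


print(sample_times(2.0, 0.5))     # [0.0, 0.5, 1.0, 1.5]
print(sample_index_at(1.3, 0.5))  # 2
```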
Two CLI tools are also available after installation:
```shell
mediapkg-inspect corpus.mediapkg    # inspect the package contents
mediapkg-validate corpus.mediapkg   # validate the package
```
👉 See the full tutorial for a complete walkthrough.
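mediapkg-validate is the supported way to check a package; see the CLI Tools reference for what it checks. As an illustration of one kind of structural check a validator could make, the sketch below compares the tracks a manifest declares with the Parquet members actually present in the archive. The manifest schema (a "tracks" list whose entries carry a "path") is hypothetical, and this is not how mediapkg-validate is implemented.

```python
import io
import json
import zipfile


def check_tracks(data: bytes) -> list:
    """Report mismatches between manifest-declared tracks and Parquet members."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = set(zf.namelist())
        if "manifest.json" not in names:
            return ["missing manifest.json"]
        manifest = json.loads(zf.read("manifest.json"))
    # Hypothetical schema: {"tracks": [{"path": "..."}, ...]}
    declared = {t["path"] for t in manifest.get("tracks", [])}
    present = {n for n in names if n.endswith(".parquet")}
    problems = [f"declared but missing: {p}" for p in sorted(declared - present)]
    problems += [f"present but undeclared: {p}" for p in sorted(present - declared)]
    return problems


# Toy archive whose manifest declares one track that is absent.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("manifest.json", json.dumps({"tracks": [
        {"path": "tracks/video_001/emotions.parquet"},
        {"path": "tracks/video_001/scenes.parquet"},
    ]}))
    zf.writestr("tracks/video_001/emotions.parquet", b"placeholder")

print(check_tracks(buf.getvalue()))
# ['declared but missing: tracks/video_001/scenes.parquet']
```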
Development
Clone the repository and install in editable mode with development dependencies:
```shell
git clone https://github.com/sdsc-ordes/mava-exchange.git
cd mava-exchange
uv sync --group dev
```
The project uses `just` as a task runner:
```shell
just test      # run the test suite
just lint      # run ruff
just format    # format with ruff and treefmt
just build     # build the package
```
To run the example that converts real TSV annotation files into a .mediapkg
corpus:
```shell
just example          # create example corpus from TSV files
just inspect          # inspect the resulting corpus.mediapkg
just inspect-turtle   # view manifest as Turtle RDF
just validate         # validate the package
```
Further Reading
- Format Specification - Complete technical specification
- MAVA Ontology - Semantic vocabulary (interactive)
- User Guide - Writing and reading packages
- CLI Tools - Command-line reference
- Examples - Complete example code
For development:
License
Acknowledgements
This work was funded by the Swiss Data Science Center (SDSC) through its National Call for Projects as an Infrastructure project.
We gratefully acknowledge the contributions of the SDSC experts and our partners.
SDSC Experts:
- Dr. Stefan Milosavljevic, ORCID ID 0000-0002-9135-1353
- Sabine Maennel, ORCID ID 0009-0001-3022-8239
- Robin Franken, ORCID ID 0009-0008-0143-9118
- Dr. Oksana Riba Grognuz, ORCID ID 0000-0002-2961-2655
Partner Institutions
- Dr. Teodora Vuković, ORCID ID 0000-0002-5780-5665
- Dr. Jeremy Zehr, ORCID ID 0000-0002-6046-8647
- Prof. Dr. Josephine Diecke, ORCID ID 0000-0002-9342-0631
- Dr. Simon Spiegel, ORCID ID 0000-0003-2141-5566
- Prof. Dr. Ralph Ewerth, ORCID ID 0000-0003-0918-6297
- Dr. Eric Müller-Budack, ORCID ID 0000-0002-6802-1241
- Dr. Cristina Grisot, ORCID ID 0000-0003-0684-4442
How to Cite
If you use this software, please cite it as follows:
👉 See the CITATION.cff file for the full list of software authors and citation formats.
When referring to the project more broadly (including partner contributions), please acknowledge the funding statement and collaborators listed in the Acknowledgements section:
"This work was funded by the Swiss Data Science Center (SDSC) through its National Call for Projects as an Infrastructure project."
Copyright
Copyright © 2025-2026 Swiss Data Science Center (SDSC), www.datascience.ch, ROR: ror.org/02hdt9m26. All rights reserved. The SDSC is a Swiss National Research Infrastructure, jointly established and legally represented by the École Polytechnique Fédérale de Lausanne (EPFL) and the Eidgenössische Technische Hochschule Zürich (ETH Zürich) as a société simple. This copyright encompasses all materials, software, documentation, and other content created and developed by the SDSC.
Download files
File details
Details for the file mava_exchange-0.1.0.tar.gz.
File metadata
- Download URL: mava_exchange-0.1.0.tar.gz
- Upload date:
- Size: 170.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.17
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7dcdaeba518bd012c695bf342917c8dbf15aea8df328f298c4278a9641331f0b |
| MD5 | 8b1f4c70a8fd1a5e1c3ac28016440ddf |
| BLAKE2b-256 | 09b45c3471a0812048f1a4462988a84d9c0100d32be795e3e6eb6a3384e15841 |
File details
Details for the file mava_exchange-0.1.0-py3-none-any.whl.
File metadata
- Download URL: mava_exchange-0.1.0-py3-none-any.whl
- Upload date:
- Size: 23.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.17
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 94cb05e93b62dc91c4b912e9c14feead0b9a634a5adaaea76b4aa6aac92f4643 |
| MD5 | cce0039f2ad50931a35b6851b1337832 |
| BLAKE2b-256 | d1b5f202975b30ba15e7a9c6f177b7521477f70c90c207502d9b98ff624ce250 |
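To check a downloaded distribution file against the published SHA256 digests, hash it locally and compare. A minimal sketch using only the standard library; the helper names are illustrative, and the demo hashes a temporary file rather than a real download.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 16) -> str:
    """Hex digest of a file, read in chunks so large files need little memory."""
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: Path, expected: str, algorithm: str = "sha256") -> bool:
    """True if the file's digest matches the expected hex string."""
    return file_digest(path, algorithm) == expected.strip().lower()


# Published SHA256 of mava_exchange-0.1.0-py3-none-any.whl (from the table above).
EXPECTED_WHEEL_SHA256 = "94cb05e93b62dc91c4b912e9c14feead0b9a634a5adaaea76b4aa6aac92f4643"

# Demo on a temporary file so the snippet runs without a download.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"abc")
    demo_digest = file_digest(sample)
    demo_ok = verify(sample, hashlib.sha256(b"abc").hexdigest())

print(demo_ok)  # True
```

For the real wheel, call verify(Path("mava_exchange-0.1.0-py3-none-any.whl"), EXPECTED_WHEEL_SHA256) from the directory containing it. pip can also enforce digests itself through its hash-checking mode (--require-hashes, with --hash=sha256:... entries in a requirements file).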