
A lightweight interchange format for video annotations


mava-exchange


Authors:


A Python library and standard package format for exchanging video annotation data between research tools.

Documentation

📚 Full Documentation - Complete user guide and API reference

Quick links:

Project Context

This work is part of the MAVA Project, which aims to improve interoperability and data exchange among three research tools — VideoScope, TIB-AV-A, and VIAN. By standardising data formats and developing tools to import, export, and validate annotation packages, the project enhances data sharing and analysis capabilities across linguistic, media studies, and audiovisual research.

This infrastructure will enable shared research workflows and ensure adherence to FAIR principles, improving the accessibility and reusability of research data.

The Challenge

Video analysis via AI is computationally expensive and produces large datasets — continuous observations such as emotion scores, scene analysis, and audio features accumulate quickly across a corpus. It is therefore desirable to share these results among video processing tools without recomputing them.

What is needed is a common data exchange format and tools to write, read, and validate packages in that format. Packages should be:

  • Interoperable and self-describing
  • Efficient to write and compact in file size

mava-exchange

mava-exchange addresses this with a standard package format for video annotation corpora called .mediapkg. The library has two goals:

Format definition: mava-exchange defines the .mediapkg standard, a ZIP archive containing annotation data as Parquet files alongside a manifest that maps columns to the MAVA ontology via JSON-LD.

Tooling: mava-exchange provides a Python library and CLI tools to write, read, inspect, and validate .mediapkg packages.
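
Because a .mediapkg is an ordinary ZIP archive, Python's standard library alone can list its members and load the manifest. The sketch below builds a toy archive in memory and inspects it. The member layout (manifest.json at the archive root, one tracks/<video>/<track>.parquet file per track) and the "videos" manifest key are hypothetical assumptions for illustration, not necessarily what mava-exchange writes, and the Parquet member holds placeholder bytes rather than real Parquet data.

```python
import io
import json
import zipfile


def build_toy_package() -> bytes:
    """Build a tiny in-memory ZIP shaped like a .mediapkg (hypothetical layout)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Hypothetical manifest content; the real schema is defined by mava-exchange.
        manifest = {"@context": {}, "videos": ["video_001"]}
        zf.writestr("manifest.json", json.dumps(manifest))
        # Placeholder bytes standing in for a Parquet track (not valid Parquet).
        zf.writestr("tracks/video_001/emotions.parquet", b"PAR1placeholderPAR1")
    return buffer.getvalue()


def inspect_package(data: bytes) -> tuple:
    """Return (member names, parsed manifest) using only the standard library."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = zf.namelist()
        manifest = json.loads(zf.read("manifest.json"))
    return names, manifest


names, manifest = inspect_package(build_toy_package())
print(names)     # ['manifest.json', 'tracks/video_001/emotions.parquet']
print(manifest)  # {'@context': {}, 'videos': ['video_001']}
```

For a package on disk, zipfile.ZipFile("corpus.mediapkg") opens it the same way; this is also why any ZIP tool can be used to look inside a package.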

Design Choices

  • ZIP + Parquet — the .mediapkg is a ZIP archive containing one Parquet file per annotation track. Parquet offers compact storage, efficient reads, and columnar access. ZIP provides a universal container that any tool can open for inspection.

  • JSON-LD manifest — each package contains a manifest.json with a JSON-LD @context that maps Parquet column names to terms in the MAVA ontology. This is the semantic layer — it describes what each column means without being part of the data itself.

  • MAVA ontology — the ontology defines a shared vocabulary for annotation tracks, time coordinates, and observation dimensions. SHACL shapes are included for formal validation.

  • Python — the library is implemented in Python, as all participating tools use Python. Support for other languages may be added in future releases.
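
To make the semantic layer concrete: a JSON-LD @context maps short keys (here, Parquet column names) to full IRIs, either directly or through a prefix such as mava:. The sketch below resolves column names against such a context. The namespace https://example.org/mava# and the term names are placeholders, not the actual MAVA ontology IRIs, and the resolver covers only plain string mappings, {"@id": ...} objects and prefix:suffix compact IRIs rather than the full JSON-LD expansion algorithm.

```python
import json
from typing import Optional

# Hypothetical manifest fragment; the real MAVA namespace and terms may differ.
MANIFEST = json.loads("""
{
  "@context": {
    "mava": "https://example.org/mava#",
    "angry": "mava:angerProbability",
    "neutral": {"@id": "mava:neutralProbability"},
    "time": "https://example.org/mava#timestamp"
  }
}
""")


def expand_term(term: str, context: dict) -> Optional[str]:
    """Resolve a column name to a full IRI using a simplified JSON-LD @context."""
    value = context.get(term)
    if value is None:
        return None  # column is not described by the context
    if isinstance(value, dict):
        value = value.get("@id")  # expanded term definition
        if value is None:
            return None
    # Expand compact IRIs like "mava:angerProbability" via a declared prefix.
    prefix, sep, suffix = value.partition(":")
    if sep and isinstance(context.get(prefix), str):
        return context[prefix] + suffix
    return value  # already an absolute IRI


for column in ["angry", "neutral", "time", "unknown"]:
    print(column, "->", expand_term(column, MANIFEST["@context"]))
```

Keeping this mapping in the manifest rather than in the Parquet files means the data tables stay plain and tool-agnostic, while the meaning of each column can still be looked up in the ontology.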

Inspiration

This format is inspired by GeoJSON and GeoParquet. GeoParquet embeds spatial metadata inside Parquet files to describe geometry columns. mava-exchange applies the same principle to temporal data: where GeoParquet uses spatial coordinates, mava-exchange uses time coordinates on a video timeline. Where GeoParquet metadata is purely operational, mava-exchange adds a semantic layer via JSON-LD to link columns to a shared ontology.
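
Where GeoParquet rows carry spatial geometry, rows of a temporal track are located on the video timeline. As an illustration, assume an observation series with a fixed sampling interval Δ (like sampling_interval=0.5 in the usage example) stores one row per sample starting at a given time, so that sample i covers the half-open interval [start + i·Δ, start + (i+1)·Δ). The helpers below are hypothetical and not part of mava-exchange; they only show how such time coordinates can be derived and queried.

```python
def sample_intervals(n_samples: int, sampling_interval: float, start: float = 0.0) -> list:
    """Return (start, end) times in seconds for each sample of a regularly sampled track."""
    if sampling_interval <= 0:
        raise ValueError("sampling_interval must be positive")
    return [
        (start + i * sampling_interval, start + (i + 1) * sampling_interval)
        for i in range(n_samples)
    ]


def sample_index_at(t: float, sampling_interval: float, start: float = 0.0) -> int:
    """Return the index of the sample whose half-open interval contains time t."""
    if t < start:
        raise ValueError("t lies before the start of the track")
    return int((t - start) // sampling_interval)


print(sample_intervals(4, 0.5))   # [(0.0, 0.5), (0.5, 1.0), (1.0, 1.5), (1.5, 2.0)]
print(sample_index_at(1.2, 0.5))  # 2
```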

Using the library

Install from PyPI:

pip install mava-exchange

Write, read, and validate .mediapkg packages:

from mava_exchange import (
    MediaPackageWriter, MediaPackageReader,
    ObservationSeries, AnnotationSeries, DimensionSpec,
)

# Define what your tracks mean
emotions = ObservationSeries(
    name="emotions",
    description="Face emotion scores from DeepFace",
    sampling_interval=0.5,
    dimensions=[
        DimensionSpec("angry",   "Anger probability",  "[0,1]"),
        DimensionSpec("neutral", "Neutral expression", "[0,1]"),
    ]
)

# Write a package (emotions_df is a DataFrame of emotion scores
# that you have already computed for video_001)
with MediaPackageWriter("corpus.mediapkg") as writer:
    writer.add_video("video_001", "https://example.org/talk.mp4")
    writer.add_track("video_001", emotions, emotions_df)

# Read it back
with MediaPackageReader("corpus.mediapkg") as reader:
    df = reader.read_track("video_001", "emotions")
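
The third argument to DimensionSpec in this example reads as a value range written in interval notation ("[0,1]"). Assuming that interpretation (the README does not say how the library uses it), a hypothetical helper like the one below could parse such a range and check scores before they are written into a package. parse_range and in_range are illustrative names, not mava-exchange functions.

```python
import math
import re

# Matches interval notation such as "[0,1]", "(0, 1]" or "[-1,inf)".
_RANGE = re.compile(r"^\s*([\[(])\s*([^,\s]+)\s*,\s*([^,\s\])]+)\s*([\])])\s*$")


def parse_range(spec: str) -> tuple:
    """Parse interval notation into (low, high, low_closed, high_closed)."""
    match = _RANGE.match(spec)
    if match is None:
        raise ValueError(f"not an interval: {spec!r}")
    left, low, high, right = match.groups()
    return float(low), float(high), left == "[", right == "]"


def in_range(value: float, spec: str) -> bool:
    """Check whether a value lies inside the interval described by spec."""
    low, high, low_closed, high_closed = parse_range(spec)
    if math.isnan(value):
        return False  # NaN never satisfies a range check
    above = value >= low if low_closed else value > low
    below = value <= high if high_closed else value < high
    return above and below


scores = [0.0, 0.42, 1.0, 1.3]
print([in_range(s, "[0,1]") for s in scores])  # [True, True, True, False]
print(in_range(0.0, "(0,1]"))                  # False
```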

Two CLI tools are also available after installation:

mediapkg-inspect  corpus.mediapkg
mediapkg-validate corpus.mediapkg

👉 See the full tutorial for a complete walkthrough.

Development

Clone the repository and install in editable mode with development dependencies:

git clone https://github.com/sdsc-ordes/mava-exchange.git
cd mava-exchange
uv sync --group dev

The project uses the just command runner for common tasks:

just test      # run the test suite
just lint      # run ruff
just format    # format with ruff and treefmt
just build     # build the package

To run the example that converts real TSV annotation files into a .mediapkg corpus:

just example           # create example corpus from TSV files
just inspect           # inspect the resulting corpus.mediapkg
just inspect-turtle    # view manifest as Turtle RDF
just validate          # validate the package

Further Reading

For development:

License

Apache-2.0

Acknowledgements

This work was funded by the Swiss Data Science Center (SDSC) through its National Call for Projects as an Infrastructure project.

We gratefully acknowledge the contributions of the SDSC experts and our partners.

SDSC Experts:

Partner Institutions

How to Cite

If you use this software, please cite it as follows:

👉 See the CITATION.cff file for the full list of software authors and citation formats.

When referring to the project more broadly (including partner contributions), please acknowledge the funding statement and collaborators listed in the Acknowledgements section:

"This work was funded by the Swiss Data Science Center (SDSC) through its National Call for Projects as an Infrastructure project."

Copyright

Copyright © 2025-2026 Swiss Data Science Center (SDSC), www.datascience.ch, ROR: ror.org/02hdt9m26. All rights reserved. The SDSC is a Swiss National Research Infrastructure, jointly established and legally represented by the École Polytechnique Fédérale de Lausanne (EPFL) and the Eidgenössische Technische Hochschule Zürich (ETH Zürich) as a société simple. This copyright encompasses all materials, software, documentation, and other content created and developed by the SDSC.



Download files


Source Distribution

mava_exchange-0.1.0.tar.gz (170.5 kB)

Uploaded Source

Built Distribution


mava_exchange-0.1.0-py3-none-any.whl (23.6 kB)

Uploaded Python 3

File details

Details for the file mava_exchange-0.1.0.tar.gz.

File metadata

  • Download URL: mava_exchange-0.1.0.tar.gz
  • Upload date:
  • Size: 170.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.17

File hashes

Hashes for mava_exchange-0.1.0.tar.gz
Algorithm Hash digest
SHA256 7dcdaeba518bd012c695bf342917c8dbf15aea8df328f298c4278a9641331f0b
MD5 8b1f4c70a8fd1a5e1c3ac28016440ddf
BLAKE2b-256 09b45c3471a0812048f1a4462988a84d9c0100d32be795e3e6eb6a3384e15841


File details

Details for the file mava_exchange-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: mava_exchange-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 23.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.17

File hashes

Hashes for mava_exchange-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 94cb05e93b62dc91c4b912e9c14feead0b9a634a5adaaea76b4aa6aac92f4643
MD5 cce0039f2ad50931a35b6851b1337832
BLAKE2b-256 d1b5f202975b30ba15e7a9c6f177b7521477f70c90c207502d9b98ff624ce250

