A Python package for accessing VGGSounder dataset labels and metadata
VGGSounder: Audio-Visual Evaluations for Foundation Models
If our project helps you, please give us a star ⭐ on GitHub to support us. 🙏🙏
📰 News
- [11.06.2025] 📃 Released technical report of VGGSounder. Contains detailed discussion on how we built the first multimodal benchmark for video tagging with complete per-modality annotations for every class.
🌟 Introduction
VGGSounder is a re-annotated benchmark built upon the VGGSound dataset, designed to rigorously evaluate audio-visual foundation models and understand how they utilize modalities. VGGSounder introduces:
- 🔍 Per-label modality tags (audible / visible / both) for all classes in the sample
- 🎵 Meta labels for background music, voice-over, and static images
- 📊 Multiple classes per sample
🚀 Installation
The VGGSounder dataset is now available as a Python package! Install it via pip:
pip install vggsounder
Or install from source using uv:
git clone https://github.com/bizilizi/vggsounder.git
cd vggsounder
uv build
pip install dist/vggsounder-*.whl
🐍 Python Package Usage
Quick Start
import vggsounder
# Load the dataset
labels = vggsounder.VGGSounder()
# Access video data by ID
video_data = labels["--U7joUcTCo_000000"]
print(video_data.labels) # List of labels for this video
print(video_data.meta_labels) # Metadata (background_music, static_image, voice_over)
print(video_data.modalities) # Modality for each label (A, V, AV)
# Get dataset statistics
stats = labels.stats()
print(f"Total videos: {stats['total_videos']}")
print(f"Unique labels: {stats['unique_labels']}")
# Search functionality
piano_videos = labels.get_videos_with_labels("playing piano")
voice_over_videos = labels.get_videos_with_meta(voice_over=True)
Advanced Usage
# Dict-like interface
print(len(labels)) # Number of videos
print("video_id" in labels) # Check if video exists
for video_id in labels: # Iterate over video IDs
    video_data = labels[video_id]
# Get all unique labels
all_labels = labels.get_all_labels()
# Complex queries
static_speech_videos = labels.get_videos_with_meta(
static_image=True, voice_over=True
)
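The dict-like interface above lends itself to simple aggregate statistics. The following is a minimal sketch and not part of the package: `count_labels` is a hypothetical helper, and it assumes that iterating yields video IDs and that each entry's `.labels` is a list of strings, as the Quick Start comments suggest. A small stand-in mapping replaces `vggsounder.VGGSounder()` so the snippet runs without the dataset.

```python
from collections import Counter
from types import SimpleNamespace


def count_labels(dataset):
    """Count how many videos carry each label.

    `dataset` is anything with the dict-like interface shown above:
    iterating yields video IDs and indexing returns an object with `.labels`.
    """
    counts = Counter()
    for video_id in dataset:
        # set() guards against a label being listed twice for one video
        counts.update(set(dataset[video_id].labels))
    return counts


# Stand-in data for illustration only; with the package installed,
# pass vggsounder.VGGSounder() instead (assuming the interface above).
fake_dataset = {
    "clip_a": SimpleNamespace(labels=["male singing", "people crowd"]),
    "clip_b": SimpleNamespace(labels=["male singing"]),
}

label_counts = count_labels(fake_dataset)
print(label_counts.most_common(1))
```

Because the helper only relies on iteration and indexing, it should work unchanged with any object that follows the same interface.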
🏷️ Label Format
VGGSounder annotations are stored in a CSV file located at data/vggsounder.csv. Each row corresponds to a single label for a specific video sample. The dataset supports multi-label, multi-modal classification with additional meta-information for robust evaluation.
Columns
- video_id: Unique identifier for a 10-second video clip.
- label: Human-readable label representing a sound or visual category (e.g. male singing, playing timpani).
- modality: The modality in which the label is perceivable:
  - A = Audible
  - V = Visible
  - AV = Both audible and visible
- background_music: True if the video contains background music.
- static_image: True if the video consists of a static image.
- voice_over: True if the video contains voice-over narration.
Example
| video_id | label | modality | background_music | static_image | voice_over |
|---|---|---|---|---|---|
| ---g-f_I2yQ_000001 | male singing | A | True | False | False |
| ---g-f_I2yQ_000001 | people crowd | AV | True | False | False |
| ---g-f_I2yQ_000001 | playing timpani | A | True | False | False |
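Since each row holds one label, reading the file directly usually means grouping rows by video_id. The sketch below does this with the standard library only. It assumes the CSV is comma-separated with a header row that uses the column names listed above; `labels_by_video` and its `channel` parameter are hypothetical names introduced for illustration, and the inline sample copies the example rows instead of opening data/vggsounder.csv.

```python
import csv
import io
from collections import defaultdict

# Inline copy of the example rows above; in practice, open data/vggsounder.csv.
# Assumption: comma-separated values with this header row.
SAMPLE_CSV = """\
video_id,label,modality,background_music,static_image,voice_over
---g-f_I2yQ_000001,male singing,A,True,False,False
---g-f_I2yQ_000001,people crowd,AV,True,False,False
---g-f_I2yQ_000001,playing timpani,A,True,False,False
"""


def labels_by_video(csv_file, channel=None):
    """Group labels per video, optionally keeping one channel.

    channel="A" keeps labels tagged A or AV (audible);
    channel="V" keeps labels tagged V or AV (visible);
    channel=None keeps every label.
    """
    grouped = defaultdict(list)
    for row in csv.DictReader(csv_file):
        # "A" and "V" are both substrings of "AV", so a substring test suffices
        if channel is None or channel in row["modality"]:
            grouped[row["video_id"]].append(row["label"])
    return dict(grouped)


audible = labels_by_video(io.StringIO(SAMPLE_CSV), channel="A")
visible = labels_by_video(io.StringIO(SAMPLE_CSV), channel="V")
print(audible)
print(visible)
```

Filtering by channel in this way gives the ground-truth label set an audio-only or video-only model could reasonably be expected to predict for each clip.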
📦 Publishing to PyPI
To publish this package to PyPI:
1. Prepare your environment:
   # Install uv if you haven't already
   curl -LsSf https://astral.sh/uv/install.sh | sh
2. Build the package:
   uv build
3. Set up PyPI credentials:
   - Create a PyPI account at https://pypi.org
   - Generate an API token in your PyPI account settings
   - Set the token:
     export UV_PUBLISH_TOKEN=your_pypi_token
4. Publish to PyPI:
   # Test on Test PyPI first (recommended)
   uv publish --publish-url https://test.pypi.org/legacy/
   # Then publish to main PyPI
   uv publish
For more details, see the uv publishing guide.
📑 Citation
If you find VGGSounder useful for your research and applications, please consider citing us using this BibTeX:
@article{zverevwiedemer2025vggsounder,
author = {Daniil Zverev and Thaddäus Wiedemer and Ameya Prabhu and Matthias Bethge and Wieland Brendel and A. Sophia Koepke},
title = {VGGSounder: Audio-Visual Evaluations for Foundation Models},
year = {2025},
}
❤️ Acknowledgement
The authors would like to thank Felix Förster, Sayak Mallick, and Prasanna Mayilvahananan for their help with data annotation, as well as Thomas Klein and Shyamgopal Karthik for their help in setting up MTurk. They also thank numerous MTurk workers for labelling. This work was in part supported by the BMBF (FKZ: 01IS24060, 01I524085B), the DFG (SFB 1233, TP A1, project number: 276693517), and the Open Philanthropy Foundation funded by the Good Ventures Foundation. The authors thank the IMPRS-IS for supporting TW.
👮 License
This project is released under the Apache 2.0 license as found in the LICENSE file. Please get in touch with us if you find any potential violations.