A Python package for accessing VGGSounder dataset labels and metadata

VGGSounder: Audio-Visual Evaluations for Foundation Models

If our project helps you, please give us a star ⭐ on GitHub to support us. 🙏🙏

📰 News

[11.06.2025] 📃 Released the VGGSounder technical report, which describes in detail how we built the first multimodal benchmark for video tagging with complete per-modality annotations for every class.

🌟 Introduction

VGGSounder is a re-annotated benchmark built upon the VGGSound dataset, designed to rigorously evaluate audio-visual foundation models and understand how they utilize modalities. VGGSounder introduces:

🔍 Per-label modality tags (audible / visible / both) for all classes in the sample
🎵 Meta labels for background music, voice-over, and static images
📊 Multiple classes per sample
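
To illustrate how per-label modality tags might be used, the sketch below derives the label set that is perceivable from a given input modality. The helper name `labels_for_input` is hypothetical and not part of the package API, and the rule that an `AV` label counts for both audio-only and visual-only inputs is an assumption based on the tag definitions (A = audible, V = visible, AV = both).

```python
# Hypothetical helper, not part of the vggsounder package.
# Modality codes follow this README: A = audible, V = visible, AV = both.

def labels_for_input(annotations, input_modality):
    """Return the labels perceivable from the given input ("A", "V", or "AV").

    Assumption: a label tagged "AV" is perceivable from audio alone and from
    video alone; with both inputs ("AV"), every label is perceivable.
    """
    if input_modality not in {"A", "V", "AV"}:
        raise ValueError(f"unknown input modality: {input_modality!r}")
    if input_modality == "AV":
        return [label for label, _ in annotations]
    return [
        label
        for label, modality in annotations
        if modality in (input_modality, "AV")
    ]


# (label, modality) pairs copied from the Label Format example in this README.
sample = [
    ("male singing", "A"),
    ("people crowd", "AV"),
    ("playing timpani", "A"),
]

audio_labels = labels_for_input(sample, "A")
visual_labels = labels_for_input(sample, "V")
print(audio_labels)
print(visual_labels)
```

Under this assumption, a model that only sees the video frames of this clip would be scored against `people crowd` alone, while an audio-only model would be scored against all three labels.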

🚀 Installation

The VGGSounder dataset is now available as a Python package! Install it via pip:

pip install vggsounder

Or install from source using uv:

git clone https://github.com/bizilizi/vggsounder.git
cd vggsounder
uv build
pip install dist/vggsounder-*.whl
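
After either install route, one way to confirm which version ended up in the environment is the standard-library `importlib.metadata` module. This is a minimal sketch; it assumes the installed distribution name matches the pip name `vggsounder`, and `installed_version` is an illustrative helper rather than something the package provides.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


found = installed_version("vggsounder")
print(found if found is not None else "vggsounder is not installed")
```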

🐍 Python Package Usage

Quick Start

import vggsounder

# Load the dataset
labels = vggsounder.VGGSounder()

# Access video data by ID
video_data = labels["--U7joUcTCo_000000"]
print(video_data.labels)        # List of labels for this video
print(video_data.meta_labels)   # Metadata (background_music, static_image, voice_over)
print(video_data.modalities)    # Modality for each label (A, V, AV)

# Get dataset statistics
stats = labels.stats()
print(f"Total videos: {stats['total_videos']}")
print(f"Unique labels: {stats['unique_labels']}")

# Search functionality
piano_videos = labels.get_videos_with_labels("playing piano")
voice_over_videos = labels.get_videos_with_meta(voice_over=True)

Advanced Usage

# Dict-like interface
print(len(labels))                    # Number of videos
print("video_id" in labels)           # Check if video exists
for video_id in labels:               # Iterate over video IDs
    video_data = labels[video_id]

# Get all unique labels
all_labels = labels.get_all_labels()

# Complex queries
static_speech_videos = labels.get_videos_with_meta(
    static_image=True, voice_over=True
)

🏷️ Label Format

VGGSounder annotations are stored in a CSV file located at data/vggsounder.csv. Each row corresponds to a single label for a specific video sample. The dataset supports multi-label, multi-modal classification with additional meta-information for robust evaluation.

Columns

video_id: Unique identifier for a 10-second video clip.
label: Human-readable label representing a sound or visual category (e.g. male singing, playing timpani).
modality: The modality in which the label is perceivable:
- A = Audible
- V = Visible
- AV = Both audible and visible
background_music: True if the video contains background music.
static_image: True if the video consists of a static image.
voice_over: True if the video contains voice-over narration.

Example

video_id	label	modality	background_music	static_image	voice_over
`---g-f_I2yQ_000001`	`male singing`	A	True	False	False
`---g-f_I2yQ_000001`	`people crowd`	AV	True	False	False
`---g-f_I2yQ_000001`	`playing timpani`	A	True	False	False
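
Because each row holds a single label, per-video records have to be assembled by grouping rows on `video_id`. The sketch below does this with the standard `csv` module on an inline copy of the example rows. It assumes a comma-delimited file with a header row named as in the Columns list and booleans spelled `True`/`False`; `group_by_video` is an illustrative helper, not the package's loader.

```python
import csv
import io
from collections import defaultdict

# Inline stand-in for data/vggsounder.csv.
# The comma delimiter and the header names are assumptions.
SAMPLE_CSV = """video_id,label,modality,background_music,static_image,voice_over
---g-f_I2yQ_000001,male singing,A,True,False,False
---g-f_I2yQ_000001,people crowd,AV,True,False,False
---g-f_I2yQ_000001,playing timpani,A,True,False,False
"""

META_COLUMNS = ("background_music", "static_image", "voice_over")


def group_by_video(csv_file):
    """Collect labels, modalities, and meta flags for each video_id."""
    videos = defaultdict(lambda: {"labels": [], "modalities": [], "meta": {}})
    for row in csv.DictReader(csv_file):
        entry = videos[row["video_id"]]
        entry["labels"].append(row["label"])
        entry["modalities"].append(row["modality"])
        # Meta flags appear to describe the whole clip, so they repeat per row.
        for column in META_COLUMNS:
            entry["meta"][column] = row[column] == "True"
    return dict(videos)


videos = group_by_video(io.StringIO(SAMPLE_CSV))
clip = videos["---g-f_I2yQ_000001"]
print(clip["labels"])
print(clip["modalities"])
print(clip["meta"])
```

To run the same grouping on the real file, pass an open file handle (for example `open("data/vggsounder.csv", newline="")`) instead of the `StringIO` object, provided the file matches the assumed layout.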

📦 Publishing to PyPI

To publish this package to PyPI:

Prepare your environment:

# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh

Build the package:
```
uv build
```
Set up PyPI credentials:
- Create a PyPI account at https://pypi.org
- Generate an API token in your PyPI account settings
- Set the token: export UV_PUBLISH_TOKEN=your_pypi_token

Publish to PyPI:

# Test on Test PyPI first (recommended)
uv publish --publish-url https://test.pypi.org/legacy/

# Then publish to main PyPI
uv publish

For more details, see the uv publishing guide.

📑 Citation

If you find VGGSounder useful for your research and applications, please consider citing us using this BibTeX:

@article{zverevwiedemer2025vggsounder,
  author    = {Daniil Zverev and Thaddäus Wiedemer and Ameya Prabhu and Matthias Bethge and Wieland Brendel and A. Sophia Koepke},
  title     = {VGGSounder: Audio-Visual Evaluations for Foundation Models},
  year      = {2025},
}

❤️ Acknowledgement

The authors would like to thank Felix Förster, Sayak Mallick, and Prasanna Mayilvahanan for their help with data annotation, as well as Thomas Klein and Shyamgopal Karthik for their help in setting up MTurk. They also thank the numerous MTurk workers for labelling. This work was supported in part by the BMBF (FKZ: 01IS24060, 01IS24085B), the DFG (SFB 1233, TP A1, project number: 276693517), and the Open Philanthropy Foundation funded by the Good Ventures Foundation. The authors thank the IMPRS-IS for supporting TW.

👮 License

This project is released under the Apache 2.0 license as found in the LICENSE file. Please get in touch with us if you find any potential violations.

Release history

- 0.1.6: Jan 22, 2026
- 0.1.5: Oct 8, 2025
- 0.1.4.1: Sep 18, 2025
- 0.1.4: Sep 15, 2025
- 0.1.3: Aug 11, 2025
- 0.1.2: Aug 7, 2025
- 0.1.1: Aug 7, 2025 (this version)

Download files

Source Distribution

vggsounder-0.1.1.tar.gz (316.7 kB), uploaded Aug 7, 2025, Source

Built Distribution

vggsounder-0.1.1-py3-none-any.whl (321.6 kB), uploaded Aug 7, 2025, Python 3

File details

Details for the file vggsounder-0.1.1.tar.gz.

File metadata

Download URL: vggsounder-0.1.1.tar.gz
Upload date: Aug 7, 2025
Size: 316.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.7.21

File hashes

Hashes for vggsounder-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`03afbb1ef02bd9114429cf482a1b22298cd23d07a3ca7ec065dcd51ca9cd603d`
MD5	`7327788d957439e6982bea8c17de5c85`
BLAKE2b-256	`2e730e817ed2cf6ae40a9a7925783227d13a2885166844945183efef591f46ca`


File details

Details for the file vggsounder-0.1.1-py3-none-any.whl.

File metadata

Download URL: vggsounder-0.1.1-py3-none-any.whl
Upload date: Aug 7, 2025
Size: 321.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.7.21

File hashes

Hashes for vggsounder-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9f3a5507075b59d1c22cb692ddbe3f27362fa4e877b72d99eaaece5b64d0c3a3`
MD5	`bed96481a228c5b2f370519b80a8bdf7`
BLAKE2b-256	`085c9149e80c0efe4c5262cff269ebc44cc73498b65e2f337098126ac314d791`
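
A downloaded file can be checked against the SHA256 digests listed above using the standard library. The sketch below is generic: `sha256_of` and `matches_digest` are illustrative helper names, and the demo hashes a temporary file rather than a real download.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


# Demo on a temporary file; the payload is a placeholder, not a real package file.
payload = b"demo payload, not a real distribution file"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    demo_path = tmp.name
try:
    demo_ok = matches_digest(demo_path, hashlib.sha256(payload).hexdigest())
    demo_bad = matches_digest(demo_path, "0" * 64)
finally:
    os.remove(demo_path)
print(demo_ok, demo_bad)
```

For a real check, pass the path of the downloaded archive or wheel together with the matching SHA256 value from the corresponding table.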

