A Python package for accessing VGGSounder dataset labels and metadata

VGGSounder: Audio-Visual Evaluations for Foundation Models

If our project helps you, please give us a star ⭐ on GitHub to support us. 🙏🙏

Project page

📰 News

  • [11.06.2025] 📃 Released the VGGSounder technical report. It contains a detailed discussion of how we built the first multimodal benchmark for video tagging with complete per-modality annotations for every class.

🌟 Introduction

VGGSounder is a re-annotated benchmark built upon the VGGSound dataset, designed to rigorously evaluate audio-visual foundation models and understand how they utilize modalities. VGGSounder introduces:

  • 🔍 Per-label modality tags (audible / visible / both) for all classes in the sample
  • 🎵 Meta labels for background music, voice-over, and static images
  • 📊 Multiple classes per sample

🚀 Installation

The VGGSounder dataset is now available as a Python package! Install it via pip:

pip install vggsounder

Or install from source using uv:

git clone https://github.com/bizilizi/vggsounder.git
cd vggsounder
uv build
pip install dist/vggsounder-*.whl

🐍 Python Package Usage

Quick Start

import vggsounder

# Load the dataset
labels = vggsounder.VGGSounder()

# Access video data by ID
video_data = labels["--U7joUcTCo_000000"]
print(video_data.labels)        # List of labels for this video
print(video_data.meta_labels)   # Metadata (background_music, static_image, voice_over)
print(video_data.modalities)    # Modality for each label (A, V, AV)

# Get dataset statistics
stats = labels.stats()
print(f"Total videos: {stats['total_videos']}")
print(f"Unique labels: {stats['unique_labels']}")

# Search functionality
piano_videos = labels.get_videos_with_labels("playing piano")
voice_over_videos = labels.get_videos_with_meta(voice_over=True)
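
The per-label modality tags can be used to separate what is heard from what is seen. The sketch below is not part of the package: `labels_perceivable_as` is a hypothetical helper, and it assumes that `video_data.labels` and `video_data.modalities` are parallel sequences, as the comments above suggest. A stand-in object is used so the example runs without the dataset installed.

```python
from types import SimpleNamespace


def labels_perceivable_as(video_data, channel):
    """Return the labels perceivable through `channel` ("A" or "V").

    Assumption: `labels` and `modalities` are parallel sequences, so the
    i-th modality tag describes the i-th label.
    """
    if channel not in ("A", "V"):
        raise ValueError("channel must be 'A' or 'V'")
    return [
        label
        for label, modality in zip(video_data.labels, video_data.modalities)
        if channel in modality  # the tag "AV" contains both "A" and "V"
    ]


# Stand-in for a VGGSounder entry (hypothetical object, not the real class).
example = SimpleNamespace(
    labels=["male singing", "people crowd", "playing timpani"],
    modalities=["A", "AV", "A"],
)

audible = labels_perceivable_as(example, "A")
visible = labels_perceivable_as(example, "V")
print(audible)
print(visible)
```

With a real entry, the same helper could be called as `labels_perceivable_as(labels["--U7joUcTCo_000000"], "V")`, provided the parallel-sequence assumption holds.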

Advanced Usage

# Dict-like interface
print(len(labels))                    # Number of videos
print("video_id" in labels)           # Check if video exists
for video_id in labels:               # Iterate over video IDs
    video_data = labels[video_id]

# Get all unique labels
all_labels = labels.get_all_labels()

# Complex queries
static_speech_videos = labels.get_videos_with_meta(
    static_image=True, voice_over=True
)
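
Because the object can be iterated and indexed like a dict, simple aggregate statistics can be built on top of it. The following is a minimal sketch, assuming only the iteration and `.labels` access shown above. `label_frequencies` is a hypothetical helper, and a plain dict of stand-in objects replaces the real dataset so the example is self-contained.

```python
from collections import Counter
from types import SimpleNamespace


def label_frequencies(dataset):
    """Count how many videos carry each label.

    Works with any mapping whose values expose a `.labels` list.
    """
    counts = Counter()
    for video_id in dataset:
        counts.update(dataset[video_id].labels)
    return counts


# Toy stand-in for the dataset (hypothetical IDs and entries).
toy = {
    "clip_a": SimpleNamespace(labels=["playing piano", "male singing"]),
    "clip_b": SimpleNamespace(labels=["playing piano"]),
}

freq = label_frequencies(toy)
print(freq.most_common(1))
```

Passing the `labels` object created by `vggsounder.VGGSounder()` instead of `toy` should work the same way if its entries behave as shown in the Quick Start.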

🏷️ Label Format

VGGSounder annotations are stored in a CSV file located at data/vggsounder.csv. Each row corresponds to a single label for a specific video sample. The dataset supports multi-label, multi-modal classification with additional meta-information for robust evaluation.

Columns

  • video_id: Unique identifier for a 10-second video clip.
  • label: Human-readable label representing a sound or visual category (e.g. male singing, playing timpani).
  • modality: The modality in which the label is perceivable:
    • A = Audible
    • V = Visible
    • AV = Both audible and visible
  • background_music: True if the video contains background music.
  • static_image: True if the video consists of a static image.
  • voice_over: True if the video contains voice-over narration.

Example

video_id            label            modality  background_music  static_image  voice_over
---g-f_I2yQ_000001  male singing     A         True              False         False
---g-f_I2yQ_000001  people crowd     AV        True              False         False
---g-f_I2yQ_000001  playing timpani  A         True              False         False
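
Since each row holds one label, the rows for a video need to be grouped to recover its full annotation. The sketch below shows one way to do this with the standard library. It assumes the file is comma-separated, has a header row with the column names listed above, and stores the meta flags as the strings `True` and `False`. `group_rows_by_video` is a hypothetical helper, and an in-memory sample stands in for data/vggsounder.csv.

```python
import csv
import io
from collections import defaultdict

# In-memory sample mirroring the example rows (assumed comma-separated).
SAMPLE_CSV = """\
video_id,label,modality,background_music,static_image,voice_over
---g-f_I2yQ_000001,male singing,A,True,False,False
---g-f_I2yQ_000001,people crowd,AV,True,False,False
---g-f_I2yQ_000001,playing timpani,A,True,False,False
"""

META_COLUMNS = ("background_music", "static_image", "voice_over")


def group_rows_by_video(csv_file):
    """Collect the one-label-per-row records into one entry per video."""
    videos = defaultdict(lambda: {"labels": [], "modalities": [], "meta": {}})
    for row in csv.DictReader(csv_file):
        entry = videos[row["video_id"]]
        entry["labels"].append(row["label"])
        entry["modalities"].append(row["modality"])
        # Meta flags are per-video, so each row repeats the same values.
        for column in META_COLUMNS:
            entry["meta"][column] = row[column] == "True"
    return dict(videos)


grouped = group_rows_by_video(io.StringIO(SAMPLE_CSV))
clip = grouped["---g-f_I2yQ_000001"]
print(clip["labels"])
print(clip["modalities"])
print(clip["meta"])
```

To read the real file, `io.StringIO(SAMPLE_CSV)` could be replaced with `open("data/vggsounder.csv", newline="")`, subject to the same format assumptions.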

📦 Publishing to PyPI

To publish this package to PyPI:

  1. Prepare your environment:

    # Install uv if you haven't already
    curl -LsSf https://astral.sh/uv/install.sh | sh
    
  2. Build the package:

    uv build
    
  3. Set up PyPI credentials:

    • Create a PyPI account at https://pypi.org
    • Generate an API token in your PyPI account settings
    • Set the token: export UV_PUBLISH_TOKEN=your_pypi_token
  4. Publish to PyPI:

    # Test on Test PyPI first (recommended)
    uv publish --publish-url https://test.pypi.org/legacy/
    
    # Then publish to main PyPI
    uv publish
    

For more details, see the uv publishing guide.

📑 Citation

If you find VGGSounder useful for your research and applications, please consider citing us using this BibTeX:

@article{zverevwiedemer2025vggsounder,
  author    = {Daniil Zverev and Thaddäus Wiedemer and Ameya Prabhu and Matthias Bethge and Wieland Brendel and A. Sophia Koepke},
  title     = {VGGSounder: Audio-Visual Evaluations for Foundation Models},
  year      = {2025},
}

❤️ Acknowledgement

The authors would like to thank Felix Förster, Sayak Mallick, and Prasanna Mayilvahananan for their help with data annotation, as well as Thomas Klein and Shyamgopal Karthik for their help in setting up MTurk. They also thank numerous MTurk workers for labelling. This work was in part supported by the BMBF (FKZ: 01IS24060, 01I524085B), the DFG (SFB 1233, TP A1, project number: 276693517), and the Open Philanthropy Foundation funded by the Good Ventures Foundation. The authors thank the IMPRS-IS for supporting TW.

👮 License

This project is released under the Apache 2.0 license as found in the LICENSE file. Please get in touch with us if you find any potential violations.
