Holistic Evaluation of Audio Representations (HEAR) 2021 -- Baseline Model

HEAR 2021 Baseline

A simple DSP-based audio embedding consisting of a Mel-frequency spectrogram followed by a random projection. It serves as the naive baseline model for HEAR 2021 and implements the common API required by the competition evaluation.

For full details on the HEAR 2021 NeurIPS competition and for information on how to participate, please visit the competition website.

Installation

Method 1: pypi

pip install hearbaseline

Method 2: pip local source tree

This is the same method that the competition organizers will use when installing submissions to HEAR 2021.

git clone https://github.com/neuralaudio/hear-baseline.git
python3 -m pip install -e ./hear-baseline

Naive Baseline Model

The naive baseline model produces log-scaled Mel-frequency spectrograms using a 256-band Mel filterbank. Each frame of the spectrogram is then projected to 4096 dimensions using a random projection matrix. The weights for the projection matrix were generated by sampling a normal distribution and are stored in this repository in the file saved_models/naive_baseline.pt.

A random projection is less efficient than a CNN, but it is one of the simplest models to implement from a coding perspective.
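
For illustration, below is a minimal sketch of this pipeline, assuming torchaudio for the Mel spectrogram; apart from the 256 Mel bands and the 4096-dimensional projection described above, the parameter names and values are assumptions and may differ from the package's actual internals.

import torch
import torchaudio

# Illustrative settings: only n_mels=256 and embed_dim=4096 come from the
# description above; sample_rate and n_fft are assumptions.
sample_rate = 44100
n_fft = 4096
n_mels = 256
embed_dim = 4096

mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=sample_rate, n_fft=n_fft, n_mels=n_mels
)

# Random projection weights sampled from a normal distribution
projection = torch.randn(n_mels, embed_dim)

def embed_frames(audio: torch.Tensor) -> torch.Tensor:
    # audio: (batch, samples) -> (batch, frames, embed_dim)
    spec = torch.log(mel(audio) + 1e-6)  # log-scaled Mel spectrogram
    spec = spec.transpose(1, 2)          # (batch, frames, n_mels)
    return spec @ projection             # project each frame to 4096 dims

In the actual package the projection weights are loaded from saved_models/naive_baseline.pt rather than re-sampled, so embeddings are reproducible across runs.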

Usage

Audio embeddings can be computed using one of two methods: 1) get_scene_embeddings, or 2) get_timestamp_embeddings.

get_scene_embeddings accepts a batch of audio clips and produces a single embedding for each audio clip. This can be computed like so:

import torch
import hearbaseline

# Load model with weights - located in the root directory of this repo
model = hearbaseline.load_model("saved_models/naive_baseline.pt")

# Create a batch of two white noise clips that are 2 seconds long
# and compute scene embeddings for each clip
audio = torch.rand((2, model.sample_rate * 2))
embeddings = hearbaseline.get_scene_embeddings(audio, model)
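# embeddings now holds one vector per clip: shape (n_clips, embedding size)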

The get_timestamp_embeddings method accepts the same arguments but returns an array of embeddings computed every 25 ms over the duration of the input audio, along with an array of timestamps corresponding to each embedding.
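
Continuing the example above, a brief sketch of how that looks (the tuple ordering and millisecond timestamps follow the common API):

# Timestamp embeddings for the same batch of audio
embeddings, timestamps = hearbaseline.get_timestamp_embeddings(audio, model)

# embeddings: (n_clips, n_timestamps, embedding size)
# timestamps: (n_clips, n_timestamps), the time of each frame in milliseconds
print(embeddings.shape, timestamps.shape)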

See the common API for more details.
