Holistic Evaluation of Audio Representations (HEAR) 2021 -- Baseline Model

HEAR Baseline

Several baseline audio embeddings that implement the common API required by the HEAR Benchmark and 2021 HEAR NeurIPS competition.

Includes a simple DSP-based audio embedding consisting of a Mel-frequency spectrogram followed by a random projection, implemented in PyTorch, TensorFlow, and Keras.

Additionally, we wrap several benchmark audio embedding models. Many of these wrappers are inefficient because of limiting assumptions in the original implementations (e.g. their high-level API can only process one audio file at a time).

For the HEAR Benchmark and 2021 NeurIPS evaluation, the hearbaseline.wav2vec2 and hearbaseline.torchcrepe baseline embeddings were used.

For full details on the HEAR Benchmark please visit https://hearbenchmark.com

Installation

Tested with Python 3.7 and 3.8. Python 3.9 is not officially supported because pip3 installs are very finicky, but it might work.

Method 1: PyPI

pip install hearbaseline

Method 2: pip local source tree

This is the same method that competition organizers will use when installing submissions to HEAR 2021.

git clone https://github.com/hearbenchmark/hear-baseline.git
python3 -m pip install -e ./hear-baseline

Naive Baseline Model

The naive baseline model provides an example for implementing the HEAR common API using DSP-based techniques. It produces log-scaled Mel-frequency spectrograms using a 256-band Mel filter. Each frame of the spectrogram is then projected to 4096 dimensions using a random projection matrix. Weights for the projection matrix were generated by sampling a normal distribution and are stored in this repository in the file saved_models/naive_baseline.pt.
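The pipeline described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code shipped in this repository: the 256 Mel bands and 4096 output dimensions come from the description, while the function names, the 44100 Hz sample rate, the FFT size, the 25 ms hop, the Hann window, and the seeded random matrix (standing in for the stored weights) are all assumptions made for the example.

```python
import numpy as np


def hz_to_mel(freq_hz):
    return 2595.0 * np.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular Mel filters with shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    bank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lower, center, upper = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - lower) / (center - lower)
        falling = (upper - fft_freqs) / (upper - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def naive_embeddings(audio, sample_rate, n_fft=4096, hop_ms=25.0,
                     n_mels=256, embedding_size=4096, seed=0):
    """Log-Mel spectrogram frames followed by a random projection.

    n_fft, hop_ms, and seed are illustrative assumptions. The real model
    loads its projection weights from a file instead of sampling them.
    """
    hop = int(sample_rate * hop_ms / 1000)
    n_frames = 1 + (len(audio) - n_fft) // hop
    # Slice the signal into overlapping frames (no padding in this sketch).
    index = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = audio[index] * np.hanning(n_fft)
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    mel = magnitude @ mel_filterbank(n_mels, n_fft, sample_rate).T
    log_mel = np.log(mel + 1e-8)  # small offset avoids log(0)
    # Projection weights sampled from a normal distribution.
    projection = np.random.default_rng(seed).standard_normal((n_mels, embedding_size))
    return log_mel @ projection


# One second of uniform noise in [-1, 1) at an assumed 44100 Hz sample rate.
audio = np.random.default_rng(1).uniform(-1.0, 1.0, 44100)
embeddings = naive_embeddings(audio, sample_rate=44100)
```

Each row of `embeddings` corresponds to one spectrogram frame, and each row has `embedding_size` columns.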

A random projection is less efficient than a CNN, but it is one of the simplest models to implement.

Usage

Audio embeddings can be computed using one of two methods: 1) get_scene_embeddings, or 2) get_timestamp_embeddings.

get_scene_embeddings accepts a batch of audio clips and produces a single embedding for each audio clip. This can be computed like so:

import torch
import hearbaseline

# Load model with weights - located in the root directory of this repo
model = hearbaseline.load_model("saved_models/naive_baseline.pt")

# Create a batch of 2 white noise clips that are 2-seconds long
# and compute scene embeddings for each clip
audio = torch.rand((2, model.sample_rate * 2))
embeddings = hearbaseline.get_scene_embeddings(audio, model)

get_timestamp_embeddings is called the same way, but returns an array of embeddings computed every 25 ms over the duration of the input audio. An array of timestamps corresponding to each embedding is also returned.
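To get a feel for how many embeddings a 25 ms hop produces, the timestamp grid for a clip can be estimated with plain Python. This is a rough sketch only: the helper name `timestamp_grid_ms` is hypothetical, and it assumes timestamps start at 0 ms and include the clip end point, which may differ from how the library frames audio.

```python
def timestamp_grid_ms(num_samples, sample_rate, hop_ms=25.0):
    """Timestamps in milliseconds at 0, hop_ms, 2 * hop_ms, ... up to the clip duration.

    Assumes the first timestamp is at 0 ms and the end point is included.
    """
    duration_ms = 1000.0 * num_samples / sample_rate
    count = int(duration_ms // hop_ms) + 1
    return [i * hop_ms for i in range(count)]


# A 2-second clip at an assumed 16000 Hz sample rate.
grid = timestamp_grid_ms(num_samples=2 * 16000, sample_rate=16000)
print(len(grid), grid[:3], grid[-1])
```

Under these assumptions, a 2-second clip yields one timestamp per 25 ms step from the start of the clip to its end.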

See the common API for more details.

Other Baselines

  • hearbaseline.torchcrepe
  • hearbaseline.vggish
  • hearbaseline.vqt
  • hearbaseline.wav2vec2
  • hearbaseline.keras.naive
  • hearbaseline.tf.naive
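Since every baseline is meant to implement the same common API, a quick sanity check is to confirm that a module exposes the three functions used in this README. The sketch below is illustrative: `missing_api_functions` is a hypothetical helper, and a stand-in object replaces a real baseline module so the example runs without the package installed.

```python
import types

# Function names taken from the usage examples in this README.
REQUIRED_FUNCTIONS = ("load_model", "get_scene_embeddings", "get_timestamp_embeddings")


def missing_api_functions(module):
    """Return the required function names that the module does not provide."""
    return [
        name
        for name in REQUIRED_FUNCTIONS
        if not callable(getattr(module, name, None))
    ]


# Stand-in for a baseline module that forgot one function (hypothetical).
incomplete = types.SimpleNamespace(
    load_model=lambda path="": None,
    get_scene_embeddings=lambda audio, model: None,
)
missing = missing_api_functions(incomplete)
print(missing)
```

With the package installed, the same check could be applied to any module in the list, for example the result of `importlib.import_module("hearbaseline.vqt")`.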
