VGGish in Keras.


Development Status
- 3 - Alpha
Intended Audience
- Developers
License
- OSI Approved :: ISC License (ISCL)
Topic
- Software Development

Project description

VGGish: A VGG-like audio classification model

This repository provides a VGGish model implemented in Keras with a TensorFlow backend (since tf.slim is deprecated, I think we should have an up-to-date interface). It is based on the VGGish model released for AudioSet; for more details, see the original tf.slim version.

Install

```
pip install vggish-keras
```

Weights are downloaded automatically the first time they are requested. To fetch them ahead of time, run `python -m vggish_keras.download_helpers.download_weights`.

Usage

Basic - simple & efficient method:

```python
import librosa
import numpy as np
import vggish_keras as vgk

# load the model once and get back a simple function that accepts either `filename` or `y, sr`
compute = vgk.get_embedding_function(hop_duration=0.25)
# the model, pump, and sampler are available as attributes
compute.model.summary()  # take a peek at the model

# compute embeddings (Z) and their timestamps (ts) from a filename
Z, ts = compute(librosa.util.example_audio_file())

# compute from PCM samples
y, sr = librosa.load(librosa.util.example_audio_file())
Z, ts = compute(y=y, sr=sr)
```
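`compute` returns the embeddings `Z` together with their timestamps `ts`. As one illustration of putting `ts` to use, here is a small NumPy sketch that selects the embeddings falling inside a time window. The helper names are hypothetical, and `frame_timestamps` assumes frame *i* starts at `i * hop_duration` seconds, which may not match the exact offset convention vggish_keras uses (window centers, for instance), so prefer the returned `ts` in real code.

```python
import numpy as np

def frame_timestamps(n_frames: int, hop_duration: float) -> np.ndarray:
    """Hypothetical helper: start time (s) of each frame, assuming
    frame i begins at i * hop_duration."""
    return np.arange(n_frames) * hop_duration

def embeddings_between(Z: np.ndarray, ts, start: float, end: float) -> np.ndarray:
    """Return the rows of Z whose timestamp lies in [start, end)."""
    ts = np.asarray(ts)  # accept lists as well as arrays
    mask = (ts >= start) & (ts < end)
    return Z[mask]

# Toy stand-in for real output: 10 frames of 3-dimensional "embeddings"
Z_toy = np.arange(30, dtype=float).reshape(10, 3)
ts_toy = frame_timestamps(10, 0.25)  # 0.0, 0.25, ..., 2.25
segment = embeddings_between(Z_toy, ts_toy, 1.0, 2.0)
print(segment.shape)  # (4, 3)
```

With real output this would be `embeddings_between(Z, ts, 1.0, 2.0)`, using the `ts` returned by `compute`.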

Alternatives - using the under-the-hood helper functions:

```python
# get the embeddings - WARNING: it instantiates a new model each time
Z, ts = vgk.get_embeddings(librosa.util.example_audio_file(), hop_duration=0.25)

# create model, pump, sampler once and pass to vgk.get_embeddings
# - more typing :'(
model, pump, sampler = vgk.get_embedding_model(hop_duration=0.25)
Z, ts = vgk.get_embeddings(
    librosa.util.example_audio_file(),
    model=model, pump=pump, sampler=sampler)
```
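If you want the helper-function style without threading `model, pump, sampler` through every call, one option is to memoize the expensive constructor. A minimal sketch using `functools.lru_cache`; `build_model` is a hypothetical stand-in for `vgk.get_embedding_model(hop_duration=...)` so the example runs without downloading weights.

```python
import functools

# Hypothetical stand-in for an expensive constructor such as
# vgk.get_embedding_model(hop_duration=...). It only counts how often it runs.
build_calls = {"count": 0}

def build_model(hop_duration: float) -> dict:
    build_calls["count"] += 1
    return {"hop_duration": hop_duration}  # placeholder for (model, pump, sampler)

@functools.lru_cache(maxsize=None)
def cached_model(hop_duration: float) -> dict:
    # The first call per hop_duration builds; later calls return the cached object.
    return build_model(hop_duration)

first = cached_model(0.25)
second = cached_model(0.25)
print(first is second, build_calls["count"])  # True 1
```

With the real package, the cached function would simply return `vgk.get_embedding_model(hop_duration=hop_duration)`. Note that `lru_cache` keys on the exact argument value, so each distinct `hop_duration` gets its own model.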

Manually, using the Keras model and pump directly:

```python
import librosa
import numpy as np
import vggish_keras as vgk

# define the model
pump = vgk.get_pump()
model = vgk.VGGish(pump)
sampler = vgk.get_sampler(pump)

# transform audio into VGGish embeddings
filename = librosa.util.example_audio_file()
X = np.concatenate([
    x[vgk.params.PUMP_INPUT]
    for x in sampler(pump(filename))])
Z = model.predict(X)

# calculate timestamps
ts = vgk.get_timesteps(Z, pump, sampler)
assert Z.shape == (13, 512)
```
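The manual pipeline relies on the sampler to cut the pump's spectrogram into fixed-size inputs for `model.predict`. To show the idea (this is not the pumpp sampler itself), here is a NumPy sketch that slices a log-mel spectrogram into VGGish-sized patches. It assumes the original VGGish front end (64 mel bins, 10 ms frame hop, 96-frame or 0.96 s patches) and a hop given in seconds like `hop_duration`; `frame_patches` and its constants are hypothetical names.

```python
import numpy as np

FRAME_HOP_SECONDS = 0.010  # 10 ms spectrogram hop (original VGGish front end)
PATCH_FRAMES = 96          # 96 frames = 0.96 s per patch
N_MELS = 64                # mel bins per frame

def frame_patches(log_mel: np.ndarray, hop_duration: float) -> np.ndarray:
    """Cut a (n_frames, n_mels) log-mel spectrogram into
    (n_patches, PATCH_FRAMES, n_mels) patches whose starts are hop_duration
    seconds apart. Trailing frames that don't fill a full patch are dropped."""
    hop = int(round(hop_duration / FRAME_HOP_SECONDS))  # hop in frames
    n_frames, n_mels = log_mel.shape
    if n_frames < PATCH_FRAMES:
        return np.empty((0, PATCH_FRAMES, n_mels), dtype=log_mel.dtype)
    starts = range(0, n_frames - PATCH_FRAMES + 1, hop)
    return np.stack([log_mel[s:s + PATCH_FRAMES] for s in starts])

# 300 frames is about 3 s of audio at a 10 ms hop
spec = np.random.randn(300, N_MELS)
patches = frame_patches(spec, hop_duration=0.25)
print(patches.shape)  # (9, 96, 64)
```

Each patch has shape `(96, 64)`; adding a trailing channel axis (`patches[..., None]`) gives the 4-D batch a Keras `Conv2D` stack typically expects, though the exact input shape of `vgk.VGGish` may differ.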

Reference:

Gemmeke, J. et al., Audio Set: An ontology and human-labeled dataset for audio events, ICASSP 2017
Hershey, S. et al., CNN Architectures for Large-Scale Audio Classification, ICASSP 2017

For those interested, download_helpers/convert_ckpt.py contains the script I used to convert the weights from the original TensorFlow .ckpt checkpoint to Keras .h5 format.
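Much of such a conversion is routing: to our understanding, the original tf.slim checkpoint stores kernels in the same `[height, width, in, out]` (conv) and `[in, out]` (dense) layouts that Keras `Conv2D` and `Dense` use, so each variable mainly needs to reach the right layer. The pure-Python sketch here shows only that routing step and is not a transcription of convert_ckpt.py. The checkpoint names follow the slim scopes in the original VGGish code (`vggish/conv1`, `vggish/conv3/conv3_1`, `vggish/fc1/fc1_2`, ...), and the Keras layer names are assumptions.

```python
from typing import Dict, List, Optional, Tuple

def ckpt_to_keras_name(var_name: str) -> Optional[Tuple[str, int]]:
    """Map a tf.slim variable name to (keras_layer_name, weight_index).

    e.g. 'vggish/conv3/conv3_1/weights' -> ('conv3_1', 0)
         'vggish/fc2/biases'            -> ('fc2', 1)
    Returns None for variables outside the 'vggish/' scope (e.g. 'global_step').
    Assumption: Keras layers are named after the innermost slim scope.
    """
    parts = var_name.split("/")
    if len(parts) < 3 or parts[0] != "vggish" or parts[-1] not in ("weights", "biases"):
        return None
    # Keras Conv2D/Dense layers hold their weights as [kernel, bias]
    index = 0 if parts[-1] == "weights" else 1
    return parts[-2], index

def group_weights(tensors: Dict[str, object]) -> Dict[str, List[object]]:
    """Group checkpoint arrays into {layer_name: [kernel, bias]}."""
    grouped: Dict[str, List[object]] = {}
    for name, value in tensors.items():
        mapped = ckpt_to_keras_name(name)
        if mapped is None:
            continue  # skip bookkeeping variables
        layer, index = mapped
        grouped.setdefault(layer, [None, None])[index] = value
    return grouped

print(ckpt_to_keras_name("vggish/conv3/conv3_1/weights"))  # ('conv3_1', 0)
```

With TensorFlow installed, the input dict could come from `reader = tf.train.load_checkpoint(path)` and `{n: reader.get_tensor(n) for n in reader.get_variable_to_shape_map()}`; each group could then be loaded with `model.get_layer(name).set_weights(...)` and the result written out with `model.save_weights`.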

TODO

- Currently, parameters (sample rate, hop size, etc.) can only be changed globally via vgk.params. I'd like to allow parameter overrides to be passed to vgk.VGGish.
- Currently, it relies on https://github.com/bmcfee/pumpp/pull/123. Once that PR is merged, remove the custom GitHub install location.
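For the first TODO item, one possible shape for per-model overrides is an immutable parameter object with defaults, copied with `dataclasses.replace` for each model. This is only a sketch: `VGGishParams`, its field names, and `resolve_params` are hypothetical and are not the current `vgk.params` interface; the default values mirror the original VGGish front end (16 kHz audio, 10 ms STFT hop, 64 mel bins, 0.96 s examples).

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class VGGishParams:
    # Hypothetical field names; defaults follow the original VGGish front end.
    sample_rate: int = 16000
    stft_hop_seconds: float = 0.010
    num_mel_bins: int = 64
    example_window_seconds: float = 0.96
    example_hop_seconds: float = 0.96

DEFAULT_PARAMS = VGGishParams()

def resolve_params(**overrides) -> VGGishParams:
    """Return a copy of the defaults with per-model overrides applied.
    The module-level defaults are never mutated; unknown names raise TypeError."""
    return replace(DEFAULT_PARAMS, **overrides)

p = resolve_params(example_hop_seconds=0.25)
print(p.example_hop_seconds, DEFAULT_PARAMS.example_hop_seconds)  # 0.25 0.96
```

A call such as `vgk.VGGish(pump, params=resolve_params(example_hop_seconds=0.25))` would then be a hypothetical extension of the current signature.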


Release history

- 0.1.1 (this version): May 20, 2020
- 0.1.0: May 20, 2020
- 0.0.19: May 20, 2020
- 0.0.18: Oct 2, 2019
- 0.0.1 to 0.0.17: Sep 26, 2019

Download files


Source Distribution

vggish-keras-0.1.1.tar.gz (8.1 kB), uploaded May 20, 2020

Hashes for vggish-keras-0.1.1.tar.gz

| Algorithm | Hash digest |
|---|---|
| SHA256 | `f177db408699623335187d3cbcdd1aa916413bcf3b1a48db6e60640ae161c273` |
| MD5 | `285d02e3066f099a1991134f2f6bbb3d` |
| BLAKE2b-256 | `88e2706800cb95ae36567cecfdc4e95f74dfa97f141b688601ec31de65724cbc` |
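To check a downloaded copy of the source distribution against the published SHA256 digest, a short standard-library sketch is enough. The path is whatever file you downloaded (for example with `pip download vggish-keras==0.1.1 --no-deps`); `sha256_of` and `verify` are illustrative helper names.

```python
import hashlib

# SHA256 digest published for vggish-keras-0.1.1.tar.gz
EXPECTED_SHA256 = "f177db408699623335187d3cbcdd1aa916413bcf3b1a48db6e60640ae161c273"

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 of a file, read in chunks so large files need little memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()
```

Usage: `verify("vggish-keras-0.1.1.tar.gz")` returns `True` only for an untampered copy of the 0.1.1 source archive.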