VGGish in Keras.
VGGish: A VGG-like audio classification model
This repository provides a VGGish model implemented in Keras with a TensorFlow backend (tf.slim is deprecated, so an up-to-date interface seems worthwhile). It is based on the model released for AudioSet.
For more details, please see the original slim version.
Install
pip install vggish-keras
Weights will be automatically downloaded when installing via pip.
Currently, this relies on a pending change to pumpp in https://github.com/bmcfee/pumpp/pull/123. To get those changes, install pumpp from that branch:
pip install git+https://github.com/beasteers/pumpp@tf_keras
Usage
import librosa
import numpy as np
import vggish_keras as vgk

# build the feature extractor (pump) and the model (without fc layers)
pump = vgk.get_pump()
model = vgk.VGGish(pump)

# compute the log-mel input for an example audio file
X = pump.transform(librosa.util.example_audio_file())[vgk.params.PUMP_INPUT]
X = np.concatenate([X] * 5)  # stack 5 copies to form a batch

# compute VGGish embeddings (one 512-d vector per input frame)
Z = model.predict(X)

# calculate the timestamp (in seconds) of each embedding
op = pump['mel']
ts = np.arange(len(Z)) / op.sr * op.hop_length

assert Z.shape == (5, 512)
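The timestamp arithmetic above (frame index times hop size over sample rate) can be checked with plain NumPy. The sample rate and hop length below are placeholder values for illustration, not the library's actual defaults:

```python
import numpy as np

# Illustrative values only -- not vggish_keras's real defaults.
sr = 16000        # assumed sample rate (Hz)
hop_length = 160  # assumed hop size (samples)

# Frame i starts at i * hop_length samples, i.e. i * hop_length / sr seconds.
ts = np.arange(5) / sr * hop_length
# frame start times: 0.00, 0.01, 0.02, 0.03, 0.04 seconds
```

With these numbers each frame advances by hop_length / sr = 10 ms, which is what `ts` above measures for the real pump's `sr` and `hop_length`.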
Reference:
- Gemmeke, J. et al., AudioSet: An ontology and human-labelled dataset for audio events, ICASSP 2017
- Hershey, S. et al., CNN Architectures for Large-Scale Audio Classification, ICASSP 2017
- Model with the top fully connected layers
- Model without the top fully connected layers
TODO
- add fully connected layers
- add PCA postprocessing (needs the fully connected layers, plus adding the PCA params to the model)
- currently, parameters (sample rate, hop size, etc.) can only be changed globally via vgk.params; I'd like to allow parameter overrides to be passed to vgk.VGGish
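The global-parameter pattern in the last item can be sketched in plain Python. This is only an illustration of the idea; the class and attribute names are made up and do not reflect vggish_keras's actual code:

```python
# Sketch of module-level params that downstream code reads at call time
# (hypothetical names; the real vgk.params object may differ).
class Params:
    def __init__(self):
        self.sample_rate = 22050  # placeholder default
        self.hop_length = 512     # placeholder default

params = Params()

def frame_duration():
    # Reads the *current* global values, so a global override
    # takes effect everywhere params is consulted.
    return params.hop_length / params.sample_rate

# Global override -- the style of configuration vgk.params allows today.
params.sample_rate = 16000
params.hop_length = 160
```

A per-call override, as the TODO proposes, would instead accept these values as keyword arguments to the model constructor rather than reading the shared globals.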