VGGish in Keras.
VGGish: A VGG-like audio classification model
This repository provides a VGGish model implemented in Keras with a TensorFlow backend (since tf.slim is deprecated, I think we should have an up-to-date interface). It is based on the VGGish model released for AudioSet.
For more details, please see the original TF-Slim version.
pip install vggish-keras
Weights will be automatically downloaded when installing via pip.
Currently, this relies on a pending change to pumpp (https://github.com/bmcfee/pumpp/pull/123). To get those changes, install the fork:
pip install git+https://github.com/beasteers/pumpp@tf_keras
```python
import librosa
import numpy as np
import vggish_keras as vgk

# define the model
pump = vgk.get_pump()
model = vgk.VGGish(pump)

# transform audio into VGGish embeddings without fc layers
# (librosa >= 0.8 deprecates example_audio_file(); librosa.ex('trumpet') is the newer equivalent)
X = pump.transform(librosa.util.example_audio_file())[vgk.params.PUMP_INPUT]
X = np.concatenate([X] * 5)  # stack five copies along the batch axis
Z = model.predict(X)

# calculate timestamps
op = pump['mel']
ts = np.arange(len(Z)) / op.sr * op.hop_length

assert Z.shape == (5, 512)
```
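The timestamp line in the example maps each row of Z back to a position in the audio. The sketch below illustrates that arithmetic. It frames a dummy log-mel spectrogram into non-overlapping patches and computes each patch's start time as frame_index * hop_length / sr. It assumes the original AudioSet VGGish settings: 16 kHz audio, a 10 ms hop, and patches of 96 frames × 64 mel bands. The values in vgk.params may differ, and so may the way the pump defines hop_length. frame_patches and patch_start_times are hypothetical helpers, not part of vggish_keras.

```python
import numpy as np

# Assumed VGGish-style defaults (from the original AudioSet VGGish release);
# vgk.params in vggish-keras may use different values.
SR = 16000          # sample rate in Hz
HOP_LENGTH = 160    # STFT hop in samples (10 ms at 16 kHz)
PATCH_FRAMES = 96   # frames per example patch (0.96 s)
N_MELS = 64         # mel bands per frame


def frame_patches(logmel, patch_frames=PATCH_FRAMES):
    """Hypothetical helper: split (n_frames, n_mels) into non-overlapping patches.

    Trailing frames that do not fill a whole patch are dropped.
    """
    n_patches = logmel.shape[0] // patch_frames
    trimmed = logmel[: n_patches * patch_frames]
    return trimmed.reshape(n_patches, patch_frames, logmel.shape[1])


def patch_start_times(n_patches, patch_frames=PATCH_FRAMES, sr=SR, hop_length=HOP_LENGTH):
    """Hypothetical helper: start time in seconds = first frame index * hop_length / sr."""
    frame_idx = np.arange(n_patches) * patch_frames
    return frame_idx * hop_length / sr


# 5 seconds of dummy log-mel frames -> 500 frames at a 10 ms hop
logmel = np.zeros((500, N_MELS), dtype=np.float32)
patches = frame_patches(logmel)
print(patches.shape)  # (5, 96, 64)
print(patch_start_times(len(patches)))
```

Each embedding here covers one 0.96 s patch, so consecutive timestamps are 0.96 s apart under these assumptions.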
Gemmeke, J. et al., AudioSet: An ontology and human-labelled dataset for audio events, ICASSP 2017.
Hershey, S. et al., CNN Architectures for Large-Scale Audio Classification, ICASSP 2017.
Figure: Model with the top fully connected layers
Figure: Model without the top fully connected layers
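The two variants differ only in whether the fully connected stack is kept. The pure-Python sketch below traces output shapes for a single 96×64 input patch. Its layer list is modeled on the original TF-Slim VGGish: 3×3 convs with "same" padding, 2×2 max pools with stride 2, and fc layers of 4096, 4096 and 128 units. VGGISH_LAYERS, FC_LAYERS and trace_shapes are illustrative names, not vggish_keras APIs.

```python
# Hypothetical layer spec mirroring the original VGGish (TF-Slim) stack;
# the Keras port may differ in details such as the final pooling.
VGGISH_LAYERS = [
    ("conv", 64), ("pool", None),
    ("conv", 128), ("pool", None),
    ("conv", 256), ("conv", 256), ("pool", None),
    ("conv", 512), ("conv", 512), ("pool", None),
]
FC_LAYERS = [4096, 4096, 128]


def trace_shapes(height=96, width=64, include_top=True):
    """Return (layer, output_shape) pairs for one (height, width, 1) patch.

    3x3 convs with 'same' padding keep height and width and set the channel
    count; 2x2 max pools with stride 2 and 'same' padding halve height and
    width, rounding up.
    """
    h, w, c = height, width, 1
    shapes = []
    for kind, filters in VGGISH_LAYERS:
        if kind == "conv":
            c = filters
        else:
            h, w = -(-h // 2), -(-w // 2)  # ceiling division
        shapes.append((kind, (h, w, c)))
    if include_top:
        shapes.append(("flatten", (h * w * c,)))
        for units in FC_LAYERS:
            shapes.append(("fc", (units,)))
    return shapes


for layer, shape in trace_shapes():
    print(layer, shape)
```

Without the top layers, the last feature map is 6×4×512. The example above asserts a 512-d output for that variant, which presumably comes from pooling over this map. That is an assumption, not something this README states.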
TODO:
- add fully connected layers
- add PCA postprocessing (requires the fully connected layers, plus the PCA parameters added to the model)
- currently, parameters (sample rate, hop size, etc.) can only be changed globally via
vgk.params. I'd like to allow for parameter overrides to be passed to
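For the PCA postprocessing item, here is a minimal NumPy sketch of what the original VGGish postprocessor does. It subtracts the PCA means, projects onto the eigenvectors, clips to [-2, 2], and quantizes to 8 bits. It assumes 128-d embeddings from the model with fc layers and PCA parameters supplied by the caller; the real values are distributed with the original VGGish release. The postprocess function is hypothetical, not part of vggish_keras.

```python
import numpy as np

# Quantization range used by the original VGGish postprocessor.
QUANTIZE_MIN_VAL = -2.0
QUANTIZE_MAX_VAL = 2.0


def postprocess(embeddings, pca_means, pca_eigen_vectors):
    """Hypothetical helper: PCA-project, clip and quantize VGGish embeddings.

    embeddings:        (n, 128) float array from the model *with* fc layers
    pca_means:         (128,) array of PCA means
    pca_eigen_vectors: (128, 128) array, one eigenvector per row
    Returns an (n, 128) uint8 array.
    """
    # Centre the embeddings and project them onto the PCA basis.
    pca_applied = (embeddings - pca_means) @ pca_eigen_vectors.T
    # Clip to the quantization range, then map it linearly onto 0..255.
    clipped = np.clip(pca_applied, QUANTIZE_MIN_VAL, QUANTIZE_MAX_VAL)
    scale = 255.0 / (QUANTIZE_MAX_VAL - QUANTIZE_MIN_VAL)
    return ((clipped - QUANTIZE_MIN_VAL) * scale).astype(np.uint8)


# Placeholder PCA parameters (identity basis, zero means), for illustration only.
means = np.zeros(128)
eigvecs = np.eye(128)
emb = np.zeros((1, 128))
emb[0, 0], emb[0, 2] = -3.0, 2.0
out = postprocess(emb, means, eigvecs)
print(out[0, :3])  # [  0 127 255]
```

Values below -2 saturate at 0 and values at or above 2 map to 255. The final cast to uint8 truncates, so 0.0 lands on 127 rather than 128.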