VGGish in Keras.
VGGish: A VGG-like audio classification model
This repository provides a VGGish model implemented in Keras with a TensorFlow backend (since
tf.slim is deprecated, I think we should have an up-to-date interface). It is
based on the model for AudioSet.
For more details, please visit the slim version.
pip install vggish-keras
Weights will be downloaded the first time they are requested. You can also download them ahead of time by running:

    python -m vggish_keras.download_helpers.download_weights
Basic - simple & efficient method:
    import librosa
    import numpy as np
    import vggish_keras as vgk

    # loads the model once and provides a simple function that takes in `filename` or `y, sr`
    compute = vgk.get_embedding_function(hop_duration=0.25)
    # model, pump, and sampler are available as attributes
    compute.model.summary()  # take a peek at the model

    # compute from filename
    Z, ts = compute(librosa.util.example_audio_file())

    # compute from pcm
    y, sr = librosa.load(librosa.util.example_audio_file())
    Z, ts = compute(y=y, sr=sr)
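Once you have `Z` and `ts`, a common next step is to collapse the frame-level embeddings into one vector per clip, or to pick out the frames in a time range. The following is a minimal sketch that uses a random stand-in array instead of real model output. The `(13, 512)` shape comes from the manual example in this README; the assumption that `ts` holds one start time in seconds per row of `Z` is mine and is not confirmed by the library. `pool_embeddings` and `frames_between` are hypothetical helpers, not part of `vggish_keras`.

```python
import numpy as np

# Stand-ins for the output of `compute(...)`: 13 frames of 512-d embeddings.
rng = np.random.default_rng(0)
Z = rng.normal(size=(13, 512))
ts = np.arange(len(Z)) * 0.25  # assumed: frame start times in seconds


def pool_embeddings(Z):
    """Concatenate the mean and standard deviation over the time axis."""
    return np.concatenate([Z.mean(axis=0), Z.std(axis=0)])


def frames_between(Z, ts, start, end):
    """Select the frames whose timestamp falls in [start, end)."""
    mask = (ts >= start) & (ts < end)
    return Z[mask]


clip_vector = pool_embeddings(Z)                # one vector for the whole clip
first_second = frames_between(Z, ts, 0.0, 1.0)  # frames starting in the first second
```

Mean and standard deviation pooling is only one option; which summary works best depends on the downstream task.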
Alternatives - using the under-the-hood helper functions:
    import librosa
    import vggish_keras as vgk

    # get the embeddings - WARNING: it instantiates a new model each time
    Z, ts = vgk.get_embeddings(librosa.util.example_audio_file(), hop_duration=0.25)

    # create model, pump, sampler once and pass to vgk.get_embeddings
    # - more typing :'(
    model, pump, sampler = vgk.get_embedding_model(hop_duration=0.25)
    Z, ts = vgk.get_embeddings(
        librosa.util.example_audio_file(),
        model=model, pump=pump, sampler=sampler)
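Because building a new model for every call is expensive, it helps to create the embedding callable once and loop over files with it. Here is a rough sketch of that pattern. `embed_many` and `fake_compute` are hypothetical names introduced for illustration; `fake_compute` stands in for the real callable so the snippet runs without audio files or weights.

```python
import numpy as np


def embed_many(compute, filenames):
    """Run one reusable embedding callable over several files.

    `compute` is any callable that maps a filename to `(Z, ts)`.
    """
    results = {}
    for name in filenames:
        Z, ts = compute(name)
        results[name] = (Z, ts)
    return results


# Hypothetical stand-in so the sketch runs without audio or weights.
def fake_compute(filename):
    n_frames = 4
    return np.zeros((n_frames, 512)), np.arange(n_frames) * 0.25


results = embed_many(fake_compute, ["a.wav", "b.wav"])
```

With the real library, you would pass the object returned by `vgk.get_embedding_function(hop_duration=0.25)` as `compute`, so the model is loaded a single time.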
Manually, using the keras model and pump directly:
    import librosa
    import numpy as np
    import vggish_keras as vgk

    # define the model
    pump = vgk.get_pump()
    model = vgk.VGGish(pump)
    sampler = vgk.get_sampler(pump)

    # transform audio into VGGish embeddings
    filename = librosa.util.example_audio_file()
    X = np.concatenate([
        x[vgk.params.PUMP_INPUT]
        for x in sampler(pump(filename))])
    Z = model.predict(X)

    # calculate timestamps
    ts = vgk.get_timesteps(Z, pump, sampler)

    assert Z.shape == (13, 512)
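The `np.concatenate` step above stacks the per-patch arrays yielded by the sampler into a single batch for `model.predict`. The sketch below mimics that step with a fake generator so the shapes are easy to follow. Everything about the stand-in is an assumption: `INPUT_KEY` is a made-up key (the real one is `vgk.params.PUMP_INPUT`), and the patch layout `(1, 96, 64, 1)` is my guess based on the 96-frame by 64-mel-band patches used by the original VGGish model.

```python
import numpy as np

INPUT_KEY = "mel/mag"  # hypothetical key; the real one is vgk.params.PUMP_INPUT


def fake_sampler(n_patches):
    """Yield one dict per patch, as the sampler is assumed to do."""
    for i in range(n_patches):
        # assumed patch layout: (batch=1, frames=96, mel bands=64, channels=1)
        yield {INPUT_KEY: np.full((1, 96, 64, 1), float(i))}


# Stack the patches along the leading (batch) axis.
X = np.concatenate([x[INPUT_KEY] for x in fake_sampler(13)])
```

Each row of `X` is then one input patch, and the model produces one embedding per row.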
Gemmeke, J. et al., AudioSet: An ontology and human-labelled dataset for audio events, ICASSP 2017
Hershey, S. et al., CNN Architectures for Large-Scale Audio Classification, ICASSP 2017
I include a weight conversion script in download_helpers/convert_ckpt.py, which shows how I converted the weights from
the original TensorFlow checkpoint format to .h5, for those who are interested.
- currently, parameters (sample rate, hop size, etc.) can be changed globally via
vgk.params. I'd like to allow parameter overrides to be passed in directly.
- currently it relies on https://github.com/bmcfee/pumpp/pull/123. Once merged, remove custom github install location
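Until per-call overrides exist, one way to limit the reach of a global change is to wrap it in a context manager that restores the old values afterwards. This is a generic sketch, not something the package provides. It runs against a `SimpleNamespace` stand-in, the attribute names `SAMPLE_RATE` and `HOP_DURATION` are hypothetical, and whether `vgk.params` values are read at call time (so that such an override would take effect) is not confirmed here.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def override(params, **changes):
    """Temporarily set attributes on `params`, restoring them on exit."""
    saved = {name: getattr(params, name) for name in changes}
    try:
        for name, value in changes.items():
            setattr(params, name, value)
        yield params
    finally:
        for name, value in saved.items():
            setattr(params, name, value)


# Stand-in for a module of global settings; attribute names are hypothetical.
params = SimpleNamespace(SAMPLE_RATE=16000, HOP_DURATION=0.96)

with override(params, HOP_DURATION=0.25):
    inside = params.HOP_DURATION  # value seen while the override is active
after = params.HOP_DURATION       # original value is back after the block
```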