VGGish in Keras.
VGGish: A VGG-like audio classification model
This repository provides a VGGish model implemented in Keras with a TensorFlow backend (since tf.slim
is deprecated, I think we should have an up-to-date interface). This repository is based on
the VGGish model released for AudioSet.
For more details, please see the original slim version.
Install
pip install vggish-keras
Weights are downloaded automatically the first time they are requested. You can also download them ahead of time by running:
python -m vggish_keras.download_helpers.download_weights
Usage
Basic - simple & efficient method:
import librosa
import numpy as np
import vggish_keras as vgk
# loads the model once and provides a simple function that takes in `filename` or `y, sr`
compute = vgk.get_embedding_function(hop_duration=0.25)
# model, pump, and sampler are available as attributes
compute.model.summary() # take a peek at the model
# compute from filename
Z, ts = compute(librosa.util.example_audio_file())
# compute from pcm
y, sr = librosa.load(librosa.util.example_audio_file())
Z, ts = compute(y=y, sr=sr)
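As a rough sanity check on the shapes you get back (a sketch, not part of the library; the exact frame count depends on vggish_keras's windowing and padding), one embedding row is produced per analysis frame, so the number of rows scales with clip duration over hop duration:

```python
import math

# VGGish emits one 512-dimensional embedding per analysis frame. With
# frames spaced hop_duration seconds apart, a clip of `duration` seconds
# yields on the order of duration / hop_duration frames.
def approx_frames(duration, hop_duration=0.25):
    """Rough frame count for a clip of `duration` seconds (illustrative)."""
    return math.floor(duration / hop_duration)

# e.g. a ~3.25 s clip at a 0.25 s hop gives about 13 frames, so Z would
# have shape (13, 512) and ts would hold 13 timestamps.
print(approx_frames(3.25, 0.25))  # 13
```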
Alternatives - using the under-the-hood helper functions:
# get the embeddings - WARNING: it instantiates a new model each time
Z, ts = vgk.get_embeddings(librosa.util.example_audio_file(), hop_duration=0.25)
# create model, pump, sampler once and pass to vgk.get_embeddings
# - more typing :'(
model, pump, sampler = vgk.get_embedding_model(hop_duration=0.25)
Z, ts = vgk.get_embeddings(
    librosa.util.example_audio_file(),
    model=model, pump=pump, sampler=sampler)
Manually, using the keras model and pump directly:
import librosa
import numpy as np
import vggish_keras as vgk
# define the model
pump = vgk.get_pump()
model = vgk.VGGish(pump)
sampler = vgk.get_sampler(pump)
# transform audio into VGGish embeddings
filename = librosa.util.example_audio_file()
X = np.concatenate([
    x[vgk.params.PUMP_INPUT]
    for x in sampler(pump(filename))])
Z = model.predict(X)
# calculate timestamps
ts = vgk.get_timesteps(Z, pump, sampler)
assert Z.shape == (13, 512)
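For intuition about what `vgk.get_timesteps` returns: it maps each embedding row to a time in seconds. A minimal stand-in, assuming frames are uniformly spaced `hop_duration` seconds apart starting at t = 0 (the real function derives the spacing from the pump and sampler, so treat this as an illustration only):

```python
import numpy as np

# Hypothetical stand-in for vgk.get_timesteps: one timestamp per
# embedding row, uniformly spaced hop_duration seconds apart.
def frame_timestamps(n_frames, hop_duration=0.25):
    return np.arange(n_frames) * hop_duration

ts = frame_timestamps(13, hop_duration=0.25)
print(ts[:4])   # [0.   0.25 0.5  0.75]
print(len(ts))  # 13
```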
References:
- Gemmeke, J. F., et al., "Audio Set: An ontology and human-labeled dataset for audio events," ICASSP 2017
- Hershey, S., et al., "CNN Architectures for Large-Scale Audio Classification," ICASSP 2017
I include a weight conversion script in download_helpers/convert_ckpt.py, which shows how I converted the weights from .ckpt to .h5, for those who are interested.
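To give a flavor of what such a conversion involves (the variable names below are illustrative assumptions, not the actual checkpoint contents; see convert_ckpt.py for the real mapping): tf.slim checkpoints name variables like `<scope>/weights` and `<scope>/biases`, while Keras layers store `kernel` and `bias`, so a converter has to translate names before copying the arrays across.

```python
# Sketch of the name translation a .ckpt -> .h5 converter performs.
# The variable names here are illustrative assumptions; see
# download_helpers/convert_ckpt.py for the actual mapping.
def slim_to_keras(var_name):
    """Map a tf.slim variable name to a (layer, weight) pair for Keras."""
    scope, _, suffix = var_name.rpartition("/")
    suffix = {"weights": "kernel", "biases": "bias"}[suffix]
    layer = scope.split("/")[-1]          # e.g. "vggish/conv1" -> "conv1"
    return layer, suffix

print(slim_to_keras("vggish/conv1/weights"))  # ('conv1', 'kernel')
print(slim_to_keras("vggish/fc2/biases"))     # ('fc2', 'bias')
```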
TODO
- currently, parameters (sample rate, hop size, etc.) can only be changed globally via vgk.params
- I'd like to allow parameter overrides to be passed to vgk.VGGish
- currently it relies on https://github.com/bmcfee/pumpp/pull/123. Once merged, remove the custom GitHub install location
File details
Details for the file vggish-keras-0.1.1.tar.gz.

File metadata
- Download URL: vggish-keras-0.1.1.tar.gz
- Upload date:
- Size: 8.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.3.post20200330 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.7.7

File hashes
Algorithm | Hash digest
---|---
SHA256 | f177db408699623335187d3cbcdd1aa916413bcf3b1a48db6e60640ae161c273
MD5 | 285d02e3066f099a1991134f2f6bbb3d
BLAKE2b-256 | 88e2706800cb95ae36567cecfdc4e95f74dfa97f141b688601ec31de65724cbc