torch_vggish_yamnet: PyTorch VGGish & YAMNet models
Torch VGGish & YAMNet embedding models
torch_vggish_yamnet provides ready-to-use PyTorch ports of the AudioSet (Google) audio embedding models VGGish and YAMNet. The audio tagging models originate from the TensorFlow release "Models for AudioSet: A Large Scale Dataset of Audio Events": https://github.com/tensorflow/models/tree/master/research/audioset
This project is a restructured fork of torch_audioset (see References).
Installation
PyTorch>=1.0 is required (dependencies are installed automatically).
pip install torch-vggish-yamnet
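To confirm that an environment meets the stated PyTorch minimum before installing, a check along these lines may help. It is only a sketch: the helper names (`parse_major_minor`, `meets_minimum`, `installed_torch_ok`) are hypothetical and not part of this package, and it assumes PyTorch is installed under the distribution name `torch`.

```python
import re
from importlib import metadata

MIN_TORCH = (1, 0)  # minimum PyTorch version stated in this README


def parse_major_minor(version: str) -> tuple:
    """Extract (major, minor) from a version string such as '2.1.0+cu118'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version: str, minimum: tuple = MIN_TORCH) -> bool:
    """Return True if `version` is at least `minimum` (major, minor)."""
    return parse_major_minor(version) >= minimum


def installed_torch_ok() -> bool:
    """Return True if an installed 'torch' distribution satisfies MIN_TORCH."""
    try:
        return meets_minimum(metadata.version("torch"))
    except metadata.PackageNotFoundError:
        # PyTorch is not installed in this environment.
        return False
```

Calling `installed_torch_ok()` returns `False` both when PyTorch is missing and when it is older than 1.0, so either case can be handled before running `pip install`.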
Usage
from torch_vggish_yamnet import yamnet
from torch_vggish_yamnet import vggish
from torch_vggish_yamnet.input_proc import WaveformToInput
# x_in (input waveform tensor) and in_sr (its sample rate in Hz) must be defined beforehand
# Input signal (x_in) tensor conversion & ad-hoc patching
converter = WaveformToInput()
in_tensor = converter(x_in.float(), in_sr)
print(in_tensor.shape)
# Models init
embedding_yamnet = yamnet.yamnet(pretrained=True)
embedding_vggish = vggish.get_vggish(with_classifier=False, pretrained=True)
# Embedding (forward)
emb_yamnet, _ = embedding_yamnet(in_tensor) # discard logits
emb_vggish = embedding_vggish(in_tensor)
print(emb_yamnet.shape, emb_vggish.shape)
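The first dimension of `in_tensor` is the number of patches cut from the input signal, and each model returns one embedding per patch. The sketch below estimates that count from the signal length. It assumes the framing of the upstream AudioSet models (16 kHz target rate, 25 ms analysis window, 10 ms hop, non-overlapping patches of 96 frames) and ignores any padding, so the value produced by this package's `WaveformToInput` may differ slightly. The function `estimate_num_patches` is hypothetical and not part of the package.

```python
TARGET_SR = 16000        # Hz, assumed model input sample rate
WINDOW_MS = 25           # assumed STFT window length in milliseconds
HOP_MS = 10              # assumed STFT hop length in milliseconds
FRAMES_PER_PATCH = 96    # assumed frames per patch (0.96 s at a 10 ms hop)


def estimate_num_patches(num_samples: int, in_sr: int) -> int:
    """Estimate how many non-overlapping patches a signal yields.

    num_samples: length of the input signal in samples
    in_sr: sample rate of the input signal in Hz
    """
    # Length of the signal after resampling to the target rate.
    resampled = round(num_samples * TARGET_SR / in_sr)
    window = TARGET_SR * WINDOW_MS // 1000   # 400 samples
    hop = TARGET_SR * HOP_MS // 1000         # 160 samples
    if resampled < window:
        return 0
    # Number of full analysis windows that fit without padding.
    num_frames = 1 + (resampled - window) // hop
    # Frames left over after the last full patch are dropped.
    return num_frames // FRAMES_PER_PATCH


if __name__ == "__main__":
    # Ten seconds of audio recorded at 44.1 kHz.
    print(estimate_num_patches(441000, 44100))
```

Under these assumptions, signals shorter than about one second yield no patch at all, so very short clips may need padding before conversion.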
References
[1] AudioSet Official site: http://g.co/audioset
[2]
@inproceedings{45857,
title = {Audio Set: An ontology and human-labeled dataset for audio events},
author = {Jort F. Gemmeke and Daniel P. W. Ellis and Dylan Freedman and Aren Jansen and Wade Lawrence and R. Channing Moore and Manoj Plakal and Marvin Ritter},
year = {2017},
booktitle = {Proc. IEEE ICASSP 2017},
address = {New Orleans, LA}}
[3]
@incollection{45611,
title = {CNN Architectures for Large-Scale Audio Classification},
author = {Shawn Hershey and Sourish Chaudhuri and Daniel P. W. Ellis and Jort F. Gemmeke and Aren Jansen and Channing Moore and Manoj Plakal and Devin Platt and Rif A. Saurous and Bryan Seybold and Malcolm Slaney and Ron Weiss and Kevin Wilson},
year = {2017},
URL = {https://arxiv.org/abs/1609.09430},
booktitle = {International Conference on Acoustics, Speech and Signal Processing (ICASSP)}}
[4] torch_audioset GitHub repository: https://github.com/w-hc/torch_audioset/tree/master