torch_vggish_yamnet: PyTorch VGGish & YAMNet models

Torch VGGish & YAMNet embedding models

torch_vggish_yamnet provides ready-to-use PyTorch ports of Google's AudioSet audio embedding models, VGGish and YAMNet. The audio tagging models are ported from the TensorFlow release "Models for AudioSet: A Large Scale Dataset of Audio Events": https://github.com/tensorflow/models/tree/master/research/audioset

This project is a restructured fork of torch_audioset (see References, [4]).

Installation

PyTorch>=1.0 is required (dependencies are installed automatically).

pip install torch-vggish-yamnet
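
To confirm that the installation succeeded, the installed version can be queried from the package metadata. The following is a minimal sketch, assuming Python 3.8 or later (for `importlib.metadata`); the helper name `installed_version` is hypothetical and not part of this package.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of `dist_name`, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Distribution name as used in the pip command above
version = installed_version("torch-vggish-yamnet")
if version is None:
    print("torch-vggish-yamnet is not installed")
else:
    print("torch-vggish-yamnet", version)
```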

Usage

from torch_vggish_yamnet import yamnet
from torch_vggish_yamnet import vggish
from torch_vggish_yamnet.input_proc import WaveformToInput

# x_in: input waveform as a torch tensor; in_sr: its sampling rate in Hz.
# Both must be defined beforehand (e.g. loaded from an audio file).

# Input signal (x_in) tensor conversion & ad-hoc patching
converter = WaveformToInput()
in_tensor = converter(x_in.float(), in_sr)
print(in_tensor.shape)

# Models init (pretrained weights)
embedding_yamnet = yamnet.yamnet(pretrained=True)
embedding_vggish = vggish.get_vggish(with_classifier=False, pretrained=True)

# Embedding (forward pass)
emb_yamnet, _ = embedding_yamnet(in_tensor)  # discard logits
emb_vggish = embedding_vggish(in_tensor)

print(emb_yamnet.shape, emb_vggish.shape)
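
The usage snippet expects an input waveform `x_in` and its sampling rate `in_sr` to exist already. For a quick smoke test without an audio file, a synthetic tone can stand in. The sketch below uses only the standard library; the helper `sine_wave` and its default values are illustrative assumptions, not part of torch_vggish_yamnet.

```python
import math


def sine_wave(freq_hz=440.0, duration_s=2.0, sample_rate=16000, amplitude=0.5):
    """Return a list of float samples in [-amplitude, amplitude] for a pure tone."""
    n_samples = int(round(duration_s * sample_rate))
    return [
        amplitude * math.sin(2.0 * math.pi * freq_hz * n / sample_rate)
        for n in range(n_samples)
    ]


in_sr = 16000  # sampling rate in Hz (illustrative choice)
samples = sine_wave(sample_rate=in_sr)

# Assumption: the converter accepts a (channels, samples) float tensor, so the
# list would be wrapped as a single-channel tensor before conversion, e.g.
#   x_in = torch.tensor(samples).unsqueeze(0)
```

A real recording can be substituted for the synthetic tone, provided it is available as a tensor together with its sampling rate.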

References

[1] AudioSet Official site: http://g.co/audioset

[2]

@inproceedings{45857,
  title     = {Audio Set: An ontology and human-labeled dataset for audio events},
  author    = {Jort F. Gemmeke and Daniel P. W. Ellis and Dylan Freedman and Aren Jansen and Wade Lawrence and R. Channing Moore and Manoj Plakal and Marvin Ritter},
  year      = {2017},
  booktitle = {Proc. IEEE ICASSP 2017},
  address   = {New Orleans, LA}
}

[3]

@incollection{45611,
  title     = {CNN Architectures for Large-Scale Audio Classification},
  author    = {Shawn Hershey and Sourish Chaudhuri and Daniel P. W. Ellis and Jort F. Gemmeke and Aren Jansen and Channing Moore and Manoj Plakal and Devin Platt and Rif A. Saurous and Bryan Seybold and Malcolm Slaney and Ron Weiss and Kevin Wilson},
  year      = {2017},
  URL       = {https://arxiv.org/abs/1609.09430},
  booktitle = {International Conference on Acoustics, Speech and Signal Processing (ICASSP)}
}

[4] torch_audioset GitHub repository: https://github.com/w-hc/torch_audioset/tree/master

Download files

Source Distribution

torch_vggish_yamnet-0.2.1.tar.gz (9.9 kB)

Uploaded Source

Built Distribution

torch_vggish_yamnet-0.2.1-py3-none-any.whl (10.8 kB)

Uploaded Python 3

File details

Details for the file torch_vggish_yamnet-0.2.1.tar.gz.

File metadata

  • Download URL: torch_vggish_yamnet-0.2.1.tar.gz
  • Size: 9.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.10.9

File hashes

Hashes for torch_vggish_yamnet-0.2.1.tar.gz
Algorithm    Hash digest
SHA256       9794a5c3374512e66bd143f98d925c4546152c066ed6462431c7c9b40f42afb9
MD5          5b5b2f22199f9df1bbd249fae5999238
BLAKE2b-256  bfa5ee86aeb801fed1e76c3787badaddd25a3d8cdc5b0c9a132e9ea7cda4f972
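
The SHA256 digest listed for the source distribution can be compared against a downloaded copy. The following is a minimal sketch using the standard `hashlib` module; the helper names `sha256_of_file` and `matches_expected` are hypothetical, and the file path in the usage note assumes the archive was saved to the current directory.

```python
import hashlib

# SHA256 digest listed above for torch_vggish_yamnet-0.2.1.tar.gz
EXPECTED_SHA256 = "9794a5c3374512e66bd143f98d925c4546152c066ed6462431c7c9b40f42afb9"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected hex digest."""
    return sha256_of_file(path) == expected.lower()
```

For example, `matches_expected("torch_vggish_yamnet-0.2.1.tar.gz")` should return `True` for an unmodified download; a `False` result suggests the file is corrupted or is not the listed release.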

File details

Details for the file torch_vggish_yamnet-0.2.1-py3-none-any.whl.

File hashes

Hashes for torch_vggish_yamnet-0.2.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       04ce86c077dfb1e6ccfaec849895088cf13af84a355e05ce6d1f495451af3b5c
MD5          00f6cb6692c17f6b832a54d74d54ab5c
BLAKE2b-256  7c8ca3e0c1c3fbc7ca87839329bf1f3affe72dea542e47fc6413ee00e30a353e
