panns_inference: audio tagging and sound event detection inference toolbox
PANNs inference
panns_inference provides an easy-to-use Python interface for audio tagging and sound event detection. The models are the pretrained networks from "PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition": https://github.com/qiuqiangkong/audioset_tagging_cnn
Installation
PyTorch>=1.0 is required.
$ pip install panns-inference
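The usage example below passes device='cuda' to both models. On a machine without a GPU, a small pre-flight check like the following sketch may help. The helper name pick_device is hypothetical and not part of panns_inference, and passing 'cpu' as the device value is an assumption that should be checked against the installed version.

```python
import importlib.util


def pick_device():
    """Return 'cuda' when PyTorch reports a usable GPU, otherwise 'cpu'.

    Hypothetical helper, not part of panns_inference.
    """
    # If PyTorch is not installed at all, fall back to 'cpu' so the caller
    # gets a clear import error later instead of a failure here.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print("Selected device:", device)
```

The returned string could then be passed as the device argument in place of the hard-coded 'cuda'.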
Usage
$ python3 example.py
For example, the following script tags an audio clip and runs sound event detection on it:
import librosa
from panns_inference import AudioTagging, SoundEventDetection, labels

audio_path = 'examples/R9_ZSCveAHg_7s.wav'
# Load the clip as mono audio resampled to 32 kHz.
(audio, _) = librosa.load(audio_path, sr=32000, mono=True)
audio = audio[None, :]  # (batch_size, segment_samples)

print('------ Audio tagging ------')
at = AudioTagging(device='cuda')
(clipwise_output, embedding) = at.inference(audio)  # clipwise_output: (batch_size, classes_num)

print('------ Sound event detection ------')
sed = SoundEventDetection(device='cuda')
framewise_output = sed.inference(audio)  # (batch_size, time_steps, classes_num)
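The snippet above computes clipwise_output but does not print per-label scores. A minimal sketch of ranking them follows. It assumes labels is a list of class names aligned with the class axis of clipwise_output; top_k_tags is a hypothetical helper, and the demo values are stand-ins rather than real model output.

```python
def top_k_tags(scores, label_names, k=10):
    """Return the k highest-scoring (label, score) pairs, best first.

    Hypothetical helper; `scores` is one row of clipwise_output
    (any sequence of numbers) and `label_names` the matching class names.
    """
    if len(scores) != len(label_names):
        raise ValueError("scores and label_names must have the same length")
    ranked = sorted(zip(label_names, scores), key=lambda pair: pair[1], reverse=True)
    return [(name, float(score)) for name, score in ranked[:k]]


# Stand-in data in place of clipwise_output[0] and labels.
demo_scores = [0.092, 0.893, 0.754, 0.009]
demo_labels = ["Music", "Speech", "Telephone bell ringing", "Animal"]

for name, score in top_k_tags(demo_scores, demo_labels, k=3):
    print(f"{name}: {score:.3f}")
```

With the real outputs, the call would look like top_k_tags(clipwise_output[0], labels), taking the first item of the batch.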
Results
------ Audio tagging ------
Checkpoint path: /root/panns_data/Cnn14_mAP=0.431.pth
GPU number: 1
Speech: 0.893
Telephone bell ringing: 0.754
Inside, small room: 0.235
Telephone: 0.183
Music: 0.092
Ringtone: 0.047
Inside, large room or hall: 0.028
Alarm: 0.014
Animal: 0.009
Vehicle: 0.008
------ Sound event detection ------
Checkpoint path: /root/panns_data/Cnn14_mAP=0.431.pth
GPU number: 1
Save fig to appendixes/sed_result.pdf
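Beyond plotting, the frame-level scores for one class can be turned into onset and offset times by thresholding. The sketch below is one simple way to do that, not the method used by the package. The 0.5 threshold and the rate of 100 frames per second are assumptions to verify against the model in use, and frames_to_events is a hypothetical helper.

```python
def frames_to_events(frame_probs, threshold=0.5, frames_per_second=100):
    """Convert per-frame probabilities for one class into (onset_s, offset_s) pairs.

    Hypothetical helper. `threshold` and `frames_per_second` are assumed
    values, not settings read from panns_inference.
    """
    events = []
    onset = None  # index of the first frame of the currently open event
    for i, p in enumerate(frame_probs):
        if p >= threshold and onset is None:
            onset = i  # event starts
        elif p < threshold and onset is not None:
            events.append((onset / frames_per_second, i / frames_per_second))
            onset = None  # event ends
    if onset is not None:
        # Close an event that is still active at the end of the clip.
        events.append((onset / frames_per_second, len(frame_probs) / frames_per_second))
    return events


# Stand-in data in place of framewise_output[0][:, class_index].
demo_frames = [0.1, 0.9, 0.8, 0.2, 0.7]
print(frames_to_events(demo_frames, threshold=0.5, frames_per_second=2))
```

With real output, frame_probs would be one class column of framewise_output for a single clip. Short gaps between detections are not merged here, so smoothing may be needed in practice.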
Sound event detection plot:
Cite
[1] Kong, Qiuqiang, Yin Cao, Turab Iqbal, Yuxuan Wang, Wenwu Wang, and Mark D. Plumbley. "PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition." arXiv preprint arXiv:1912.10211 (2019).