Implementation of the audio SED architecture for TensorFlow and PyTorch.

Sound Event Detection

This repository implements a class for building audio-signal classifiers that follow the SED (Sound Event Detection) architecture. The model computes the mel spectrogram itself and feeds it to a CNN backbone. This structure was used in the birdcall competition, where it helped me reach 3rd place in 2021.
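As background, SED heads commonly turn per-frame scores into a clip-level score with attention pooling over time. The following is a minimal NumPy sketch of that general idea, not this package's implementation: the helper name `attention_pool` and the `(num_steps, num_classes)` shapes are assumptions made for illustration.

```python
import numpy as np

def attention_pool(frame_logits, attention_logits):
    """Pool framewise scores into clip-level scores (hypothetical helper).

    Both inputs have shape (num_steps, num_classes).
    """
    # Per-frame class probabilities (the "framewise" view).
    framewise = 1.0 / (1.0 + np.exp(-frame_logits))
    # Attention weights: softmax over the time axis, separately for each class.
    att = np.exp(attention_logits - attention_logits.max(axis=0, keepdims=True))
    att = att / att.sum(axis=0, keepdims=True)
    # Clip-level probability: attention-weighted average of the frame probabilities.
    clipwise = (att * framewise).sum(axis=0)
    return clipwise, framewise

rng = np.random.default_rng(0)
clipwise, framewise = attention_pool(rng.normal(size=(16, 3)), rng.normal(size=(16, 3)))
print(clipwise.shape, framewise.shape)  # (3,) (16, 3)
```

With uniform attention the clip-level score reduces to the plain mean of the frame probabilities; learned attention lets the model emphasise the frames where the event actually occurs.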

It is implemented in both PyTorch and TensorFlow; however, the TensorFlow version has only been tested for inference, not for training. I would advise using only the PyTorch version.

Installation

pip install audio-sed

If you use TensorFlow:

pip install tflibrosa

Examples

PyTorch

import torch
import timm
from torch import nn
import numpy as np
from audio_sed.sed_config import ConfigSED
from audio_sed.pytorch.sed_models import AudioClassifierSequenceSED, AudioSED, shape_from_backbone

def load_model(model_name, num_classe, cfg_sed):
    # Build the CNN backbone and strip its pooling/classification head.
    backbone = timm.create_model(model_name, pretrained=False)
    if "efficientnet" in model_name:
        backbone.global_pool = nn.Identity()
        in_feat = backbone.classifier.in_features
        backbone.classifier = nn.Identity()
    elif "convnext" in model_name:
        in_feat = backbone.head.fc.in_features
        backbone.head = nn.Identity()
    else:
        raise ValueError(f"Unsupported backbone: {model_name}")

    # Pass a dummy 5-second clip through the backbone to read the number of time steps.
    dummy = torch.as_tensor(np.random.uniform(0, 1, (1, int(5 * cfg_sed.sample_rate)))).float()
    # Returned shape: (batch size, channels, num_steps, y_axis)
    in_features = shape_from_backbone(inputs=dummy, model=backbone,
                                      use_logmel=True, config_sed=cfg_sed.__dict__)[2]
    print("Num timestamps features:", in_features)
    model = AudioSED(backbone, num_classes=[num_classe], in_features=in_feat, hidden_size=1024, activation='sigmoid', use_logmel=True,
                     spectrogram_augmentation=None, apply_attention="step", drop_rate=[0.5, 0.5], config_sed=cfg_sed.__dict__)

    # Sequence wrapper around the clip-level model (used with the 3-D input below).
    model2 = AudioClassifierSequenceSED(model)

    return model, model2

cfg_sed =  ConfigSED(window='hann', center=True, pad_mode='reflect', windows_size=1024, hop_size=320,
                sample_rate=32000, mel_bins=128, fmin=50, fmax=16000, ref=1.0, amin=1e-10, top_db=None)

model_5, model = load_model(model_name="tf_efficientnet_b2_ns", num_classe=575, cfg_sed=cfg_sed)
inputs = torch.as_tensor(np.random.uniform(0,1, (20*32000)).reshape(1,4,-1)).float()
with torch.no_grad():
    o = model(inputs)
print(o[0]['clipwise'])
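The input above appears to be laid out as (batch, clips, samples): 20 seconds of audio at 32 kHz reshaped into four 5-second clips. The sketch below reproduces that layout and computes how many spectrogram frames one clip yields, assuming the usual centered-STFT convention (`center=True`) where the frame count is `1 + num_samples // hop_size`. The helper names `split_into_chunks` and `num_stft_frames` are hypothetical and not part of audio_sed.

```python
import numpy as np

SAMPLE_RATE = 32000   # matches sample_rate in cfg_sed
HOP_SIZE = 320        # matches hop_size in cfg_sed
CHUNK_SECONDS = 5     # clip length used in the example above

def split_into_chunks(waveform, chunk_seconds=CHUNK_SECONDS, sample_rate=SAMPLE_RATE):
    """Hypothetical helper: cut a 1-D waveform into equal clips, dropping any remainder."""
    chunk_len = chunk_seconds * sample_rate
    num_chunks = len(waveform) // chunk_len
    return waveform[: num_chunks * chunk_len].reshape(1, num_chunks, chunk_len)

def num_stft_frames(num_samples, hop_size=HOP_SIZE):
    """Frame count of a centered STFT: 1 + num_samples // hop_size."""
    return 1 + num_samples // hop_size

waveform = np.random.uniform(0, 1, 20 * SAMPLE_RATE).astype(np.float32)
chunks = split_into_chunks(waveform)
print(chunks.shape)                       # (1, 4, 160000)
print(num_stft_frames(chunks.shape[-1]))  # 501
```

With a hop of 320 samples at 32 kHz the spectrogram has 100 frames per second. The CNN backbone downsamples the time axis further, so the "Num timestamps features" value printed by `load_model` will be smaller than this frame count.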

TensorFlow

import numpy as np
import tensorflow as tf
from audio_sed.sed_config import ConfigSED
from audio_sed.tensorflow.sed_models import AudioClassifierSequenceSED as AudioClassifierSequenceSEDTF, AudioSED as AudioSEDTF, shape_from_backbone as shape_from_backboneTF



def load_modeltf(model_name, num_classe, cfg_sed):
    # The backbone is always EfficientNetB2 here; model_name only selects the branch below.
    backbone = tf.keras.applications.efficientnet.EfficientNetB2(include_top=False)
    if "efficientnet" in model_name:
        in_feat = backbone.layers[-1].output.shape[-1]
    elif "convnext" in model_name:
        in_feat = backbone.layers[-1].output.shape[-1]
    else:
        raise ValueError(f"Unsupported backbone: {model_name}")
    # Returned shape: (batch size, num_steps, y_axis, channels)
    in_features = shape_from_backboneTF(inputs=np.random.uniform(0, 1, (1, int(5 * cfg_sed.sample_rate))), model=backbone,
                                        use_logmel=True, config_sed=cfg_sed.__dict__)[1]
    print("Num timestamps features:", in_features)
    model = AudioSEDTF(backbone, num_classes=[num_classe], in_features=in_feat, hidden_size=1024, activation='sigmoid', use_logmel=True,
                       spectrogram_augmentation=None, apply_attention="step", drop_rate=[0.5, 0.5], config_sed=cfg_sed.__dict__)

    model = AudioClassifierSequenceSEDTF(model)

    return model

cfg_sed =  ConfigSED(window='hann', center=True, pad_mode='reflect', windows_size=1024, hop_size=320,
                sample_rate=32000, mel_bins=128, fmin=50, fmax=16000, ref=1.0, amin=1e-10, top_db=None)

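# Build the model before predicting; the arguments mirror the PyTorch example.
model_tf = load_modeltf(model_name="efficientnet_b2", num_classe=575, cfg_sed=cfg_sed)
inputs = np.random.uniform(0, 1, (1, 4, 5 * 32000))
o_tf = model_tf.predict(inputs)
print(o_tf[0])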
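Both examples pass `ref`, `amin` and `top_db` to `ConfigSED`. These names match the parameters of the standard power-to-decibel conversion used for log-mel spectrograms (as in `librosa.power_to_db`). Assuming the package follows that convention, the sketch below shows what the three values control; the function is a standalone illustration, not code taken from audio_sed.

```python
import numpy as np

def power_to_db(power, ref=1.0, amin=1e-10, top_db=None):
    """Convert a power spectrogram to decibels (librosa-style convention)."""
    power = np.asarray(power, dtype=np.float64)
    # amin floors the input so that log10 never sees zero.
    log_spec = 10.0 * np.log10(np.maximum(amin, power))
    # ref sets the 0 dB reference level.
    log_spec -= 10.0 * np.log10(np.maximum(amin, ref))
    if top_db is not None:
        # Clip everything that lies more than top_db below the peak.
        log_spec = np.maximum(log_spec, log_spec.max() - top_db)
    return log_spec

mel_power = np.array([[1.0, 0.1], [0.01, 0.0]])
print(power_to_db(mel_power))             # no clipping, as with top_db=None above
print(power_to_db(mel_power, top_db=80))  # silent bin raised to 80 dB below the peak
```

With `top_db=None`, as in the configuration above, silent bins fall all the way to the floor set by `amin` (here -100 dB) instead of being clipped relative to the peak.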

Examples 2:

You can find here an example of how this model is used with a GUI for inference, with some checkpoints available.

Citation
