Skip to main content

SincNet - Tensorflow

Project description

SincNet in Tensorflow

An Implementation of SincNet using Tenorflow 2.x.

  • Models are converted from original torch networks.
  • The main implementation of the sinc_conv layer is non-optimal. Instead of using loops in the call section, we used matrix multiplication and a few programming tricks that allow the hardware to run more efficiently (25 times faster).

SincNet

SincNet is a neural architecture for processing raw audio samples. It is a novel Convolutional Neural Network (CNN) that encourages the first convolutional layer to discover more meaningful filters. SincNet is based on parametrized sinc functions, which implement band-pass filters. Arxiv

Install

$ pip install sincnet-tensorflow

Usage

Demo

Training on a dummy database to check for error-free execution

Open In Colab

A layer for Keras Functional

import tensorflow as tf
from tensorflow.keras.layers import Dense, Conv1D
from tensorflow.keras.layers import LeakyReLU, BatchNormalization, Flatten, MaxPooling1D, Input

from sincnet_tensorflow import SincConv1D, LayerNorm


out_dim = 50 #number of classes

sinc_layer = SincConv1D(N_filt=64,
                        Filt_dim=129,
                        fs=16000,
                        stride=16,
                        padding="SAME")


inputs = Input((32000, 1)) 

x = sinc_layer(inputs)
x = LayerNorm()(x)

x = LeakyReLU(alpha=0.2)(x)
x = MaxPooling1D(pool_size=2)(x)


x = Conv1D(64, 3, strides=1, padding='valid')(x)
x = BatchNormalization(momentum=0.05)(x)
x = LeakyReLU(alpha=0.2)(x)
x = MaxPooling1D(pool_size=2)(x)

x = Conv1D(64, 3, strides=1, padding='valid')(x)
x = BatchNormalization(momentum=0.05)(x)
x = LeakyReLU(alpha=0.2)(x)
x = MaxPooling1D(pool_size=2)(x)

x = Conv1D(128, 3, strides=1, padding='valid')(x)
x = BatchNormalization(momentum=0.05)(x)
x = LeakyReLU(alpha=0.2)(x)
x = MaxPooling1D(pool_size=2)(x)

x = Conv1D(128, 3, strides=1, padding='valid')(x)
x = BatchNormalization(momentum=0.05)(x)
x = LeakyReLU(alpha=0.2)(x)
x = MaxPooling1D(pool_size=2)(x)

x = Flatten()(x)

x = Dense(256)(x)
x = BatchNormalization(momentum=0.05, epsilon=1e-5)(x)
x = LeakyReLU(alpha=0.2)(x)

x = Dense(256)(x)
x = BatchNormalization(momentum=0.05, epsilon=1e-5)(x)
x = LeakyReLU(alpha=0.2)(x)

prediction = Dense(out_dim, activation='softmax')(x)
model = tf.keras.models.Model(inputs=inputs, outputs=prediction)

model.summary()

References

@inproceedings{ravanelli2018speaker,
  title={Speaker recognition from raw waveform with sincnet},
  author={Ravanelli, Mirco and Bengio, Yoshua},
  booktitle={2018 IEEE Spoken Language Technology Workshop (SLT)},
  pages={1021--1028},
  year={2018},
  organization={IEEE}
}

@misc{SincNet,
    title   = {SincNet}, 
    author  = {Mirco Ravanelli (mravanelli)},
    year    = {2018},
    url  = {https://github.com/mravanelli/SincNet},
    publisher = {Github},
}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sincnet-tensorflow-0.0.2.tar.gz (5.0 kB view details)

Uploaded Source

File details

Details for the file sincnet-tensorflow-0.0.2.tar.gz.

File metadata

  • Download URL: sincnet-tensorflow-0.0.2.tar.gz
  • Upload date:
  • Size: 5.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.8.5

File hashes

Hashes for sincnet-tensorflow-0.0.2.tar.gz
Algorithm Hash digest
SHA256 f6cf66f95bceba1ecdd75b60b70be53e7dbdee785a192a394aa77ed90cd368bf
MD5 33e4a02054a90f194995f92822c97ddb
BLAKE2b-256 1cbed5b0de1b1e8b99ca1f34b435a80c579d79cca85b39e916a8639a8a4b55a3

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page