Skip to main content

Tensorflow Layer that implements the SpecAugment technique

Project description

Theory

This library contains an implementation of SpecAugment, a simple data augmentation technique for speech recognition. Following the kapre philosophy, the implementation of this method has been encapsulated in a custom layer of Tensorflow, so that it can be incorporated into our neural network architectures directly.

Install Package

To install the package execute the following command.

pip install spec-augment

Usage

SpecAugment(freq_mask_param=5, time_mask_param=10)

SpecAugment(freq_mask_param=5, time_mask_param=10, n_freq_mask=5, n_time_mask=3)

SpecAugment(freq_mask_param=5, time_mask_param=10, n_freq_mask=5, n_time_mask=3,
            mask_value=0)

Arguments

  • freq_mask_param - Frequency Mask Parameter (F in the paper)
  • time_mask_param - Time Mask Parameter (T in the paper)
  • n_freq_mask - Number of frequency masks to apply (mF in the paper). By default is 1.
  • n_time_mask - Number of time masks to apply (mT in the paper). By default is 1.
  • mask_value - Imputation value. By default is zero.

Example

SpecAugment is a technique applicable to spectrograms. In the following example, kapre is used to compute the Mel Spectrogram of a sample audio from Librosa.

import tensorflow as tf
from tensorflow.keras.models import Sequential
import librosa
import kapre
from spec_augment import SpecAugment

filename = librosa.ex('trumpet')
y, sr = librosa.load(filename)

audio_tensor = tf.reshape(tf.cast(y, tf.float32), (1, -1, 1))
input_shape = y.reshape(-1, 1).shape

melgram = kapre.composed.get_melspectrogram_layer(input_shape=input_shape,
                                                  n_fft=1024,
                                                  return_decibel=True,
                                                  n_mels=256,
                                                  input_data_format='channels_last',
                                                  output_data_format='channels_last')

Now we instantiate the SpecAugment layer. We are using an F of 5 (freq_mask_param), a T of 10 (time_mask_param), 5 frequency masks (n_freq_mask) and 3 time masks (n_time_mask). We will use a mask_value of -100.


# Now we define the SpecAugment layer
spec_augment = SpecAugment(freq_mask_param=5,
                           time_mask_param=10,
                           n_freq_mask=5,
                           n_time_mask=3,
                           mask_value=-100)                 
model = Sequential()
model.add(melgram)
model.add(spec_augment)

model.summary()

References

https://arxiv.org/abs/1904.08779

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

spec_augment-0.0.3.tar.gz (4.1 kB view hashes)

Uploaded Source

Built Distribution

spec_augment-0.0.3-py3-none-any.whl (4.3 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page