Tensorflow Layer that implements the SpecAugment technique
Project description
Theory
This library contains an implementation of SpecAugment, a simple data augmentation technique for speech recognition. Following the kapre philosophy, the implementation of this method has been encapsulated in a custom layer of Tensorflow, so that it can be incorporated into our neural network architectures directly.
Install Package
To install the package execute the following command.
pip install spec-augment
Usage
SpecAugment(freq_mask_param=5, time_mask_param=10)
SpecAugment(freq_mask_param=5, time_mask_param=10, n_freq_mask=5, n_time_mask=3)
SpecAugment(freq_mask_param=5, time_mask_param=10, n_freq_mask=5, n_time_mask=3,
mask_value=0)
Arguments
- freq_mask_param - Frequency Mask Parameter (F in the paper)
- time_mask_param - Time Mask Parameter (T in the paper)
- n_freq_mask - Number of frequency masks to apply (mF in the paper). By default is 1.
- n_time_mask - Number of time masks to apply (mT in the paper). By default is 1.
- mask_value - Imputation value. By default is zero.
Example
SpecAugment is a technique applicable to spectrograms. In the following example, kapre is used to compute the Mel Spectrogram of a sample audio from Librosa.
import tensorflow as tf
from tensorflow.keras.models import Sequential
import librosa
import kapre
from spec_augment import SpecAugment
filename = librosa.ex('trumpet')
y, sr = librosa.load(filename)
audio_tensor = tf.reshape(tf.cast(y, tf.float32), (1, -1, 1))
input_shape = y.reshape(-1, 1).shape
melgram = kapre.composed.get_melspectrogram_layer(input_shape=input_shape,
n_fft=1024,
return_decibel=True,
n_mels=256,
input_data_format='channels_last',
output_data_format='channels_last')
Now we instantiate the SpecAugment layer. We are using an F of 5 (freq_mask_param), a T of 10 (time_mask_param), 5 frequency masks (n_freq_mask) and 3 time masks (n_time_mask). We will use a mask_value of -100.
# Now we define the SpecAugment layer
spec_augment = SpecAugment(freq_mask_param=5,
time_mask_param=10,
n_freq_mask=5,
n_time_mask=3,
mask_value=-100)
model = Sequential()
model.add(melgram)
model.add(spec_augment)
model.summary()
References
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for spec_augment-0.0.3-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 8d8196e3ece0ea7f3a8ce6545b715e0b5b5c17426e7db58bf5fdc18bc7ba240d |
|
MD5 | 7067e4d3f31689f0791e06458ce0a8d2 |
|
BLAKE2b-256 | 867e4c03ebfc5fa25959a9292adf268cc58395d0cd7f96db48ccbc8a0cd30549 |