TensorFlow layer that implements the SpecAugment technique
Theory
This library implements SpecAugment, a simple data augmentation technique for speech recognition. Following the kapre philosophy, the method is encapsulated in a custom TensorFlow layer, so it can be added directly to a neural network architecture.
Install Package
To install the package, run the following command.
pip install spec-augment
Usage
SpecAugment(freq_mask_param=5, time_mask_param=10)
SpecAugment(freq_mask_param=5, time_mask_param=10, n_freq_mask=5, n_time_mask=3)
SpecAugment(freq_mask_param=5, time_mask_param=10, n_freq_mask=5, n_time_mask=3,
            mask_value=0)
Arguments
- freq_mask_param - Frequency Mask Parameter (F in the paper)
- time_mask_param - Time Mask Parameter (T in the paper)
- n_freq_mask - Number of frequency masks to apply (mF in the paper). Defaults to 1.
- n_time_mask - Number of time masks to apply (mT in the paper). Defaults to 1.
- mask_value - Value written into the masked regions. Defaults to 0.
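The arguments above map onto the masking procedure described in the SpecAugment paper. As a rough, dependency-free illustration (not this package's actual implementation), the sketch below applies that procedure to a spectrogram stored as a list of time frames. The function name apply_masks, the seed parameter, and the list-of-lists layout are assumptions made for this example only.

```python
import random


def apply_masks(spec, freq_mask_param, time_mask_param,
                n_freq_mask=1, n_time_mask=1, mask_value=0.0, seed=None):
    """Return a masked copy of spec (a list of time frames, each a list of bins).

    Hypothetical helper for illustration; it is not part of spec-augment.
    """
    rng = random.Random(seed)
    n_time = len(spec)
    n_freq = len(spec[0])
    out = [list(frame) for frame in spec]  # copy so the input stays untouched

    # Frequency masks: the same band of bins is overwritten in every frame.
    for _ in range(n_freq_mask):
        f = rng.randint(0, min(freq_mask_param, n_freq))  # band width, 0..F
        f0 = rng.randint(0, n_freq - f)                   # start so the band fits
        for frame in out:
            frame[f0:f0 + f] = [mask_value] * f

    # Time masks: a run of consecutive frames is overwritten entirely.
    for _ in range(n_time_mask):
        t = rng.randint(0, min(time_mask_param, n_time))  # run length, 0..T
        t0 = rng.randint(0, n_time - t)                   # start so the run fits
        for i in range(t0, t0 + t):
            out[i] = [mask_value] * n_freq

    return out


# 50 frames of 40 bins, all ones, masked with -100.
spec = [[1.0] * 40 for _ in range(50)]
masked = apply_masks(spec, freq_mask_param=3, time_mask_param=4,
                     n_freq_mask=2, n_time_mask=2, mask_value=-100.0, seed=0)
```

Each mask width is drawn at random up to its parameter, so a mask can have zero width. Masks may also overlap, which means the masked area is at most, not exactly, the sum of the individual masks.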
Example
SpecAugment operates on spectrograms. In the following example, kapre computes the mel spectrogram of a sample audio clip from librosa.
import tensorflow as tf
from tensorflow.keras.models import Sequential
import librosa
import kapre
from spec_augment import SpecAugment

# Load the sample audio clip shipped with librosa
filename = librosa.ex('trumpet')
y, sr = librosa.load(filename)

# Batch of one waveform with shape (1, n_samples, 1)
audio_tensor = tf.reshape(tf.cast(y, tf.float32), (1, -1, 1))
input_shape = y.reshape(-1, 1).shape

# kapre layer that turns the waveform into a mel spectrogram in decibels
melgram = kapre.composed.get_melspectrogram_layer(input_shape=input_shape,
                                                  n_fft=1024,
                                                  return_decibel=True,
                                                  n_mels=256,
                                                  input_data_format='channels_last',
                                                  output_data_format='channels_last')
Next, we instantiate the SpecAugment layer with F = 5 (freq_mask_param), T = 10 (time_mask_param), 5 frequency masks (n_freq_mask), and 3 time masks (n_time_mask). Masked regions are filled with a mask_value of -100.
# Define the SpecAugment layer
spec_augment = SpecAugment(freq_mask_param=5,
                           time_mask_param=10,
                           n_freq_mask=5,
                           n_time_mask=3,
                           mask_value=-100)

# Stack the mel spectrogram layer and the SpecAugment layer
model = Sequential()
model.add(melgram)
model.add(spec_augment)
model.summary()
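To get a feel for how aggressive these settings are, it can help to bound how much of the spectrogram they can cover. The short calculation below assumes, as in the paper, that each mask is at most F bins (or T frames) wide. The helper name max_masked_fraction is made up for this example, and the number of time frames passed to it is a placeholder, since the real value depends on the clip length and hop size.

```python
def max_masked_fraction(size, mask_param, n_masks):
    """Upper bound on the fraction of an axis that the masks can cover."""
    return min(size, mask_param * n_masks) / size


# Frequency axis: 256 mel bins, F = 5, 5 masks -> at most 25 bins masked.
print(max_masked_fraction(256, 5, 5))  # 25 / 256 = 0.09765625

# Time axis: T = 10, 3 masks -> at most 30 frames masked.
# n_frames = 200 is a placeholder, not the actual length of the trumpet clip.
n_frames = 200
print(max_masked_fraction(n_frames, 10, 3))  # 30 / 200 = 0.15
```

With these settings, under 10% of the mel bins can be masked in any single pass. The actual coverage is usually lower, because widths are sampled at random and masks can overlap.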