TensorFlow layer that implements the SpecAugment technique

Theory

This library implements SpecAugment, a simple data augmentation technique for speech recognition. Following the kapre philosophy, the method is encapsulated in a custom TensorFlow layer so that it can be incorporated directly into a neural network architecture.

Install Package

To install the package, run the following command.

pip install spec-augment

Usage

SpecAugment(freq_mask_param=5, time_mask_param=10)

SpecAugment(freq_mask_param=5, time_mask_param=10, n_freq_mask=5, n_time_mask=3)

SpecAugment(freq_mask_param=5, time_mask_param=10, n_freq_mask=5, n_time_mask=3,
            mask_value=0)

Arguments

  • freq_mask_param - Frequency Mask Parameter (F in the paper)
  • time_mask_param - Time Mask Parameter (T in the paper)
  • n_freq_mask - Number of frequency masks to apply (mF in the paper). Defaults to 1.
  • n_time_mask - Number of time masks to apply (mT in the paper). Defaults to 1.
  • mask_value - Value used to fill the masked regions. Defaults to 0.
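
To make the arguments above concrete, here is a minimal, dependency-free sketch of the masking policy as described in the SpecAugment paper: each frequency mask blanks f consecutive bins, with f drawn uniformly between 0 and F, and each time mask does the same along the time axis with T. The function name apply_masks and the nested-list spectrogram layout (a list of time frames, each a list of frequency bins) are assumptions made for illustration. This is not the library's TensorFlow implementation, and its boundary handling may differ.

```python
import random


def apply_masks(spec, freq_mask_param, time_mask_param,
                n_freq_mask=1, n_time_mask=1, mask_value=0.0, rng=None):
    """Return a masked copy of `spec` (hypothetical helper, illustration only).

    `spec` is a list of time frames; each frame is a list of frequency bins.
    """
    rng = rng or random.Random()
    n_time = len(spec)
    n_freq = len(spec[0])
    out = [list(frame) for frame in spec]  # copy so the input stays untouched

    # Frequency masking: blank f consecutive bins in every frame.
    for _ in range(n_freq_mask):
        f = rng.randint(0, min(freq_mask_param, n_freq))  # mask width
        f0 = rng.randint(0, n_freq - f)                   # first masked bin
        for frame in out:
            frame[f0:f0 + f] = [mask_value] * f

    # Time masking: blank t consecutive frames across all bins.
    for _ in range(n_time_mask):
        t = rng.randint(0, min(time_mask_param, n_time))  # mask length
        t0 = rng.randint(0, n_time - t)                   # first masked frame
        for i in range(t0, t0 + t):
            out[i] = [mask_value] * n_freq

    return out
```

Passing a seeded random.Random instance as rng makes the masks reproducible, which can help when checking an augmentation pipeline by eye.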

Example

SpecAugment is applied to spectrograms. In the following example, kapre computes the mel spectrogram of a sample audio clip from librosa.

import tensorflow as tf
from tensorflow.keras.models import Sequential
import librosa
import kapre
from spec_augment import SpecAugment

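# Load librosa's bundled trumpet recording
filename = librosa.ex('trumpet')
y, sr = librosa.load(filename)

# Batch of one mono clip with shape (batch, samples, channels)
audio_tensor = tf.reshape(tf.cast(y, tf.float32), (1, -1, 1))
# Per-example input shape (samples, channels) passed to the kapre layer
input_shape = y.reshape(-1, 1).shape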

melgram = kapre.composed.get_melspectrogram_layer(input_shape=input_shape,
                                                  n_fft=1024,
                                                  return_decibel=True,
                                                  n_mels=256,
                                                  input_data_format='channels_last',
                                                  output_data_format='channels_last')

Next, we instantiate the SpecAugment layer with an F of 5 (freq_mask_param), a T of 10 (time_mask_param), 5 frequency masks (n_freq_mask), 3 time masks (n_time_mask), and a mask_value of -100.


# Now we define the SpecAugment layer
spec_augment = SpecAugment(freq_mask_param=5,
                           time_mask_param=10,
                           n_freq_mask=5,
                           n_time_mask=3,
                           mask_value=-100)
model = Sequential()
model.add(melgram)
model.add(spec_augment)

model.summary()
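
For a sense of scale, the short sketch below computes an upper bound on how much of each axis the settings in the example can mask. It assumes, as in the paper, that each mask is at most F bins (or T frames) wide; the helper name max_masked is hypothetical and not part of the library. Overlapping masks cover less, so the real figure is usually lower.

```python
def max_masked(n_masks, mask_param, axis_len):
    """Upper bound on masked positions along one axis (hypothetical helper)."""
    return min(n_masks * mask_param, axis_len)


n_mels = 256  # number of mel bins in the example above

# 5 frequency masks of width at most 5: at most 25 of 256 bins (under 10%)
freq_bound = max_masked(5, 5, n_mels)
print(freq_bound, freq_bound / n_mels)

# 3 time masks of length at most 10: at most 30 frames, whatever the clip length
time_bound = 3 * 10
print(time_bound)
```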

References

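Park, D. S., et al. (2019). SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition. https://arxiv.org/abs/1904.08779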
