Waveform augmentations

speechaugs

Single-channel waveform augmentations for speech recognition models.


Augmentations:

Transforms in the time domain:

  • Time Stretch
  • Forward Time Shift

Transforms in the frequency domain:

  • Pitch Shift
  • Vocal Tract Length Perturbation

Noise injection:

  • Colored Noise (white, pink, brown, blue, violet, grey)
  • Short Noises
  • File Noise

Direct changes to the waveform samples:

  • Zero Samples
  • Clipping Samples
  • Inversion
  • Loudness Change
  • Normalization

Colab example: you can see examples of all augmentations and listen to the resulting audio on this page with the Colab notebook.


Installation

pip install speechaugs


Time Stretch

Stretch a waveform in time by a randomly chosen rate. Implemented using librosa.effects.time_stretch.

Forward Time Shift

Shift a waveform forwards in time.
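A forward shift can be sketched in plain Python as follows. This is only an illustration, not the speechaugs implementation: the function name, the `max_shift_fraction` parameter, the zero padding and the choice to keep the original length are all assumptions.

```python
import random


def forward_time_shift(waveform, max_shift_fraction=0.2, rng=random):
    """Delay a waveform by a random number of samples.

    Hypothetical sketch: zeros are prepended and the tail is dropped,
    so the output has the same length as the input.
    """
    n = len(waveform)
    max_shift = int(n * max_shift_fraction)
    shift = rng.randint(0, max_shift)  # inclusive on both ends
    if shift == 0:
        return list(waveform)
    return [0.0] * shift + list(waveform[: n - shift])


if __name__ == "__main__":
    wave = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
    print(forward_time_shift(wave, max_shift_fraction=0.5))
```

Plain lists are used to keep the sketch dependency-free; with tensors the same idea would be expressed with padding and slicing.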

Pitch Shift

Shift the pitch by n_steps semitones. Implemented using librosa.effects.pitch_shift.

The effect of PitchShift is easier to see on mel spectrograms of the waveforms.

[Figure: higher pitch (+9 semitones)]

[Figure: lower pitch (-5 semitones)]
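To get a feel for what a shift in semitones means, the snippet below converts a semitone count into a frequency ratio using the equal-temperament relation (12 semitones per octave, one octave doubles the frequency). The helper name is hypothetical and is not part of speechaugs or librosa.

```python
def semitones_to_ratio(n_steps):
    """Frequency ratio for a shift of n_steps semitones (12 per octave)."""
    return 2.0 ** (n_steps / 12.0)


if __name__ == "__main__":
    for steps in (9, -5, 12):
        print(steps, round(semitones_to_ratio(steps), 3))
```

A shift of +9 semitones multiplies frequencies by about 1.68, and a shift of -5 semitones by about 0.75.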

Vocal Tract Length Perturbation

Change the vocal tract length. The effect is very similar to Pitch Shift, but the speech sounds more natural.

Colored Noise

Add noise of a given color to a waveform. The color of the noise is determined by its spectral density; see the wiki page for more information.

This class is implemented using the colorednoise package. The color of the noise is chosen randomly.

[Figure: White Noise]

[Figure: Brown Noise]
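The link between color and spectral density can be illustrated without any dependencies. The sketch below does not use the colorednoise package and is not how speechaugs generates noise; it relies on two standard facts: a running sum of white noise gives brown noise (power falling roughly as 1/f^2), and a first difference of white noise gives violet noise (power rising roughly as f^2). All function names and the `noise_level` parameter are assumptions.

```python
import random
from itertools import accumulate


def white_noise(n, rng):
    """Independent Gaussian samples: flat spectral density."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def brown_noise(n, rng):
    """Running sum of white noise: energy concentrated at low frequencies."""
    return list(accumulate(white_noise(n, rng)))


def violet_noise(n, rng):
    """First difference of white noise: energy concentrated at high frequencies."""
    w = white_noise(n + 1, rng)
    return [b - a for a, b in zip(w, w[1:])]


def add_noise(waveform, noise, noise_level=0.05):
    """Add peak-normalized noise, scaled by noise_level, to a waveform."""
    peak = max(abs(x) for x in noise) or 1.0  # avoid division by zero
    return [s + noise_level * x / peak for s, x in zip(waveform, noise)]


if __name__ == "__main__":
    rng = random.Random(0)
    silence = [0.0] * 8
    print(add_noise(silence, brown_noise(8, rng)))
```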

Short Noises

Add several short noises (of the same color) to different parts of a waveform.

File Noise

Add noise from a randomly chosen file in a specified folder. Works with the "sox_io" torchaudio backend. To change the backend, run:

torchaudio.set_audio_backend('sox_io')

Zero Samples

Set some percentage of samples to zero.
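A minimal sketch of the idea, assuming the zeroed positions are picked uniformly at random (the library may choose them differently). The function name and the `fraction` parameter are hypothetical.

```python
import random


def zero_samples(waveform, fraction=0.05, rng=random):
    """Return a copy of the waveform with a fraction of its samples set to zero."""
    n = len(waveform)
    k = int(n * fraction)  # number of samples to silence
    out = list(waveform)
    for i in rng.sample(range(n), k):  # k distinct random positions
        out[i] = 0.0
    return out


if __name__ == "__main__":
    print(zero_samples([0.5] * 20, fraction=0.25, rng=random.Random(1)))
```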

Clipping Samples

Clip some percentage of samples from a waveform.

Inversion

Change sign of waveform samples.

Loudness Change

Change the loudness of intervals of a waveform. For example, in the figure below the initial waveform was split into 3 intervals, and the samples in each interval were multiplied by a different random factor.
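The interval-wise scaling described above can be sketched like this. It is an illustration under assumptions, not the library code: the function name, the random choice of interval boundaries and the factor range (0.5 to 2.0) are all hypothetical.

```python
import random


def loudness_change(waveform, n_intervals=3, min_factor=0.5, max_factor=2.0, rng=random):
    """Split a waveform into intervals and scale each by its own random factor.

    Requires len(waveform) >= n_intervals so that every interval is non-empty.
    """
    n = len(waveform)
    # Random interior cut points, plus the two ends of the waveform
    cuts = sorted(rng.sample(range(1, n), n_intervals - 1))
    bounds = [0] + cuts + [n]
    out = []
    for start, end in zip(bounds, bounds[1:]):
        factor = rng.uniform(min_factor, max_factor)
        out.extend(s * factor for s in waveform[start:end])
    return out


if __name__ == "__main__":
    print(loudness_change([1.0] * 12, rng=random.Random(0)))
```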

Normalization

Normalize a waveform with the chosen method ("minmax", "max" or "meanstd").
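The three method names suggest the following textbook definitions. This is a guess at their meaning, not the speechaugs implementation: here "minmax" rescales to the range [0, 1], "max" divides by the largest absolute value, and "meanstd" subtracts the mean and divides by the standard deviation.

```python
import statistics


def normalize(waveform, method="max"):
    """Normalize a list of samples; the formulas are assumed, not taken from speechaugs."""
    if method == "minmax":
        lo, hi = min(waveform), max(waveform)
        span = (hi - lo) or 1.0  # constant signal: avoid division by zero
        return [(s - lo) / span for s in waveform]
    if method == "max":
        peak = max(abs(s) for s in waveform) or 1.0
        return [s / peak for s in waveform]
    if method == "meanstd":
        mean = statistics.fmean(waveform)
        std = statistics.pstdev(waveform) or 1.0
        return [(s - mean) / std for s in waveform]
    raise ValueError(f"unknown method: {method!r}")


if __name__ == "__main__":
    wave = [-2.0, 0.0, 2.0, 4.0]
    for m in ("minmax", "max", "meanstd"):
        print(m, normalize(wave, m))
```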


Usage example (with default parameters)

Import:

import speechaugs

Other libs:

import torch, torchaudio
import albumentations as A

Usage:

# Load a waveform and its sample rate
ex_waveform, sr = torchaudio.load('audio_filename')
# Folder with noise recordings used by FileNoise
noiseroot = 'path_to_noise_folder'

# Each A.OneOf block fires with probability 0.5 and then applies one of its transforms
transforms = A.Compose([
    speechaugs.ForwardTimeShift(p=0.5),
    A.OneOf([speechaugs.Inversion(p=0.5), speechaugs.LoudnessChange(p=0.5)], p=0.5),
    A.OneOf([speechaugs.ZeroSamples(p=0.5), speechaugs.ClippingSamples(p=0.5)], p=0.5),
    A.OneOf([speechaugs.TimeStretchLibrosa(p=0.5), speechaugs.PitchShiftLibrosa(p=0.5)], p=0.5),
    A.OneOf([speechaugs.ColoredNoise(p=0.3), speechaugs.ShortNoises(p=0.3), speechaugs.FileNoise(noiseroot, p=0.3)], p=0.5),
], p=1.0)

# The pipeline takes the waveform as a keyword argument and returns a dict
augmented = transforms(waveform=ex_waveform)['waveform']
