Waveform augmentations

speechaugs

Single-channel waveform augmentations for speech recognition models.


Augmentations:

  • Time Stretch
  • Forward Time Shift
  • Pitch Shift
  • Colored Noise (white, pink, brown, blue, violet, grey)
  • Zero Samples
  • Clipping Samples
  • Inversion
  • Loudness Change
  • Short Noises
  • File Noise

Colab Example: you can see examples of all augmentations and listen to the resulting audio in the Colab notebook.


Installation

pip install speechaugs


Time Stretch

Stretch a waveform in time with a randomly chosen rate. Implemented using librosa.effects.time_stretch.

Forward Time Shift

Shift a waveform forwards in time.
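As a rough illustration of the idea (not the package's own code), a forward shift can be sketched in NumPy by delaying the signal and filling the start with silence. The function name `forward_time_shift`, the zero padding, and the choice to keep the original length are all assumptions made for this example.

```python
import numpy as np

def forward_time_shift(waveform: np.ndarray, shift: int) -> np.ndarray:
    """Delay a waveform by `shift` samples along the last axis, keeping its length."""
    if shift < 0:
        raise ValueError("shift must be non-negative")
    n = waveform.shape[-1]
    shift = min(shift, n)
    shifted = np.zeros_like(waveform)
    # Samples move later in time; the first `shift` samples become silence
    # and the last `shift` samples of the input are dropped.
    shifted[..., shift:] = waveform[..., : n - shift]
    return shifted

wave = np.arange(1, 7, dtype=np.float32)
shifted = forward_time_shift(wave, 2)
print(shifted)  # [0. 0. 1. 2. 3. 4.]
```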

Pitch Shift

Shift the pitch by n_steps semitones. Implemented using librosa.effects.pitch_shift.

The effect of PitchShift is easier to see on mel spectrograms of the waveforms.

Higher pitch (+9 semitones):

Lower pitch (-5 semitones)
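To put the two shifts above into numbers: in 12-tone equal temperament, a shift of n_steps semitones multiplies every frequency by 2^(n_steps/12). The helper name `semitone_ratio` below is made up for this example and is not part of speechaugs or librosa.

```python
def semitone_ratio(n_steps: float) -> float:
    """Frequency ratio for a shift of `n_steps` semitones (12 semitones per octave)."""
    return 2.0 ** (n_steps / 12.0)

up = semitone_ratio(9)     # the "+9 semitones" example
down = semitone_ratio(-5)  # the "-5 semitones" example
print(round(up, 3))    # 1.682
print(round(down, 3))  # 0.749
```

So a 200 Hz component would move to roughly 336 Hz after +9 semitones and to roughly 150 Hz after -5 semitones.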

Colored Noise

Add noise of a given color to a waveform. The color of a noise is determined by its power spectral density. See the wiki page for more information.

This class is implemented using the colorednoise package. The color of the noise is chosen at random.

White Noise

Brown Noise
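For intuition about what "color" means, here is a NumPy-only sketch that shapes white noise in the frequency domain so that its power spectral density follows 1/f^beta. This is a stand-in for illustration, not the colorednoise package and not the speechaugs class; the names `NOISE_EXPONENTS`, `colored_noise`, `add_colored_noise`, and `noise_level` are assumptions. Grey noise is left out because it depends on a psychoacoustic loudness curve rather than a single exponent.

```python
import numpy as np

# Exponent beta in PSD ~ 1 / f**beta for each color (assumed mapping for this sketch).
NOISE_EXPONENTS = {"white": 0.0, "pink": 1.0, "brown": 2.0, "blue": -1.0, "violet": -2.0}

def colored_noise(color: str, n_samples: int, rng: np.random.Generator) -> np.ndarray:
    """Generate noise with PSD ~ 1/f**beta, scaled to a peak amplitude of 1."""
    beta = NOISE_EXPONENTS[color]
    spectrum = np.fft.rfft(rng.standard_normal(n_samples))
    freqs = np.fft.rfftfreq(n_samples)
    freqs[0] = freqs[1]  # avoid dividing by zero at the DC bin
    # Scaling amplitude by f**(-beta/2) scales power by f**(-beta).
    spectrum *= freqs ** (-beta / 2.0)
    noise = np.fft.irfft(spectrum, n=n_samples)
    return noise / np.max(np.abs(noise))

def add_colored_noise(waveform, color, noise_level, rng):
    """Mix peak-normalized colored noise into a waveform at the given level."""
    return waveform + noise_level * colored_noise(color, waveform.shape[-1], rng)

rng = np.random.default_rng(0)
t = np.arange(16000) / 16000
wave = 0.5 * np.sin(2 * np.pi * 220 * t)
noisy = add_colored_noise(wave, "pink", 0.05, rng)
```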

Zero Samples

Set some percentage of samples to zero.
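A minimal sketch of this idea, assuming the zeroed samples are picked uniformly at random and the amount is given as a fraction between 0 and 1. The function name `zero_samples` and its parameters are hypothetical and may differ from the package's ZeroSamples class.

```python
import numpy as np

def zero_samples(waveform: np.ndarray, fraction: float, rng: np.random.Generator) -> np.ndarray:
    """Return a copy of `waveform` with a random `fraction` of samples set to zero."""
    out = waveform.copy()
    n = out.shape[-1]
    n_zero = int(round(fraction * n))
    # Pick distinct sample positions so exactly n_zero samples are zeroed.
    idx = rng.choice(n, size=n_zero, replace=False)
    out[..., idx] = 0.0
    return out

rng = np.random.default_rng(0)
wave = np.ones(1000, dtype=np.float32)
zeroed = zero_samples(wave, 0.1, rng)
print(int((zeroed == 0).sum()))  # 100
```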

Clipping Samples

Clip some percentage of samples from a waveform.
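One plausible reading of this description is amplitude clipping: pick a threshold so that roughly the given share of samples exceeds it, then saturate those samples at the threshold. The sketch below follows that reading; the name `clip_samples` and the quantile-based threshold are assumptions and not taken from the package.

```python
import numpy as np

def clip_samples(waveform: np.ndarray, fraction: float) -> np.ndarray:
    """Saturate roughly the loudest `fraction` of samples at a common threshold."""
    # Threshold below which (1 - fraction) of the absolute sample values fall.
    threshold = np.quantile(np.abs(waveform), 1.0 - fraction)
    return np.clip(waveform, -threshold, threshold)

rng = np.random.default_rng(0)
wave = rng.standard_normal(1000)
clipped = clip_samples(wave, 0.1)
changed = float(np.mean(clipped != wave))  # share of samples that were clipped
```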

Inversion

Change sign of waveform samples.
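In code this is a single negation. The short sketch below (with a made-up function name `invert`) also checks that the magnitude spectrum is unchanged, since flipping the polarity only changes the phase.

```python
import numpy as np

def invert(waveform: np.ndarray) -> np.ndarray:
    """Flip the polarity of every sample."""
    return -waveform

wave = np.array([0.5, -0.25, 0.125, 1.0])
inverted = invert(wave)
# Inversion changes the phase by pi but leaves the magnitude spectrum intact.
same_magnitude = np.allclose(np.abs(np.fft.rfft(wave)), np.abs(np.fft.rfft(inverted)))
```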

Loudness Change

Change the loudness of intervals of a waveform. For example, in the figure below the initial waveform was split into 3 intervals and the samples in each interval were multiplied by a different random factor.
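The interval-wise scaling described above could be sketched as follows. The cut points are drawn uniformly at random and each interval gets a gain from a uniform range; the name `change_loudness`, the parameters, and both sampling choices are assumptions for this example.

```python
import numpy as np

def change_loudness(waveform, n_intervals, min_factor, max_factor, rng):
    """Split the waveform into random intervals and scale each by a random gain."""
    out = waveform.astype(np.float64, copy=True)
    n = out.shape[-1]
    # n_intervals - 1 distinct interior cut points, in increasing order.
    cuts = np.sort(rng.choice(np.arange(1, n), size=n_intervals - 1, replace=False))
    bounds = np.concatenate(([0], cuts, [n]))
    factors = rng.uniform(min_factor, max_factor, size=n_intervals)
    for start, stop, factor in zip(bounds[:-1], bounds[1:], factors):
        out[..., start:stop] *= factor
    return out

rng = np.random.default_rng(0)
wave = np.ones(1000)
louder = change_loudness(wave, n_intervals=3, min_factor=0.5, max_factor=2.0, rng=rng)
```

With a constant input like the one above, the output takes exactly one value per interval, which makes the three gains easy to inspect.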

Short Noises

Add several short noises to different parts of a waveform.
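A minimal sketch of this augmentation, assuming the short noises are Gaussian bursts of equal length added at random positions. The name `add_short_noises` and all parameters are hypothetical; the package may use different noise types and lengths.

```python
import numpy as np

def add_short_noises(waveform, n_noises, noise_len, noise_level, rng):
    """Add `n_noises` Gaussian bursts of `noise_len` samples at random positions."""
    out = waveform.astype(np.float64, copy=True)
    n = out.shape[-1]
    # Start positions are chosen so that every burst fits inside the waveform.
    starts = rng.integers(0, n - noise_len + 1, size=n_noises)
    for start in starts:
        out[..., start:start + noise_len] += noise_level * rng.standard_normal(noise_len)
    return out

rng = np.random.default_rng(0)
wave = np.zeros(16000)
noisy = add_short_noises(wave, n_noises=3, noise_len=800, noise_level=0.1, rng=rng)
changed = int(np.count_nonzero(noisy != wave))  # bursts may overlap, so at most 3 * 800
```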

File Noise

Add noise from a randomly chosen file in a specified folder.
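The sketch below illustrates the general idea with the standard library and NumPy only: pick a random WAV file from a folder, repeat or crop it to the length of the input, and mix it in. It assumes 16-bit mono PCM files, and the names `read_wav_mono16`, `add_file_noise`, and `noise_level` are invented for this example; the package itself works with torchaudio tensors and may behave differently.

```python
import random
import tempfile
import wave
from pathlib import Path

import numpy as np

def read_wav_mono16(path):
    """Read a 16-bit mono PCM WAV file into floats in [-1, 1)."""
    with wave.open(str(path), "rb") as f:
        frames = f.readframes(f.getnframes())
    return np.frombuffer(frames, dtype="<i2").astype(np.float64) / 32768.0

def add_file_noise(waveform, folder, noise_level, rng):
    """Mix in noise from a randomly chosen .wav file in `folder`."""
    path = rng.choice(sorted(Path(folder).glob("*.wav")))
    noise = read_wav_mono16(path)
    # np.resize repeats the noise if it is too short and crops it if too long.
    noise = np.resize(noise, waveform.shape[-1])
    return waveform + noise_level * noise

# Demo: write one synthetic noise file into a temporary folder, then use it.
with tempfile.TemporaryDirectory() as folder:
    samples = (np.random.default_rng(0).uniform(-1, 1, 4000) * 32767).astype("<i2")
    with wave.open(str(Path(folder) / "noise.wav"), "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(16000)
        f.writeframes(samples.tobytes())
    speech = np.zeros(10000)
    noisy = add_file_noise(speech, folder, 0.1, random.Random(0))
```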


Usage example (with default parameters)

Import:

from speechaugs import TimeStretchLibrosa, ForwardTimeShift, PitchShiftLibrosa, ColoredNoise, Inversion, ZeroSamples, ClippingSamples

Other libs:

import torch, torchaudio
import albumentations as A

Usage:

ex_waveform, sr = torchaudio.load('audio_filename')

transforms = A.Compose([
    ForwardTimeShift(p=0.5),
    Inversion(p=0.5),
    A.OneOf([ZeroSamples(p=0.5), ClippingSamples(p=0.5)], p=0.5),
    A.OneOf([TimeStretchLibrosa(p=0.5), PitchShiftLibrosa(p=0.5)], p=0.5),
    ColoredNoise(p=0.5)
], p=1.0)

augmented = transforms(waveform=ex_waveform)['waveform']

Download files

Source Distribution

speechaugs-0.0.6.tar.gz (7.9 kB), uploaded as source

Built Distribution

speechaugs-0.0.6-py3-none-any.whl (18.9 kB), uploaded for Python 3
