Waveform augmentations

speechaugs

Single-channel waveform augmentations for speech recognition models.


Augmentations:

  • Time Stretch
  • Forward Time Shift
  • Pitch Shift
  • Colored Noise (white, pink, brown, blue, violet, grey)
  • Zero Samples
  • Clipping Samples
  • Inversion

Colab example: you can see examples of all augmentations and listen to the resulting audio in the Colab notebook.


Installation

pip install speechaugs


Time Stretch

Stretch a waveform in time with a randomly chosen rate. Implemented using librosa.effects.time_stretch.
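To make the effect of the rate concrete: librosa documents a rate above 1 as speeding the signal up and a rate below 1 as slowing it down, so the output holds roughly `n_samples / rate` samples. The sketch below only computes that expected length; the helper name and the sampling range `(0.8, 1.25)` are illustrative assumptions, not the package's defaults.

```python
import random


def stretched_length(n_samples: int, rate: float) -> int:
    """Approximate number of samples after stretching by `rate`."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return int(round(n_samples / rate))


rng = random.Random(0)
# Hypothetical range for the random rate (an assumption, not the library default).
rate = rng.uniform(0.8, 1.25)
new_length = stretched_length(16000, rate)

# rate > 1 shortens the waveform, rate < 1 lengthens it.
print(f"rate={rate:.2f}: 16000 samples -> about {new_length} samples")
```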

Forward Time Shift

Shift a waveform forwards in time.
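A minimal NumPy sketch of a forward shift, assuming the waveform is delayed by a fixed number of samples, the start is filled with silence, and the original length is kept. The function name and the zero-padding choice are assumptions for illustration; the package's own class may behave differently.

```python
import numpy as np


def forward_time_shift(waveform: np.ndarray, shift: int) -> np.ndarray:
    """Delay a 1-D waveform by `shift` samples, keeping its length."""
    if shift < 0:
        raise ValueError("shift must be non-negative")
    if shift == 0:
        return waveform.copy()
    shifted = np.zeros_like(waveform)
    # The start is filled with zeros; samples pushed past the end are dropped.
    shifted[shift:] = waveform[:-shift]
    return shifted


example = np.array([1.0, 2.0, 3.0, 4.0])
print(forward_time_shift(example, 1))  # [0. 1. 2. 3.]
```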

Pitch Shift

Shift the pitch of a waveform by n_steps semitones. Implemented using librosa.effects.pitch_shift.

The effect of PitchShift is easier to see on mel spectrograms of the waveforms.

Higher pitch (+9 semitones):

Lower pitch (-5 semitones)
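For a sense of scale, a shift of n_steps semitones multiplies every frequency by 2^(n_steps/12), since an octave has 12 semitones. The short sketch below computes that ratio for the two shifts shown above; the helper name is hypothetical and the code does not call librosa.

```python
def semitones_to_ratio(n_steps: float) -> float:
    """Frequency ratio for a shift of n_steps semitones (12 per octave)."""
    return 2.0 ** (n_steps / 12.0)


for n_steps in (9, -5):
    print(f"{n_steps:+d} semitones -> x{semitones_to_ratio(n_steps):.3f}")
# +9 semitones -> x1.682
# -5 semitones -> x0.749
```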

Colored Noise

Add noise of a given color to a waveform. The color of a noise is determined by its power spectral density; see the wiki page on colors of noise for more information.

This class is implemented using the colorednoise package. The color of the noise is chosen randomly.
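The idea can be sketched with NumPy alone: shape white noise in the frequency domain so that its power spectral density follows 1/f^beta, then add it to the waveform. This is not the colorednoise package's implementation; the function names, the `noise_level` parameter, and the peak normalization are assumptions for illustration. Grey noise is left out because it follows a psychoacoustic loudness curve rather than a simple power law.

```python
import numpy as np

# Power-law exponents: power spectral density is proportional to 1 / f**beta.
NOISE_EXPONENTS = {"white": 0.0, "pink": 1.0, "brown": 2.0, "blue": -1.0, "violet": -2.0}


def colored_noise(n_samples: int, beta: float, rng: np.random.Generator) -> np.ndarray:
    """Generate power-law noise with peak amplitude 1 by spectral shaping."""
    spectrum = np.fft.rfft(rng.standard_normal(n_samples))
    freqs = np.fft.rfftfreq(n_samples)
    freqs[0] = freqs[1]  # avoid dividing by zero at the DC bin
    # Amplitude scales with f**(-beta/2), so power scales with f**(-beta).
    spectrum *= freqs ** (-beta / 2.0)
    noise = np.fft.irfft(spectrum, n=n_samples)
    return noise / np.max(np.abs(noise))


def add_colored_noise(waveform, color, noise_level, rng):
    """Add noise of the given color, scaled by noise_level, to a 1-D waveform."""
    noise = colored_noise(len(waveform), NOISE_EXPONENTS[color], rng)
    return waveform + noise_level * noise


rng = np.random.default_rng(0)
color = str(rng.choice(list(NOISE_EXPONENTS)))  # pick the color at random
waveform = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
noisy = add_colored_noise(waveform, color, noise_level=0.05, rng=rng)
```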

White Noise

Brown Noise

Zero Samples

Set some percentage of samples to zero.
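One way to picture this, assuming the zeroed samples are picked uniformly at random without replacement (the package may select them differently). The function name and the `fraction` parameter are hypothetical.

```python
import numpy as np


def zero_samples(waveform: np.ndarray, fraction: float, rng: np.random.Generator) -> np.ndarray:
    """Return a copy of a 1-D waveform with a given fraction of samples set to zero."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    out = waveform.copy()
    n_zero = int(round(fraction * out.size))
    # Choose distinct positions so exactly n_zero samples are zeroed.
    idx = rng.choice(out.size, size=n_zero, replace=False)
    out[idx] = 0.0
    return out


rng = np.random.default_rng(0)
zeroed = zero_samples(np.ones(100), fraction=0.25, rng=rng)
print(int((zeroed == 0).sum()))  # 25
```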

Clipping Samples

Clip some percentage of samples from a waveform.
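The description leaves the exact rule open. One plausible reading, sketched below, is amplitude clipping with the threshold chosen so that roughly the requested fraction of samples (the loudest ones) is affected. The function name, the `fraction` parameter, and the quantile-based threshold are all assumptions, not the package's confirmed behavior.

```python
import numpy as np


def clip_samples(waveform: np.ndarray, fraction: float) -> np.ndarray:
    """Clip the loudest `fraction` of samples to a common amplitude threshold."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    # Threshold below which (1 - fraction) of the absolute amplitudes fall.
    threshold = np.quantile(np.abs(waveform), 1.0 - fraction)
    return np.clip(waveform, -threshold, threshold)


waveform = np.linspace(-1.0, 1.0, 101)
clipped = clip_samples(waveform, fraction=0.2)
n_changed = int((clipped != waveform).sum())  # about 20% of the samples
```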

Inversion

Change sign of waveform samples.
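In code this amounts to negating the array. The small check below, with a hypothetical helper name, illustrates why the change is subtle: the magnitude spectrum is identical and only the phase flips.

```python
import numpy as np


def invert(waveform: np.ndarray) -> np.ndarray:
    """Flip the polarity of a waveform."""
    return -waveform


rng = np.random.default_rng(0)
waveform = rng.standard_normal(1024)
inverted = invert(waveform)

# The magnitude spectrum is unchanged by polarity inversion.
same_magnitude = np.allclose(
    np.abs(np.fft.rfft(waveform)), np.abs(np.fft.rfft(inverted))
)
```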


Usage example (with default parameters)

Import:

from speechaugs import TimeStretchLibrosa, ForwardTimeShift, PitchShiftLibrosa, ColoredNoise, Inversion, ZeroSamples, ClippingSamples

Other libs:

import torch, torchaudio
import albumentations as A

Usage:

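# Load a waveform; torchaudio returns the tensor and its sample rate.
ex_waveform, sr = torchaudio.load('audio_filename')

# Top-level transforms are each applied with their own probability p.
transforms = A.Compose([
    ForwardTimeShift(p=0.5),
    Inversion(p=0.5),
    # OneOf applies at most one of the listed transforms.
    A.OneOf([ZeroSamples(p=0.5), ClippingSamples(p=0.5)], p=0.5),
    A.OneOf([TimeStretchLibrosa(p=0.5), PitchShiftLibrosa(p=0.5)], p=0.5),
    ColoredNoise(p=0.5)
], p=1.0)

# The augmented waveform is returned under the 'waveform' key.
augmented = transforms(waveform=ex_waveform)['waveform']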

Download files

Source distribution: speechaugs-0.0.4.tar.gz (4.7 kB)
Built distribution: speechaugs-0.0.4-py3-none-any.whl (4.9 kB, Python 3)
