Waveform augmentations

These details have not been verified by PyPI

Project links

Homepage

Project description

speechaugs

Single-channel waveform augmentations for speech recognition models.

Augmentations:

Transforms in the time domain:

Time Stretch
Forward Time Shift

Transforms in the frequency domain:

Pitch Shift
Vocal Tract Length Perturbation

Noise injection:

Colored Noise (white, pink, brown, blue, violet, grey)
Short Noises
File Noise

Direct changes to the waveform samples:

Zero Samples
Clipping Samples
Inversion
Loudness Change
Normalization

Colab example: you can see examples of all augmentations and listen to the resulting audio in the Colab notebook.

Installation

pip install speechaugs

Time Stretch

Stretch a waveform in time by a randomly chosen rate. Implemented using librosa.effects.time_stretch.

Forward Time Shift

Shift a waveform forwards in time.
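As a rough illustration of the idea, here is a minimal sketch in plain Python. It is not the speechaugs implementation: the function name, the max_shift parameter, and the choice to keep the output length fixed by padding with zeros are all assumptions made for this example.

```python
import random

def forward_time_shift(waveform, max_shift, rng=random):
    """Delay a waveform by a random number of samples (hypothetical helper).

    Assumption: the output keeps the input length, so samples pushed past
    the end are dropped and the start is filled with silence (zeros).
    """
    n = len(waveform)
    shift = min(rng.randint(0, max_shift), n)
    # Prepend `shift` zeros, then keep only the first n - shift samples.
    return [0.0] * shift + list(waveform[:n - shift])

wave = [1.0, 2.0, 3.0, 4.0, 5.0]
shifted = forward_time_shift(wave, max_shift=3, rng=random.Random(0))
```

Passing a seeded random.Random instance makes the shift reproducible, which is convenient when debugging an augmentation pipeline.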

Pitch Shift

Shift the pitch by n_steps semitones. Implemented using librosa.effects.pitch_shift.

The effect of PitchShift is easiest to see on mel spectrograms of the waveforms.

Higher pitch (+9 semitones):

Lower pitch (-5 semitones)

Vocal Tract Length Perturbation

Change the vocal tract length. The effect is very similar to Pitch Shift, but the speech sounds more natural.

Colored Noise

Add noise of a given color to a waveform. The color of a noise signal is determined by its power spectral density. See the wiki page for more information.

This class is implemented using the colorednoise package. The color of the noise is chosen at random.

White Noise

Brown Noise
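To make the idea of noise color concrete, the sketch below generates white and brown noise with the standard library only and mixes it into a signal. It does not use the colorednoise package and is not the speechaugs implementation: the helper names and the choice to scale the noise to a target signal-to-noise ratio (snr_db) are assumptions for this example. Brown noise is produced as a running sum of white noise, which concentrates its power at low frequencies.

```python
import math
import random

def white_noise(n, rng):
    """Independent Gaussian samples: flat power spectral density."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def brown_noise(n, rng):
    """Running sum of white noise: power falls off roughly as 1/f^2."""
    out, total = [], 0.0
    for _ in range(n):
        total += rng.gauss(0.0, 1.0)
        out.append(total)
    return out

def rms(x):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def add_noise(waveform, noise, snr_db):
    """Scale `noise` so that the signal-to-noise ratio equals snr_db, then add it."""
    gain = rms(waveform) / (rms(noise) * 10 ** (snr_db / 20))
    return [s + gain * v for s, v in zip(waveform, noise)]

rng = random.Random(0)
signal = [math.sin(2 * math.pi * 5 * t / 1000) for t in range(1000)]
noisy = add_noise(signal, brown_noise(len(signal), rng), snr_db=10.0)
```

Other colors (pink, blue, violet, grey) need spectral shaping, which is what a dedicated package such as colorednoise is for.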

Short Noises

Add several short noises (of the same color) to different parts of a waveform.

File Noise

Add noise from a randomly chosen file in a specified folder. Works with the "sox_io" torchaudio backend. To change the backend, run:

torchaudio.set_audio_backend('sox_io')

Zero Samples

Set some percentage of samples to zero.
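A minimal sketch of this augmentation in plain Python follows. It is not the speechaugs implementation: the function name, the fraction parameter, and the assumption that the zeroed samples are picked independently at random positions (rather than, say, in contiguous blocks) are all illustrative.

```python
import random

def zero_samples(waveform, fraction, rng=random):
    """Set a given fraction of samples to zero (hypothetical helper).

    Assumption: positions are drawn uniformly at random without replacement.
    """
    n = len(waveform)
    k = int(n * fraction)
    zeroed = set(rng.sample(range(n), k))
    return [0.0 if i in zeroed else v for i, v in enumerate(waveform)]

wave = [1.0] * 100
dropped = zero_samples(wave, fraction=0.25, rng=random.Random(1))
```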

Clipping Samples

Clip some percentage of samples from a waveform.

Inversion

Change sign of waveform samples.

Loudness Change

Change the loudness of intervals of a waveform. For example, a waveform can be split into 3 intervals, with the samples in each interval multiplied by a different random factor.
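The interval-wise scaling can be sketched in plain Python as follows. This is not the speechaugs implementation: the function name, the parameter names, and the default factor range are hypothetical, and the interval boundaries are assumed to be chosen uniformly at random.

```python
import random

def loudness_change(waveform, n_intervals=3, min_factor=0.5, max_factor=2.0, rng=random):
    """Scale contiguous intervals of a waveform by random factors (hypothetical helper).

    Requires len(waveform) >= n_intervals so that every interval is non-empty.
    """
    n = len(waveform)
    # Choose n_intervals - 1 distinct interior cut points and sort them.
    cuts = sorted(rng.sample(range(1, n), n_intervals - 1))
    bounds = [0] + cuts + [n]
    out = []
    for start, end in zip(bounds, bounds[1:]):
        factor = rng.uniform(min_factor, max_factor)
        out.extend(v * factor for v in waveform[start:end])
    return out

wave = [1.0] * 30
louder_or_quieter = loudness_change(wave, rng=random.Random(0))
```

Because each interval gets its own factor, the output has abrupt gain changes at the cut points; whether that is desirable depends on the model being trained.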

Normalization

Normalize a waveform with the chosen method ("minmax", "max" or "meanstd").
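The text above does not define the three methods, so the sketch below uses their conventional meanings as an assumption: "minmax" rescales to the range [0, 1], "max" divides by the peak absolute value, and "meanstd" subtracts the mean and divides by the standard deviation. The function itself is hypothetical and is not the speechaugs implementation.

```python
import math

def normalize(waveform, method="max"):
    """Normalize a waveform using an assumed definition of each method.

    A constant (or all-zero) waveform would cause a division by zero here;
    a real implementation would need to guard against that.
    """
    if method == "minmax":
        lo, hi = min(waveform), max(waveform)
        return [(v - lo) / (hi - lo) for v in waveform]
    if method == "max":
        peak = max(abs(v) for v in waveform)
        return [v / peak for v in waveform]
    if method == "meanstd":
        mean = sum(waveform) / len(waveform)
        std = math.sqrt(sum((v - mean) ** 2 for v in waveform) / len(waveform))
        return [(v - mean) / std for v in waveform]
    raise ValueError(f"unknown method: {method!r}")

wave = [-2.0, 0.0, 1.0, 3.0]
peak_normalized = normalize(wave, "max")
```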

Usage example (with default parameters)

Import:

import speechaugs

Other libs:

import torch, torchaudio
import albumentations as A

Usage:

ex_waveform, sr = torchaudio.load('audio_filename')
noiseroot = 'path_to_noise_folder'

transforms = A.Compose([
    speechaugs.ForwardTimeShift(p=0.5),
    A.OneOf([speechaugs.Inversion(p=0.5), speechaugs.LoudnessChange(p=0.5)], p=0.5),
    A.OneOf([speechaugs.ZeroSamples(p=0.5), speechaugs.ClippingSamples(p=0.5)], p=0.5),
    A.OneOf([speechaugs.TimeStretchLibrosa(p=0.5), speechaugs.PitchShiftLibrosa(p=0.5)], p=0.5),
    A.OneOf([speechaugs.ColoredNoise(p=0.3), speechaugs.ShortNoises(p=0.3), speechaugs.FileNoise(noiseroot, p=0.3)], p=0.5),
], p=1.0)

augmented = transforms(waveform=ex_waveform)['waveform']

Release history

0.0.11 (this version): Feb 8, 2021
0.0.10: Jan 29, 2021
0.0.6: Jan 27, 2021
0.0.5: Jan 25, 2021
0.0.4: Jan 18, 2021
0.0.3: Jan 15, 2021
0.0.2: Jan 15, 2021

