Waveform augmentations


speechaugs

Single-channel waveform augmentations for speech recognition models.

Augmentations:

Time Stretch
Forward Time Shift
Pitch Shift
Colored Noise (white, pink, brown, blue, violet, grey)
Zero Samples
Clipping Samples
Inversion
Loudness Change
Short Noises
File Noise

Colab example: you can see examples of all augmentations and listen to the resulting audio in the Colab notebook.

Installation

pip install speechaugs

Time Stretch

Stretch a waveform in time by a randomly chosen rate. Implemented using librosa.effects.time_stretch.
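The package itself calls librosa.effects.time_stretch, which uses a phase vocoder. As a dependency-free illustration of the same idea, the sketch below stretches a 1-D NumPy waveform with plain overlap-add (OLA), a simpler technique swapped in here, and draws the rate at random. The function names and the default rate range (0.8, 1.25) are hypothetical, not the speechaugs API.

```python
import numpy as np


def ola_time_stretch(wave, rate, frame=1024):
    """Naive overlap-add (OLA) time stretch of a 1-D waveform.

    rate > 1 makes the output shorter (faster), rate < 1 makes it longer.
    Frames are read every `hop_in` samples and written every `hop_out` samples.
    """
    hop_out = frame // 2                      # synthesis hop: 50% overlap
    hop_in = hop_out * rate                   # analysis hop scales with the rate
    window = np.hanning(frame)
    n_frames = max(1, int((len(wave) - frame) / hop_in) + 1)
    out_len = (n_frames - 1) * hop_out + frame
    out = np.zeros(out_len)
    norm = np.zeros(out_len)
    for i in range(n_frames):
        start = int(round(i * hop_in))
        chunk = np.asarray(wave[start:start + frame], dtype=float)
        chunk = np.pad(chunk, (0, frame - len(chunk)))  # zero-pad a short last frame
        pos = i * hop_out
        out[pos:pos + frame] += chunk * window
        norm[pos:pos + frame] += window
    norm[norm < 1e-8] = 1.0                   # the window is ~0 at the very edges
    return out / norm                         # weighted average of overlapping frames


def random_time_stretch(wave, rate_range=(0.8, 1.25), rng=None):
    """Hypothetical wrapper: stretch by a rate drawn uniformly from rate_range."""
    rng = np.random.default_rng() if rng is None else rng
    rate = float(rng.uniform(*rate_range))
    return ola_time_stretch(wave, rate), rate
```

Plain OLA keeps the pitch roughly intact but can smear transients and cause phasing artifacts, so its output will differ from the phase-vocoder result the library produces.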

Forward Time Shift

Shift a waveform forwards in time.
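A minimal sketch, assuming the shift is done by prepending zeros and trimming the end so that the length stays the same; speechaugs may pad or trim differently. forward_time_shift and max_shift_ratio are hypothetical names.

```python
import numpy as np


def forward_time_shift(wave, max_shift_ratio=0.2, rng=None):
    """Hypothetical sketch: delay a 1-D waveform by a random number of samples.

    Zeros are prepended and the tail is cut so the output keeps the input length
    (an assumption; the library may handle the length differently).
    """
    rng = np.random.default_rng() if rng is None else rng
    max_shift = int(len(wave) * max_shift_ratio)
    shift = int(rng.integers(0, max_shift + 1))      # upper bound is exclusive
    shifted = np.concatenate([np.zeros(shift, dtype=wave.dtype), wave])
    return shifted[:len(wave)], shift
```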

Pitch Shift

Shift the pitch by n_steps semitones. Implemented using librosa.effects.pitch_shift.

The effect of PitchShift is easiest to see on the mel spectrograms of the waveforms.

Higher pitch (+9 semitones):

Lower pitch (-5 semitones):
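A shift of n_steps semitones scales every frequency by 2^(n_steps/12), so the +9 and -5 semitone examples correspond to factors of about 1.68 and 0.75. librosa.effects.pitch_shift(y, sr=sr, n_steps=n_steps) achieves this, as far as we know, by time-stretching and then resampling back to the original duration. The helper below (a hypothetical name) only computes the scaling factor.

```python
def semitone_ratio(n_steps):
    """Frequency scaling factor for a shift of n_steps equal-tempered semitones."""
    return 2.0 ** (n_steps / 12.0)


for steps in (9, -5):
    # Where a 440 Hz tone ends up after the shifts used in the figures.
    ratio = semitone_ratio(steps)
    print(f"{steps:+d} semitones: x{ratio:.4f} -> {440 * ratio:.1f} Hz")
```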

Colored Noise

Add noise of a given color to a waveform. The color of a noise is determined by its power spectral density. See the Wikipedia article on the colors of noise for more information.

This class is implemented using the colorednoise package. The color of the noise is chosen at random.

White Noise

Brown Noise
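The colorednoise package generates such noise with powerlaw_psd_gaussian(exponent, size), shaping Gaussian noise in the frequency domain so that its power spectral density falls off as 1/f^beta. Below is a dependency-free NumPy sketch of the same idea. Mixing at a chosen signal-to-noise ratio, the function names, and leaving out grey noise are assumptions of this sketch, not documented library behavior.

```python
import numpy as np

# Spectral exponent beta for a power spectral density S(f) ~ 1 / f**beta.
# Grey noise is defined psychoacoustically, not as a power law, so it is left out.
NOISE_EXPONENTS = {"white": 0.0, "pink": 1.0, "brown": 2.0, "blue": -1.0, "violet": -2.0}


def powerlaw_noise(beta, n, rng):
    """Gaussian noise of length n with PSD ~ 1/f**beta, scaled to unit RMS."""
    freqs = np.fft.rfftfreq(n)
    freqs[0] = freqs[1] if n > 1 else 1.0          # avoid dividing by zero at DC
    amplitude = freqs ** (-beta / 2.0)              # |X(f)| ~ f^(-beta/2) gives PSD ~ f^(-beta)
    spectrum = amplitude * (rng.standard_normal(len(freqs)) + 1j * rng.standard_normal(len(freqs)))
    noise = np.fft.irfft(spectrum, n)
    return noise / (np.sqrt(np.mean(noise ** 2)) + 1e-12)


def add_colored_noise(wave, snr_db=20.0, rng=None):
    """Hypothetical sketch: add noise of a randomly chosen color at snr_db decibels."""
    rng = np.random.default_rng() if rng is None else rng
    color = str(rng.choice(list(NOISE_EXPONENTS)))
    noise = powerlaw_noise(NOISE_EXPONENTS[color], len(wave), rng)
    signal_power = np.mean(np.asarray(wave, dtype=float) ** 2)
    noise = noise * np.sqrt(signal_power / 10 ** (snr_db / 10))  # unit-RMS noise -> target power
    return wave + noise, color
```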

Zero Samples

Set some percentage of samples to zero.
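A minimal sketch, assuming the zeroed samples are scattered uniformly at random and that their share is drawn from [0, max_ratio); the library may instead zero contiguous runs. zero_samples and max_ratio are hypothetical names.

```python
import numpy as np


def zero_samples(wave, max_ratio=0.1, rng=None):
    """Hypothetical sketch: set a random fraction (below max_ratio) of samples to zero."""
    rng = np.random.default_rng() if rng is None else rng
    n_zero = int(len(wave) * rng.uniform(0.0, max_ratio))
    idx = rng.choice(len(wave), size=n_zero, replace=False)  # distinct positions
    out = np.array(wave, copy=True)                          # leave the input untouched
    out[idx] = 0
    return out
```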

Clipping Samples

Clip some percentage of samples from a waveform.
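The description does not say exactly what "clip" means here. The sketch below assumes it means deleting a random subset of samples, so the waveform gets shorter, rather than limiting the amplitude. clip_samples and max_ratio are hypothetical names.

```python
import numpy as np


def clip_samples(wave, max_ratio=0.05, rng=None):
    """Hypothetical sketch: delete a random fraction (below max_ratio) of samples."""
    rng = np.random.default_rng() if rng is None else rng
    n_drop = int(len(wave) * rng.uniform(0.0, max_ratio))
    idx = rng.choice(len(wave), size=n_drop, replace=False)
    return np.delete(wave, idx)   # the remaining samples keep their original order
```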

Inversion

Change the sign of the waveform samples.
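Inversion amounts to multiplying the samples by -1. The magnitude spectrum is unchanged (only the phase shifts by π), so this augmentation mainly matters for models that consume the raw waveform rather than magnitude spectrogram features. invert is a hypothetical helper name.

```python
import numpy as np


def invert(wave):
    """Flip the polarity of every sample (multiply by -1)."""
    return -np.asarray(wave)


x = np.random.default_rng(0).standard_normal(512)
# Negation only flips the phase, so the magnitude spectrum stays identical.
same_magnitude = np.allclose(np.abs(np.fft.rfft(invert(x))), np.abs(np.fft.rfft(x)))  # True
```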

Loudness Change

Change the loudness of intervals of a waveform. For example, in the figure below, the initial waveform was split into 3 intervals and the samples in each interval were multiplied by a different random factor.
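A minimal sketch of the procedure described above, assuming random cut points and gains drawn uniformly from factor_range. The names and defaults (3 intervals, gains between 0.5 and 2.0) are hypothetical.

```python
import numpy as np


def loudness_change(wave, n_intervals=3, factor_range=(0.5, 2.0), rng=None):
    """Hypothetical sketch: split a waveform into n_intervals random intervals
    and multiply each one by its own random gain."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(wave)
    cuts = np.sort(rng.choice(np.arange(1, n), size=n_intervals - 1, replace=False))
    edges = np.concatenate(([0], cuts, [n]))           # interval boundaries
    factors = rng.uniform(*factor_range, size=n_intervals)
    out = np.array(wave, dtype=float)                  # float copy of the input
    for start, end, factor in zip(edges[:-1], edges[1:], factors):
        out[start:end] *= factor
    return out, edges, factors
```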

Short Noises

Add several short noises (of the same color) to different parts of a waveform.
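A sketch assuming each burst has a random position and length, a fixed amplitude level, and one noise color shared by all bursts. It includes a compact power-law noise generator (PSD ~ 1/f^beta) so that it runs on its own. short_noises, level, and max_len_ratio are hypothetical names and defaults, and the waveform is assumed to have at least 2 samples.

```python
import numpy as np


def powerlaw_noise(beta, n, rng):
    """Gaussian noise with PSD ~ 1/f**beta (n >= 2), scaled to unit RMS."""
    freqs = np.fft.rfftfreq(n)
    freqs[0] = freqs[1]                               # avoid dividing by zero at DC
    shaped = freqs ** (-beta / 2.0) * (rng.standard_normal(len(freqs)) + 1j * rng.standard_normal(len(freqs)))
    noise = np.fft.irfft(shaped, n)
    return noise / (np.sqrt(np.mean(noise ** 2)) + 1e-12)


def short_noises(wave, n_noises=3, max_len_ratio=0.1, level=0.05, rng=None):
    """Hypothetical sketch: add n_noises short bursts that share one noise color."""
    rng = np.random.default_rng() if rng is None else rng
    beta = float(rng.choice([0.0, 1.0, 2.0, -1.0, -2.0]))  # white, pink, brown, blue, violet
    out = np.array(wave, dtype=float)
    max_len = max(2, int(len(out) * max_len_ratio))
    spans = []
    for _ in range(n_noises):
        length = int(rng.integers(2, max_len + 1))
        start = int(rng.integers(0, len(out) - length + 1))
        out[start:start + length] += level * powerlaw_noise(beta, length, rng)
        spans.append((start, length))
    return out, spans, beta
```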

File Noise

Add noise from a randomly chosen file in a specified folder. Works with the "sox_io" torchaudio backend. To switch to this backend, run:

import torchaudio
torchaudio.set_audio_backend('sox_io')
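The original relies on torchaudio to load the noise files (hence the backend note). To stay dependency-free, the sketch below reads 16-bit PCM WAV files with Python's standard wave module instead, loops or trims the noise to the waveform length, and mixes it at a chosen SNR. The SNR mixing, the *.wav filter, and the function names are assumptions; sample rates are neither checked nor resampled.

```python
import random
import wave
from pathlib import Path

import numpy as np


def read_wav_mono(path):
    """Read a 16-bit PCM WAV file into floats in [-1, 1), averaging the channels."""
    with wave.open(str(path), "rb") as f:
        if f.getsampwidth() != 2:
            raise ValueError(f"{path}: only 16-bit PCM is handled in this sketch")
        n_channels = f.getnchannels()
        raw = f.readframes(f.getnframes())
    data = np.frombuffer(raw, dtype="<i2").astype(np.float64) / 32768.0
    return data.reshape(-1, n_channels).mean(axis=1)


def add_file_noise(waveform, noise_dir, snr_db=10.0, rng=None):
    """Hypothetical sketch: mix noise from a random *.wav file in noise_dir at snr_db."""
    rng = random.Random() if rng is None else rng
    files = sorted(Path(noise_dir).glob("*.wav"))
    if not files:
        raise FileNotFoundError(f"no .wav files in {noise_dir}")
    noise = read_wav_mono(rng.choice(files))
    if noise.size == 0:
        raise ValueError("noise file is empty")
    reps = -(-len(waveform) // len(noise))            # ceiling division
    noise = np.tile(noise, reps)[:len(waveform)]      # loop, then cut to length
    signal_power = np.mean(np.asarray(waveform, dtype=float) ** 2)
    noise_power = np.mean(noise ** 2)
    if noise_power == 0:
        return np.array(waveform, dtype=float)        # silent file: nothing to add
    scale = np.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return waveform + scale * noise
```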

Usage example (with default parameters)

Import:

import speechaugs

Other libs:

import torch, torchaudio
import albumentations as A

Usage:

ex_waveform, sr = torchaudio.load('audio_filename')  # waveform tensor (channels, samples) and sample rate
noiseroot = 'path_to_noise_folder'  # folder with the noise files used by FileNoise

# Each A.OneOf block is applied with probability p=0.5 and then picks exactly one of its transforms.
transforms = A.Compose([
    speechaugs.ForwardTimeShift(p=0.5),
    A.OneOf([speechaugs.Inversion(p=0.5), speechaugs.LoudnessChange(p=0.5)], p=0.5),
    A.OneOf([speechaugs.ZeroSamples(p=0.5), speechaugs.ClippingSamples(p=0.5)], p=0.5),
    A.OneOf([speechaugs.TimeStretchLibrosa(p=0.5), speechaugs.PitchShiftLibrosa(p=0.5)], p=0.5),
    A.OneOf([speechaugs.ColoredNoise(p=0.3), speechaugs.ShortNoises(p=0.3), speechaugs.FileNoise(noiseroot, p=0.3)], p=0.5),
], p=1.0)

augmented = transforms(waveform=ex_waveform)['waveform']


Release history

0.0.11 (Feb 8, 2021)
0.0.10 (Jan 29, 2021), this version
0.0.6 (Jan 27, 2021)
0.0.5 (Jan 25, 2021)
0.0.4 (Jan 18, 2021)
0.0.3 (Jan 15, 2021)
0.0.2 (Jan 15, 2021)

Download files


Source Distribution

speechaugs-0.0.10.tar.gz (8.1 kB)

Uploaded Jan 29, 2021 Source

Built Distribution


speechaugs-0.0.10-py3-none-any.whl (19.0 kB)

Uploaded Jan 29, 2021 Python 3

File details

Details for the file speechaugs-0.0.10.tar.gz.

File metadata

Download URL: speechaugs-0.0.10.tar.gz
Upload date: Jan 29, 2021
Size: 8.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/51.1.2.post20210110 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.8.5

File hashes

Hashes for speechaugs-0.0.10.tar.gz
Algorithm	Hash digest
SHA256	`13fbee3205faa5d19bbd79fb9f94c631e057c120b3d7e5ae580e58577697262f`
MD5	`6ed80196fe4ec0723efc497ed74afb37`
BLAKE2b-256	`9d70d0975decc3f292acf954ddf866556f3146c4ff359eac824c0ba56350a95d`


File details

Details for the file speechaugs-0.0.10-py3-none-any.whl.

File metadata

Download URL: speechaugs-0.0.10-py3-none-any.whl
Upload date: Jan 29, 2021
Size: 19.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/51.1.2.post20210110 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.8.5

File hashes

Hashes for speechaugs-0.0.10-py3-none-any.whl
Algorithm	Hash digest
SHA256	`85862e8e387bd4a4cae33a73c1f27743d74db12af7677517d8c2a518b82db384`
MD5	`2b47d7bb2a1b03966ffb84acb715bc5c`
BLAKE2b-256	`bf9cbccd795b8cddd5ccce84b792680efc601656226c5138dd9c57f3e516e5fa`

