Waveform augmentations
speechaugs
Single-channel waveform augmentations for speech recognition models.
Augmentations:
- Time Stretch
- Forward Time Shift
- Pitch Shift
- Colored Noise (white, pink, brown, blue, violet, grey)
- Zero Samples
- Clipping samples
- Inversion
- Loudness Change
- Short Noises
- File Noise
Colab Example
You can see examples of all augmentations and listen to the resulting audio on this page with a Colab notebook.
Installation
pip install speechaugs
Time Stretch
Stretch a waveform in time by a randomly chosen rate. Implemented using librosa.effects.time_stretch.
Forward Time Shift
Shift a waveform forwards in time.
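As a rough illustration of the idea, a forward shift can be sketched in NumPy by padding the start of the signal with zeros. This is not the speechaugs implementation: the function name `forward_time_shift`, the `shift` and `keep_length` parameters, and the choice to trim the tail are all assumptions made for this example.

```python
import numpy as np

def forward_time_shift(waveform, shift, keep_length=True):
    """Delay a 1-D waveform by `shift` samples, filling the start with zeros."""
    if shift < 0:
        raise ValueError("shift must be non-negative")
    padded = np.concatenate([np.zeros(shift, dtype=waveform.dtype), waveform])
    # Optionally trim the tail so the output keeps the input length (assumption).
    return padded[: len(waveform)] if keep_length else padded

wave = np.array([1.0, 2.0, 3.0, 4.0])
shifted = forward_time_shift(wave, shift=2)
```

With `keep_length=False` the output grows by `shift` samples instead of losing the end of the signal.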
Pitch Shift
Shift the pitch by n_steps semitones. Implemented using librosa.effects.pitch_shift.
The effect of pitch shifting is easier to see on the mel spectrograms of the waveforms.
Higher pitch (+9 semitones):
Lower pitch (-5 semitones):
Colored Noise
Add noise of a given color to a waveform. The color of the noise is determined by its spectral density. See the wiki page for more information.
This class is implemented using the colorednoise package. The color of the noise is chosen randomly.
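To make the link between color and spectral density concrete, here is a minimal NumPy sketch that shapes white noise so its power spectral density follows 1 / f**beta. It does not use the colorednoise package and is not the speechaugs code; the names `BETAS`, `colored_noise`, `add_colored_noise`, and `noise_scale` are hypothetical. Grey noise is left out because it follows a psychoacoustic curve, not a simple power law.

```python
import numpy as np

# Spectral exponents: power spectral density is proportional to 1 / f**beta.
BETAS = {"white": 0.0, "pink": 1.0, "brown": 2.0, "blue": -1.0, "violet": -2.0}

def colored_noise(n_samples, color, rng):
    """Return unit-variance noise whose PSD follows 1 / f**beta."""
    beta = BETAS[color]
    spectrum = np.fft.rfft(rng.standard_normal(n_samples))
    freqs = np.fft.rfftfreq(n_samples)
    freqs[0] = freqs[1]  # avoid dividing by zero at the DC bin
    spectrum *= freqs ** (-beta / 2.0)  # amplitude scales as f**(-beta/2)
    noise = np.fft.irfft(spectrum, n=n_samples)
    return noise / noise.std()

def add_colored_noise(waveform, color, noise_scale, rng):
    """Add scaled colored noise to a 1-D waveform (noise_scale is an assumption)."""
    return waveform + noise_scale * colored_noise(len(waveform), color, rng)

rng = np.random.default_rng(0)
color = str(rng.choice(list(BETAS)))  # the color is picked at random
t = np.arange(16000) / 16000.0
clean = np.sin(2 * np.pi * 440.0 * t)
noisy = add_colored_noise(clean, color, noise_scale=0.05, rng=rng)
```

Positive exponents (pink, brown) concentrate energy at low frequencies, while negative ones (blue, violet) concentrate it at high frequencies.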
White Noise
Brown Noise
Zero Samples
Set some percentage of samples to zero.
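One way this could look in NumPy is sketched below. The function name `zero_samples`, the `fraction` parameter, and the choice of uniformly random, non-repeating positions are assumptions for illustration, not the library's documented behavior.

```python
import numpy as np

def zero_samples(waveform, fraction, rng):
    """Return a copy of a 1-D waveform with `fraction` of its samples set to zero."""
    out = waveform.copy()
    n_zero = int(round(fraction * len(out)))
    # Pick distinct positions so exactly n_zero samples are zeroed.
    idx = rng.choice(len(out), size=n_zero, replace=False)
    out[idx] = 0.0
    return out

rng = np.random.default_rng(0)
wave = np.ones(1000)
zeroed = zero_samples(wave, fraction=0.1, rng=rng)
```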
Clipping Samples
Clip some percentage of samples from a waveform.
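The description does not say how the clipped samples are selected. One possible reading, sketched here purely as an assumption, is amplitude clipping: the loudest fraction of samples is limited to a common threshold. The function name `clip_samples` and the quantile-based threshold are hypothetical and may differ from what speechaugs does.

```python
import numpy as np

def clip_samples(waveform, fraction):
    """Clip the loudest `fraction` of samples to a common amplitude threshold."""
    # Samples whose magnitude exceeds this quantile get clipped (assumed rule).
    threshold = np.quantile(np.abs(waveform), 1.0 - fraction)
    return np.clip(waveform, -threshold, threshold)

rng = np.random.default_rng(0)
wave = rng.uniform(-1.0, 1.0, size=1000)
clipped = clip_samples(wave, fraction=0.2)
```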
Inversion
Change sign of waveform samples.
Loudness Change
Change the loudness of intervals of a waveform. For example, in the figure below the initial waveform was split into 3 intervals, and the samples in each interval were multiplied by a different random factor.
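The interval-wise scaling can be sketched in NumPy as follows. Equal-length intervals, the uniform factor range, and the names `loudness_change`, `n_intervals`, `min_factor`, and `max_factor` are assumptions for this example; the library may choose intervals and factors differently.

```python
import numpy as np

def loudness_change(waveform, n_intervals, min_factor, max_factor, rng):
    """Scale each of `n_intervals` contiguous intervals by its own random factor."""
    out = waveform.astype(float)  # astype returns a copy, the input is untouched
    # Boundaries of roughly equal-length contiguous intervals (assumption).
    bounds = np.linspace(0, len(out), n_intervals + 1).astype(int)
    factors = rng.uniform(min_factor, max_factor, size=n_intervals)
    for start, stop, factor in zip(bounds[:-1], bounds[1:], factors):
        out[start:stop] *= factor
    return out, factors

rng = np.random.default_rng(0)
wave = np.ones(9)
scaled, factors = loudness_change(
    wave, n_intervals=3, min_factor=0.5, max_factor=2.0, rng=rng
)
```

Returning the factors alongside the waveform is only for inspection here; an augmentation pipeline would normally return just the waveform.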
Short Noises
Add several short noises (of same color) to different parts of a waveform.
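A minimal NumPy sketch of this idea is shown below. For simplicity it uses white noise for every burst; the names `short_noises`, `n_bursts`, `burst_len`, and `noise_scale`, and the uniformly random burst positions, are assumptions, not the speechaugs API.

```python
import numpy as np

def short_noises(waveform, n_bursts, burst_len, noise_scale, rng):
    """Add `n_bursts` short white-noise bursts at random positions."""
    out = waveform.astype(float)  # work on a copy
    # Start positions are chosen so every burst fits inside the waveform.
    starts = rng.integers(0, len(out) - burst_len + 1, size=n_bursts)
    for start in starts:
        out[start:start + burst_len] += noise_scale * rng.standard_normal(burst_len)
    return out, starts

rng = np.random.default_rng(0)
wave = np.zeros(16000)
noisy, starts = short_noises(
    wave, n_bursts=3, burst_len=400, noise_scale=0.1, rng=rng
)
```

Bursts may overlap in this sketch, so the number of affected samples is at most `n_bursts * burst_len`.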
File Noise
Add noise from a randomly chosen file in the specified folder. Works with the "sox_io" torchaudio backend. To change the backend, run:
torchaudio.set_audio_backend('sox_io')
Usage example (with default parameters)
Import:
import speechaugs
Other libs:
import torch, torchaudio
import albumentations as A
Usage:
ex_waveform, sr = torchaudio.load('audio_filename')
noiseroot = 'path_to_noise_folder'
transforms = A.Compose([
    speechaugs.ForwardTimeShift(p=0.5),
    A.OneOf([speechaugs.Inversion(p=0.5), speechaugs.LoudnessChange(p=0.5)], p=0.5),
    A.OneOf([speechaugs.ZeroSamples(p=0.5), speechaugs.ClippingSamples(p=0.5)], p=0.5),
    A.OneOf([speechaugs.TimeStretchLibrosa(p=0.5), speechaugs.PitchShiftLibrosa(p=0.5)], p=0.5),
    A.OneOf([speechaugs.ColoredNoise(p=0.3), speechaugs.ShortNoises(p=0.3), speechaugs.FileNoise(noiseroot, p=0.3)], p=0.5),
], p=1.0)
augmented = transforms(waveform=ex_waveform)['waveform']