
WaveAugment performs data augmentation on audio data.


WaveAugment

WavAugment performs data augmentation on audio data. The audio data is represented as PyTorch tensors.

It is particularly useful for speech data. Among other effects, it implements the augmentations that we found most useful for self-supervised learning (Data Augmenting Contrastive Learning of Speech Representations in the Time Domain, E. Kharitonov, M. Rivière, G. Synnaeve, L. Wolf, P.-E. Mazaré, M. Douze, E. Dupoux, arXiv:2007.00991):

  • Pitch randomization,
  • Reverberation,
  • Additive noise,
  • Time dropout (temporal masking),
  • Band reject,
  • Clipping
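All of these augmentations are implemented inside the library. To make two of the simpler ones concrete, here is a pure-Python sketch of time dropout and clipping on a signal stored as nested lists of shape (channels, length). The function names, the single-span dropout, and the clipping rule (a fraction of the peak amplitude) are our assumptions for illustration, not WavAugment's implementation.

```python
import random


def time_dropout(signal, max_len, rng=random):
    """Zero out one random contiguous span (at most max_len samples) in every channel.

    `signal` is a list of channels, each a list of floats: shape (channels, length).
    The same span is masked in all channels.
    """
    length = len(signal[0])
    span = rng.randint(0, min(max_len, length))  # randint is inclusive on both ends
    start = rng.randint(0, length - span)
    return [
        [0.0 if start <= i < start + span else s for i, s in enumerate(channel)]
        for channel in signal
    ]


def clip(signal, ratio):
    """Clamp every sample to +/- ratio * (peak absolute amplitude of the signal)."""
    peak = max(abs(s) for channel in signal for s in channel)
    limit = ratio * peak
    return [[max(-limit, min(limit, s)) for s in channel] for channel in signal]


x = [[0.5, -1.0, 0.25, 0.8]]
print(clip(x, 0.5))  # [[0.5, -0.5, 0.25, 0.5]]
```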

Internally, WavAugment uses libsox and allows interleaving of libsox- and PyTorch-based effects.

Requirements

Installation

To install WavAugment, run the following command:

pip install waveaugment

Usage

First, we provide thoroughly documented examples that demonstrate how a data-augmented dataset interface works. We also provide a Jupyter-based tutorial (which can be opened in Colab) that illustrates how to apply various useful effects to a piece of speech, either recorded from the microphone or pre-recorded.

The EffectChain

The central object is the chain of effects, EffectChain, which is applied to a torch.Tensor to produce another torch.Tensor. A chain can compose multiple effects:

import augment
effect_chain = augment.EffectChain().pitch(100).rate(16_000)
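The chained-call style works because each effect method returns the chain itself. The class below, ToyEffectChain, is a hypothetical stand-in written only to illustrate that builder pattern: it records effect names and arguments in order and does not process any audio. It makes no claim about how augment.EffectChain is implemented internally.

```python
class ToyEffectChain:
    """Hypothetical stand-in for augment.EffectChain, illustrating method chaining only."""

    def __init__(self):
        self.effects = []  # (effect_name, args) pairs, in the order they will run

    def _add(self, name, *args):
        self.effects.append((name, list(args)))
        return self  # returning self is what makes .pitch(...).rate(...) possible

    def pitch(self, *args):
        return self._add("pitch", *args)

    def rate(self, *args):
        return self._add("rate", *args)


chain = ToyEffectChain().pitch(100).rate(16_000)
print(chain.effects)  # [('pitch', [100]), ('rate', [16000])]
```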

Parameters of the effects coincide with those of libsox (http://sox.sourceforge.net/libsox.html); however, you can also randomize parameters by providing a Python callable, and mix callables with standard parameters:

import augment
import numpy as np

# np.random.randint(-100, +100) draws an integer in [-100, 100), so the pitch
# is shifted by a random amount between -100 and +99 (sox pitch units: cents)
random_pitch_shift = lambda: np.random.randint(-100, +100)
effect_chain = augment.EffectChain().pitch("-q", random_pitch_shift).rate(16_000)

Here, the -q flag makes pitch run faster at some expense of quality. If any parameters are provided by a callable, that callable is invoked every time the EffectChain is applied (e.g., to generate fresh random parameters).
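The per-application rule for callables can be sketched as follows. resolve_params is a hypothetical helper, not WavAugment's internal code; it only illustrates that callables are invoked anew on each application while plain values such as "-q" pass through unchanged. It uses the standard random module instead of NumPy to stay dependency-free.

```python
import random


def resolve_params(params):
    """Return concrete values: call every callable, keep everything else unchanged."""
    return [p() if callable(p) else p for p in params]


# random.randint is inclusive on both ends, so (-100, 99) matches np.random.randint(-100, 100)
random_pitch_shift = lambda: random.randint(-100, 99)
params = ["-q", random_pitch_shift]

for _ in range(3):
    # A new shift is drawn on every "application"; "-q" stays fixed.
    print(resolve_params(params))
```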

Applying the chain

To apply a chain of effects to a torch.Tensor, call .apply():

output_tensor = augment.EffectChain().pitch(100).rate(16_000).apply(input_tensor, \
    src_info=src_info, target_info=target_info)

WavAugment expects input_tensor to have shape (channels, length). Since input_tensor does not carry important metadata such as the sampling rate, we must provide it manually. This is done by passing two dictionaries: src_info (metadata about the input format) and target_info (the expected format of the output).

At minimum, we need to set the sampling rate for the input tensor: {'rate': 16_000}.
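Because the tensor carries no sampling rate, it can help to build both dictionaries in one place. A minimal sketch, assuming only the keys that appear in this README (rate, channels, length); the helper names are hypothetical. The second function shows the arithmetic for the output length to expect when a rate effect resamples the signal: the number of samples scales by target_rate / src_rate, so one second of 44.1 kHz audio (44,100 samples) becomes about 16,000 samples at 16 kHz.

```python
def make_infos(src_rate, target_rate, target_channels=1):
    """Build the src_info / target_info dictionaries passed to .apply()."""
    src_info = {"rate": src_rate}
    target_info = {
        "channels": target_channels,
        "length": 0,  # 0: output length not known beforehand
        "rate": target_rate,
    }
    return src_info, target_info


def expected_length(num_samples, src_rate, target_rate):
    """Approximate number of output samples after resampling from src_rate to target_rate."""
    return round(num_samples * target_rate / src_rate)


src_info, target_info = make_infos(44_100, 16_000)
print(src_info, target_info)
print(expected_length(44_100, 44_100, 16_000))  # 16000
```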

Example usage

Below is a short example of potential usage:

import augment
import numpy as np
import torchaudio

test_wav = "tests/test.wav"  # placeholder: path to any input audio file
x, sr = torchaudio.load(test_wav)

# input signal properties
src_info = {'rate': sr}

# output signal properties
target_info = {'channels': 1, 
               'length': 0, # not known beforehand
               'rate': 16_000}
# write down the chain of effects with their string parameters and call .apply()
# effects are specified as a chain of method calls with parameters that can be 
# strings, numbers, or callables. The latter case is used for generating randomized
# transformations
random_pitch = lambda: np.random.randint(-400, -200)
y = augment.EffectChain().pitch(random_pitch).rate(16_000).apply(x, \
    src_info=src_info, target_info=target_info)
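The sox pitch effect takes its shift in cents (100 cents = one semitone, 1200 cents = one octave), so random_pitch above lowers the pitch by 2 to 4 semitones. The sketch below converts a shift in cents to a frequency ratio, 2^(cents/1200); it is plain arithmetic, not a WavAugment function, and uses the standard random module in place of NumPy.

```python
import random


def cents_to_ratio(cents):
    """Frequency ratio corresponding to a pitch shift given in cents."""
    return 2.0 ** (cents / 1200.0)


# Stdlib equivalent of np.random.randint(-400, -200): integers in [-400, -201]
random_pitch = lambda: random.randint(-400, -201)

shift = random_pitch()
# -400 cents scales frequencies by about 0.794, -200 cents by about 0.891
print(shift, round(cents_to_ratio(shift), 3))
```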

Important notes

A command-line invocation of sox often changes the effect chain you specified. To get a better idea of what sox executes internally, you can launch it with the -V flag, e.g. by running:

sox -V tests/test.wav out.wav reverb 0 50 100

you will see something like:

sox INFO sox: effects chain: input        16000Hz  1 channels
sox INFO sox: effects chain: reverb       16000Hz  2 channels
sox INFO sox: effects chain: channels     16000Hz  1 channels
sox INFO sox: effects chain: dither       16000Hz  1 channels
sox INFO sox: effects chain: output       16000Hz  1 channels

This output tells us that the reverb effect changes the number of channels, which are then squashed back into 1 channel by the channels effect. Sox also added a dither effect to hide processing artifacts.
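If you reproduce such a chain by hand, the two-channel reverb output must be brought back to one channel. A minimal pure-Python sketch of that downmix on a (channels, length) nested list, assuming a plain average of the channels; the sox channels effect may weight or scale channels differently, so treat this as an illustration of the idea rather than a reimplementation.

```python
def downmix_to_mono(signal):
    """Average all channels of a (channels, length) signal into a single channel."""
    num_channels = len(signal)
    # zip(*signal) yields one tuple of per-channel samples for each time step
    return [[sum(samples) / num_channels for samples in zip(*signal)]]


stereo = [[0.5, 1.0, -0.5], [0.0, 0.5, 0.25]]
print(downmix_to_mono(stereo))  # [[0.25, 0.75, -0.125]]
```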

WavAugment remains explicit and does not add effects under the hood. If you want to emulate a sox command that decomposes into several effects, we advise consulting sox -V and applying the effects manually. Try the chain out on a few files before running a heavy machine-learning job.
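To make "consult sox -V" easier to act on, the effect list can be pulled out of the verbose log programmatically. A minimal sketch using the standard re module; the regular expression is written against the log lines shown above and is an assumption, since the exact format may differ between sox versions.

```python
import re

# Matches lines such as "sox INFO sox: effects chain: reverb       16000Hz  2 channels"
CHAIN_LINE = re.compile(r"effects chain:\s+(\S+)\s+(\d+)Hz\s+(\d+) channels")


def parse_effects_chain(log_text):
    """Return (effect, rate_hz, channels) tuples from `sox -V` output, in order."""
    return [
        (m.group(1), int(m.group(2)), int(m.group(3)))
        for m in CHAIN_LINE.finditer(log_text)
    ]


log = """\
sox INFO sox: effects chain: input        16000Hz  1 channels
sox INFO sox: effects chain: reverb       16000Hz  2 channels
sox INFO sox: effects chain: channels     16000Hz  1 channels
sox INFO sox: effects chain: dither       16000Hz  1 channels
sox INFO sox: effects chain: output       16000Hz  1 channels
"""

chain = parse_effects_chain(log)
# Effects to reproduce manually, excluding the input/output endpoints
print([name for name, _, _ in chain if name not in ("input", "output")])
# ['reverb', 'channels', 'dither']
```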

Citation

If you find WavAugment useful in your research, please consider citing:

@article{wavaugment2020,
  title={Data Augmenting Contrastive Learning of Speech Representations in the Time Domain},
  author={Kharitonov, Eugene and Rivi{\`e}re, Morgane and Synnaeve, Gabriel and Wolf, Lior and Mazar{\'e}, Pierre-Emmanuel and Douze, Matthijs and Dupoux, Emmanuel},
  journal={arXiv preprint arXiv:2007.00991},
  year={2020}
}

Contributing

See the CONTRIBUTING file for how to help out.

License

WavAugment is MIT licensed, as found in the LICENSE file.


Download files


Source Distribution

waveaugment-0.2.4.tar.gz (7.6 kB)

Built Distribution

waveaugment-0.2.4-py3-none-any.whl (7.9 kB)

File details

Details for the file waveaugment-0.2.4.tar.gz.

File metadata

  • Download URL: waveaugment-0.2.4.tar.gz
  • Upload date:
  • Size: 7.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.10.0 pkginfo/1.8.2 requests/2.24.0 requests-toolbelt/0.9.1 tqdm/4.47.0 CPython/3.8.3

File hashes

Hashes for waveaugment-0.2.4.tar.gz
Algorithm Hash digest
SHA256 c444ad50217d1065f08eeb2568a060a98090984140474efabe25c939c824e365
MD5 f96f4e372836ad21432c66a6cfa2a508
BLAKE2b-256 08f55b3d44934709639fba036997f0dbc45bb6ad5bbc1fe48176e90376277652

File details

Details for the file waveaugment-0.2.4-py3-none-any.whl.

File metadata

  • Download URL: waveaugment-0.2.4-py3-none-any.whl
  • Upload date:
  • Size: 7.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.10.0 pkginfo/1.8.2 requests/2.24.0 requests-toolbelt/0.9.1 tqdm/4.47.0 CPython/3.8.3

File hashes

Hashes for waveaugment-0.2.4-py3-none-any.whl
Algorithm Hash digest
SHA256 26508053d79753e0c362c7195c6678ee48bbdeb0a807810b55411c9f7d4dce70
MD5 68d90882045100197705d02b54cd5fa3
BLAKE2b-256 9e3d8c0b2af74ddfaf824e7233729d65116b54761b892dbf714c8dfa7bbe6d7c
