Audio augmentations library for PyTorch, for audio in the time-domain.
PyTorch Audio Augmentations
An audio data augmentation library for PyTorch that operates on audio in the time domain. The focus of this repository is to:
- Provide many audio transformations in an easy Python interface.
- Maintain high test coverage.
- Easily control stochastic (sequential) audio transformations.
- Make every audio transformation differentiable with PyTorch's nn.Module.
- Optimise audio transformations for CPU and GPU.
It supports stochastic transformations of the kind often used in self-supervised and semi-supervised learning methods. One can apply a single stochastic augmentation, or create many stochastically transformed versions of an audio example, from a single interface.
This package follows the conventions set out by torchvision and torchaudio, with audio defined as a tensor of [channel, time], or a batched representation [batch, channel, time]. Each individual augmentation can be initialized on its own, or be wrapped in a RandomApply interface, which applies the augmentation with probability p.
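To make the role of probability p concrete, here is a minimal, dependency-free sketch of the idea. It is not the library's implementation: SketchRandomApply and invert_polarity are hypothetical names, and nested Python lists stand in for a [channel, time] tensor so that the example runs without PyTorch.

```python
import random
from typing import Callable, List, Optional, Sequence

# A waveform laid out as [channel, time], using nested lists in place of a tensor.
Waveform = List[List[float]]
Transform = Callable[[Waveform], Waveform]


def invert_polarity(audio: Waveform) -> Waveform:
    """Flip the sign of every sample in every channel."""
    return [[-sample for sample in channel] for channel in audio]


class SketchRandomApply:
    """Hypothetical stand-in (not the library class): apply all transforms with probability p."""

    def __init__(self, transforms: Sequence[Transform], p: float = 0.5,
                 rng: Optional[random.Random] = None) -> None:
        self.transforms = list(transforms)
        self.p = p
        self.rng = rng if rng is not None else random.Random()

    def __call__(self, audio: Waveform) -> Waveform:
        # A single random draw decides whether the whole group is applied.
        if self.rng.random() >= self.p:
            return audio  # skipped: the input passes through unchanged
        for transform in self.transforms:
            audio = transform(audio)
        return audio


if __name__ == "__main__":
    stereo = [[0.5, -0.25, 0.0], [1.0, 0.75, -1.0]]  # 2 channels, 3 samples
    maybe_invert = SketchRandomApply([invert_polarity], p=0.8, rng=random.Random(0))
    flipped = sum(maybe_invert(stereo) != stereo for _ in range(1000))
    print(f"inverted in {flipped} of 1000 calls")
```

Because one draw governs the whole list, passing several transforms to the wrapper applies either all of them or none of them on a given call.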
Usage
We can define a single or several audio augmentations, which are applied sequentially to an audio waveform.
import torchaudio
from torchaudio_augmentations import *

# Load a waveform tensor of shape [channel, time] together with its sample rate.
audio, sr = torchaudio.load("tests/classical.00002.wav")
num_samples = sr * 5  # five seconds of audio

transforms = [
    RandomResizedCrop(n_samples=num_samples),
    RandomApply([PolarityInversion()], p=0.8),
    RandomApply([Noise(min_snr=0.001, max_snr=0.005)], p=0.3),
    RandomApply([Gain()], p=0.2),
    HighLowPass(sample_rate=sr),  # not wrapped in RandomApply, so it is always applied in this augmentation chain
    RandomApply([Delay(sample_rate=sr)], p=0.5),
    RandomApply([PitchShift(n_samples=num_samples, sample_rate=sr)], p=0.4),
    RandomApply([Reverb(sample_rate=sr)], p=0.3),
]
We can also apply a single stochastic decision to several transformations at once. The following applies both polarity inversion and white noise with a probability of 80%, gain with a probability of 20%, and both delay and reverb with a probability of 50%:
transforms = [
    RandomResizedCrop(n_samples=num_samples),
    RandomApply([PolarityInversion(), Noise(min_snr=0.001, max_snr=0.005)], p=0.8),
    RandomApply([Gain()], p=0.2),
    RandomApply([Delay(sample_rate=sr), Reverb(sample_rate=sr)], p=0.5),
]
We can return either one or many versions of the same audio example:
# One augmented version of the input.
transform = Compose(transforms=transforms)
transformed_audio = transform(audio)
# transformed_audio.shape == [num_channels, num_samples]

# Several augmented versions of the same input, stacked along a new first dimension.
audio, sr = torchaudio.load("tests/classical.00002.wav")  # load() returns (waveform, sample_rate)
transform = ComposeMany(transforms=transforms, num_augmented_samples=4)
transformed_audio = transform(audio)
# transformed_audio.shape == [4, num_channels, num_samples]
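The difference between the two output shapes can be illustrated with a small, dependency-free sketch. This is an assumption-laden illustration rather than the library's code: sketch_compose, sketch_compose_many, crop and random_gain are hypothetical helpers, and nested lists stand in for tensors.

```python
import random
from typing import Callable, List, Sequence

Waveform = List[List[float]]  # [channel, time]
Transform = Callable[[Waveform], Waveform]


def crop(n_samples: int) -> Transform:
    """Hypothetical transform: keep the first n_samples of every channel."""
    def run(audio: Waveform) -> Waveform:
        return [channel[:n_samples] for channel in audio]
    return run


def random_gain(rng: random.Random, low: float = 0.5, high: float = 1.0) -> Transform:
    """Hypothetical transform: scale all samples by one random factor per call."""
    def run(audio: Waveform) -> Waveform:
        gain = rng.uniform(low, high)
        return [[gain * sample for sample in channel] for channel in audio]
    return run


def sketch_compose(transforms: Sequence[Transform]) -> Transform:
    """Chain transforms: the output of each one feeds the next."""
    def run(audio: Waveform) -> Waveform:
        for transform in transforms:
            audio = transform(audio)
        return audio
    return run


def sketch_compose_many(transforms: Sequence[Transform],
                        num_augmented_samples: int) -> Callable[[Waveform], List[Waveform]]:
    """Run the same chain several times on one input and collect the results."""
    chain = sketch_compose(transforms)

    def run(audio: Waveform) -> List[Waveform]:
        # Each pass redraws its random parameters, so the versions differ.
        return [chain(audio) for _ in range(num_augmented_samples)]
    return run


if __name__ == "__main__":
    rng = random.Random(0)
    stereo = [[0.1] * 10, [0.2] * 10]  # 2 channels, 10 samples
    versions = sketch_compose_many([crop(4), random_gain(rng)], num_augmented_samples=4)(stereo)
    print(len(versions), len(versions[0]), len(versions[0][0]))
```

The nested result has the layout [num_augmented_samples, channel, time], mirroring the extra leading dimension shown for ComposeMany above.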
Similar to the torchvision.datasets interface, an instance of the Compose or ComposeMany class can be supplied to torchaudio dataloaders that accept transform=.
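As a rough sketch of what a transform= hook does, the following hypothetical in-memory dataset applies a callable to each item on access. SketchAudioDataset is an illustrative name only and is not part of torchaudio or this package; in practice the callable would be a Compose or ComposeMany instance.

```python
from typing import Callable, List, Optional, Tuple

Waveform = List[List[float]]  # [channel, time]


class SketchAudioDataset:
    """Hypothetical dataset with a torchvision-style transform argument."""

    def __init__(self, items: List[Tuple[Waveform, int]],
                 transform: Optional[Callable[[Waveform], Waveform]] = None) -> None:
        self.items = items
        self.transform = transform

    def __len__(self) -> int:
        return len(self.items)

    def __getitem__(self, index: int) -> Tuple[Waveform, int]:
        audio, label = self.items[index]
        # The transform runs on every access, so stochastic augmentations
        # can yield a different result each time an item is fetched.
        if self.transform is not None:
            audio = self.transform(audio)
        return audio, label


def halve(audio: Waveform) -> Waveform:
    """Placeholder for a composed augmentation chain: halve every sample."""
    return [[0.5 * sample for sample in channel] for channel in audio]


if __name__ == "__main__":
    dataset = SketchAudioDataset([([[1.0, -1.0]], 0), ([[0.5, 0.25]], 1)], transform=halve)
    for audio, label in (dataset[i] for i in range(len(dataset))):
        print(label, audio)
```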
Optional
Install WavAugment for reverberation / pitch shifting:
pip install git+https://github.com/facebookresearch/WavAugment
Cite
You can cite this work with the following BibTeX:
@misc{spijkervet_torchaudio_augmentations,
doi = {10.5281/ZENODO.4748582},
url = {https://zenodo.org/record/4748582},
author = {Spijkervet, Janne},
title = {Spijkervet/torchaudio-augmentations},
publisher = {Zenodo},
year = {2021},
copyright = {MIT License}
}