Skip to main content

Audio augmentations library for PyTorch, for audio in the time-domain.

Project description

Audio Augmentations

DOI

Audio augmentations library for PyTorch for audio in the time-domain, with support for stochastic data augmentations as used often in self-supervised / contrastive learning. One can apply a single stochastic augmentation or create as many stochastically augmented examples from a single interface.

This package follows the conventions set out by torchaudio, with audio defined as a vector of [channel, time]. Each individual augmentation can be initialized on its own, or be wrapped around a RandomApply interface which will apply the augmentation with probability p.

Usage

We can define a single or several audio augmentations, which are applied sequentially to an audio waveform.

from audio_augmentations import *

audio, sr = torchaudio.load("tests/classical.00002.wav")

num_samples = sr * 5
transforms = [
    RandomResizedCrop(n_samples=num_samples),
    RandomApply([PolarityInversion()], p=0.8),
    RandomApply([Noise(min_snr=0.3, max_snr=0.5)], p=0.3),
    RandomApply([Gain()], p=0.2),
    HighLowPass(sample_rate=sr), # this augmentation will always be applied in this aumgentation chain!
    RandomApply([Delay(sample_rate=sr)], p=0.5),
    RandomApply([PitchShift(
        n_samples=num_samples,
        sample_rate=sr
    )], p=0.4),
    RandomApply([Reverb(sample_rate=sr)], p=0.3)
]

We can also define a stochastic augmentation on multiple transformations. The following will apply both polarity inversion and white noise with a probability of 80%, a gain of 20%, and delay and reverb with a probability of 50%:

transforms = [
    RandomResizedCrop(n_samples=num_samples),
    RandomApply([PolarityInversion(), Noise(min_snr=0.3, max_snr=0.5)], p=0.8),
    RandomApply([Gain()], p=0.2),
    RandomApply([Delay(sample_rate=sr), Reverb(sample_rate=sr)], p=0.5)
]

We can return either one or many versions of the same audio example:

transform = Compose(transforms=transforms)
transformed_audio =  transform(audio)
>> transformed_audio.shape = [num_channels, num_samples]
audio = torchaudio.load("testing/classical.00002.wav")
transform = ComposeMany(transforms=transforms, num_augmented_samples=4)
transformed_audio = transform(audio)
>> transformed_audio.shape = [4, num_channels, num_samples]

Similar to the torchvision.datasets interface, an instance of the Compose or ComposeMany class can be supplied to torchaudio dataloaders that accept transform=.

Optional

Install WavAugment for reverberation / pitch shifting:

pip install git+https://github.com/facebookresearch/WavAugment

Cite

You can cite this work with the following BibTeX:

@misc{spijkervet_torchaudio_augmentations,
  doi = {10.5281/ZENODO.4748582},
  url = {https://zenodo.org/record/4748582},
  author = {Spijkervet,  Janne},
  title = {Spijkervet/torchaudio-augmentations},
  publisher = {Zenodo},
  year = {2021},
  copyright = {MIT License}
}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

torchaudio-augmentations-0.2.1.tar.gz (9.5 kB view details)

Uploaded Source

Built Distribution

torchaudio_augmentations-0.2.1-py3-none-any.whl (10.9 kB view details)

Uploaded Python 3

File details

Details for the file torchaudio-augmentations-0.2.1.tar.gz.

File metadata

  • Download URL: torchaudio-augmentations-0.2.1.tar.gz
  • Upload date:
  • Size: 9.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.1 CPython/3.8.10

File hashes

Hashes for torchaudio-augmentations-0.2.1.tar.gz
Algorithm Hash digest
SHA256 0175c034536597b1950964ed64752cc631ddac3bfbabcff3291a3fbae43748ea
MD5 e28da2df78058ceeb9ebde4999398451
BLAKE2b-256 c2f8235468430f24df83032057aa8aefe18a4201a9d7c7b00d5ba32bdd166438

See more details on using hashes here.

File details

Details for the file torchaudio_augmentations-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: torchaudio_augmentations-0.2.1-py3-none-any.whl
  • Upload date:
  • Size: 10.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.1 CPython/3.8.10

File hashes

Hashes for torchaudio_augmentations-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 0422d6c16961a7eda8af0bd300da6852a5c7d491514adb931c53727484da02a3
MD5 4c11863478b6bd689e24068d1bf42875
BLAKE2b-256 ff453e2ccc3afe2215e4ce0c2fef878b4115d87ff9366c4cd49d441224ba18bf

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page