Audio augmentations library for PyTorch, for audio in the time-domain.

PyTorch Audio Augmentations

An audio data augmentation library for PyTorch that operates on time-domain audio. The focus of this repository is to:

Provide many audio transformations in an easy Python interface.
Maintain high test coverage.
Easily control stochastic (sequential) audio transformations.
Make every audio transformation differentiable with PyTorch's nn.Module.
Optimise audio transformations for CPU and GPU.

It supports stochastic transformations, as often used in self-supervised and semi-supervised learning methods. A single interface can apply one stochastic augmentation or create any number of stochastically transformed versions of the same audio example.

This package follows the conventions set out by torchvision and torchaudio, with audio defined as a tensor of shape [channel, time], or as a batched representation of shape [batch, channel, time]. Each augmentation can be initialized on its own or wrapped in a RandomApply interface, which applies the augmentation with probability p.
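To make the role of p concrete, here is a minimal sketch of the idea behind a RandomApply-style wrapper. It is not the library's implementation: `random_apply_sketch` and `polarity_inversion` are hypothetical names, and a nested list laid out as [channel][time] stands in for a tensor so the example has no dependencies.

```python
import random


def polarity_inversion(waveform):
    """Flip the sign of every sample; waveform is a [channel][time] nested list."""
    return [[-sample for sample in channel] for channel in waveform]


def random_apply_sketch(transforms, p, rng=random):
    """Return a callable that applies all `transforms` in order with probability p."""
    def apply(waveform):
        if rng.random() >= p:  # skip the whole group with probability 1 - p
            return waveform
        for transform in transforms:
            waveform = transform(waveform)
        return waveform
    return apply


waveform = [[0.1, -0.2, 0.3]]  # one channel, three samples
always = random_apply_sketch([polarity_inversion], p=1.0)
never = random_apply_sketch([polarity_inversion], p=0.0)
```

With p=1.0 the wrapped transform always runs and with p=0.0 it never does; values in between apply it on that fraction of calls on average.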

Usage

We can define a single or several audio augmentations, which are applied sequentially to an audio waveform.

import torchaudio
from torchaudio_augmentations import *  # the installed package is named torchaudio_augmentations

audio, sr = torchaudio.load("tests/classical.00002.wav")

num_samples = sr * 5
transforms = [
    RandomResizedCrop(n_samples=num_samples),
    RandomApply([PolarityInversion()], p=0.8),
    RandomApply([Noise(min_snr=0.001, max_snr=0.005)], p=0.3),
    RandomApply([Gain()], p=0.2),
    HighLowPass(sample_rate=sr),  # not wrapped in RandomApply, so it is always applied in this augmentation chain
    RandomApply([Delay(sample_rate=sr)], p=0.5),
    RandomApply([PitchShift(
        n_samples=num_samples,
        sample_rate=sr
    )], p=0.4),
    RandomApply([Reverb(sample_rate=sr)], p=0.3)
]

We can also define a stochastic augmentation over multiple transformations. The following applies polarity inversion and white noise together with a probability of 80%, gain with a probability of 20%, and delay and reverb together with a probability of 50%:

transforms = [
    RandomResizedCrop(n_samples=num_samples),
    RandomApply([PolarityInversion(), Noise(min_snr=0.001, max_snr=0.005)], p=0.8),
    RandomApply([Gain()], p=0.2),
    RandomApply([Delay(sample_rate=sr), Reverb(sample_rate=sr)], p=0.5)
]
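The grouping matters because transformations that share one RandomApply are applied together or not at all, rather than each being drawn independently. The following small simulation illustrates this under stated assumptions: `apply_group` is a hypothetical helper that only records which transformation names would fire, and the seed and trial count are arbitrary.

```python
import random


def apply_group(names, p, rng):
    """Return the names that fire: the whole group with probability p, else none."""
    return list(names) if rng.random() < p else []


rng = random.Random(0)  # seeded so the run is repeatable
trials = 10_000
fired_together = 0
fired_alone = 0

for _ in range(trials):
    fired = apply_group(["PolarityInversion", "Noise"], p=0.8, rng=rng)
    if len(fired) == 2:
        fired_together += 1
    elif len(fired) == 1:
        fired_alone += 1

rate = fired_together / trials  # fraction of trials in which both fired
```

In this sketch `fired_alone` stays at zero and `rate` lands close to 0.8. Wrapping each transformation in its own RandomApply would instead draw them independently.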

We can return either one or many versions of the same audio example:

transform = Compose(transforms=transforms)
transformed_audio = transform(audio)
# transformed_audio.shape == [num_channels, num_samples]

# torchaudio.load returns a (waveform, sample_rate) tuple
audio, sr = torchaudio.load("tests/classical.00002.wav")
transform = ComposeMany(transforms=transforms, num_augmented_samples=4)
transformed_audio = transform(audio)
# transformed_audio.shape == [4, num_channels, num_samples]
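The difference between the two output shapes can be sketched without the library. The example below is an illustration only, not the package's code: `compose_sketch`, `compose_many_sketch`, and `random_gain` are hypothetical names, and nested lists laid out as [channel][time] stand in for tensors.

```python
import random

rng = random.Random(0)  # seeded so repeated runs draw the same gains


def random_gain(waveform):
    """Scale every sample by one randomly drawn factor (illustrative transform)."""
    gain = rng.uniform(0.5, 1.0)
    return [[gain * sample for sample in channel] for channel in waveform]


def compose_sketch(transforms):
    """Chain transforms so each one receives the previous one's output."""
    def run(waveform):
        for transform in transforms:
            waveform = transform(waveform)
        return waveform
    return run


def compose_many_sketch(transforms, num_augmented_samples):
    """Run the chain several times and collect the independently augmented copies."""
    chain = compose_sketch(transforms)

    def run(waveform):
        return [chain(waveform) for _ in range(num_augmented_samples)]
    return run


waveform = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  # 2 channels, 3 samples
augmented = compose_many_sketch([random_gain], num_augmented_samples=4)(waveform)
shape = (len(augmented), len(augmented[0]), len(augmented[0][0]))
```

Here `shape` is (4, 2, 3): one leading entry per augmented copy, followed by the channel and time dimensions of the input. Each copy is drawn independently, so the copies generally differ from one another.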

Similar to the torchvision.datasets interface, an instance of the Compose or ComposeMany class can be supplied to torchaudio dataloaders that accept transform=.
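As a rough illustration of the transform= pattern, the sketch below defines a hypothetical map-style dataset, `WaveformDataset`, which is not part of this package or of torchaudio. It uses plain Python so it runs without dependencies; in practice one would typically subclass torch.utils.data.Dataset and pass a Compose or ComposeMany instance as the transform.

```python
class WaveformDataset:
    """Hypothetical map-style dataset: any object with __len__ and __getitem__."""

    def __init__(self, waveforms, labels, transform=None):
        self.waveforms = waveforms
        self.labels = labels
        self.transform = transform  # e.g. a Compose or ComposeMany instance

    def __len__(self):
        return len(self.waveforms)

    def __getitem__(self, index):
        waveform = self.waveforms[index]
        if self.transform is not None:
            # Augment lazily, each time an item is fetched
            waveform = self.transform(waveform)
        return waveform, self.labels[index]


def invert(waveform):
    """Stand-in transform: flip the sign of every sample in a [channel][time] list."""
    return [[-sample for sample in channel] for channel in waveform]


dataset = WaveformDataset([[[0.5, -0.5]]], ["classical"], transform=invert)
item = dataset[0]
```

Because the transform runs inside `__getitem__`, a stochastic transform produces a fresh augmentation every time the same index is read.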

Optional

Install WavAugment for reverberation / pitch shifting:

pip install git+https://github.com/facebookresearch/WavAugment

Cite

You can cite this work with the following BibTeX:

@misc{spijkervet_torchaudio_augmentations,
  doi = {10.5281/ZENODO.4748582},
  url = {https://zenodo.org/record/4748582},
  author = {Spijkervet,  Janne},
  title = {Spijkervet/torchaudio-augmentations},
  publisher = {Zenodo},
  year = {2021},
  copyright = {MIT License}
}

Release history

0.2.4 (this version): Apr 24, 2022
0.2.3: Nov 15, 2021
0.2.2: Sep 23, 2021
0.2.1: Jul 22, 2021
0.2.0: Jun 29, 2021
0.1.6: May 13, 2021
0.1.5: Mar 17, 2021
0.1.4: Mar 16, 2021
0.1.3: Mar 16, 2021

Download files

Source Distribution: torchaudio-augmentations-0.2.4.tar.gz (10.7 kB), uploaded Apr 24, 2022
Built Distribution: torchaudio_augmentations-0.2.4-py3-none-any.whl (12.2 kB), uploaded Apr 24, 2022, Python 3

File details

Details for the file torchaudio-augmentations-0.2.4.tar.gz.

File metadata

Download URL: torchaudio-augmentations-0.2.4.tar.gz
Upload date: Apr 24, 2022
Size: 10.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.0 CPython/3.9.12

File hashes

Hashes for torchaudio-augmentations-0.2.4.tar.gz
Algorithm	Hash digest
SHA256	`d7bd478bc3622318e74833399d4a389b95aff1c736b4502755bb7712749f3e41`
MD5	`0b9dd059ce48f50bd81b25275127d2e0`
BLAKE2b-256	`82fb01184334b9fb5bea600db90b302fcf21ffafe7910d842c4324d93b78284a`

File details

Details for the file torchaudio_augmentations-0.2.4-py3-none-any.whl.

File metadata

Download URL: torchaudio_augmentations-0.2.4-py3-none-any.whl
Upload date: Apr 24, 2022
Size: 12.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.0 CPython/3.9.12

File hashes

Hashes for torchaudio_augmentations-0.2.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7fcf6f2012d5f4c69d7645e23ea485fe747cf7db4df4bc6f012ff4c47db0b6fb`
MD5	`716f0685d93b2c4a1053b05b374e8e2f`
BLAKE2b-256	`997628e5a31ae863720af5e0a4864643d4fbf438575e2f40141853be48cd24bc`
