
Audio augmentations library for PyTorch, for audio in the time-domain.


Audio Augmentations


An audio augmentations library for PyTorch that operates on time-domain audio, with support for the stochastic data augmentations often used in self-supervised and contrastive learning.

Usage

We can define several audio augmentations, which will be applied sequentially to a raw audio waveform:

import torchaudio
from audio_augmentations import *

audio, sr = torchaudio.load("tests/classical.00002.wav")

num_samples = sr * 5
transforms = [
    RandomResizedCrop(n_samples=num_samples),
    RandomApply([PolarityInversion()], p=0.8),
    RandomApply([Noise(min_snr=0.3, max_snr=0.5)], p=0.3),
    RandomApply([Gain()], p=0.2),
    RandomApply([HighLowPass(sample_rate=sr)], p=0.8),
    RandomApply([Delay(sample_rate=sr)], p=0.5),
    RandomApply([PitchShift(
        n_samples=num_samples,
        sample_rate=sr
    )], p=0.4),
    RandomApply([Reverb(sample_rate=sr)], p=0.3)
]
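
The control flow behind this list can be illustrated without the library. What follows is a minimal sketch, not the library's implementation: `polarity_inversion`, `random_apply`, and `compose` are hypothetical stand-ins, and a plain Python list stands in for the waveform tensor. It assumes that RandomApply runs its wrapped transforms with probability p and otherwise returns the input unchanged, as torchvision's class of the same name does.

```python
import random


def polarity_inversion(wave):
    """Flip the sign of every sample (hypothetical stand-in for PolarityInversion)."""
    return [-x for x in wave]


def random_apply(transforms, p, rng):
    """Return a callable that applies all `transforms` with probability p, else none."""
    def apply(wave):
        if rng.random() < p:
            for t in transforms:
                wave = t(wave)
        return wave
    return apply


def compose(transforms):
    """Chain transforms so each one receives the output of the previous one."""
    def apply(wave):
        for t in transforms:
            wave = t(wave)
        return wave
    return apply


rng = random.Random(0)  # seeded so the sketch is reproducible
wave = [0.1, -0.2, 0.3]

# p=1.0 always applies the wrapped transform; p=0.0 never does.
always = compose([random_apply([polarity_inversion], p=1.0, rng=rng)])
never = compose([random_apply([polarity_inversion], p=0.0, rng=rng)])

inverted = always(wave)
unchanged = never(wave)
```

With intermediate values of p, each call draws again, so two calls on the same input can return different results. That is the stochastic behaviour contrastive training relies on.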

We can return either one or many versions of the same audio example:

# One augmented version of the example
transform = Compose(transforms=transforms)
transformed_audio = transform(audio)
print(transformed_audio.shape[0])  # 1

# Several augmented versions of the same example
audio, sr = torchaudio.load("tests/classical.00002.wav")
transform = ComposeMany(transforms=transforms, num_augmented_samples=4)
transformed_audio = transform(audio)
print(transformed_audio.shape[0])  # 4
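
The difference between the two classes can be sketched in plain Python. This is an illustration under assumptions, not the library's code: `compose_many` and `random_gain` are hypothetical names, lists stand in for tensors, and the sketch returns a list of views where the library returns a tensor whose first dimension equals num_augmented_samples.

```python
import random


def compose_many(transforms, num_augmented_samples):
    """Run the whole pipeline several times on the same input and collect the results."""
    def apply(wave):
        views = []
        for _ in range(num_augmented_samples):
            out = wave
            for t in transforms:
                out = t(out)
            views.append(out)
        return views
    return apply


rng = random.Random(0)  # seeded so the sketch is reproducible


def random_gain(wave):
    """Scale the waveform by a random factor (hypothetical stochastic transform)."""
    g = rng.uniform(0.5, 1.0)
    return [g * x for x in wave]


views = compose_many([random_gain], num_augmented_samples=4)([0.5, -0.5])
print(len(views))  # 4
```

Because every pass draws its own random parameters, the four views generally differ from one another even though they come from the same input.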

Similar to the torchvision.datasets interface, an instance of the Compose or ComposeMany class can be supplied to torchaudio dataloaders that accept transform=.
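
For a custom dataset, the same pattern might look like the following sketch. `WaveformDataset` is a hypothetical class, not part of torchaudio or this library, and plain lists stand in for waveform tensors. It only shows where a `transform` callable is invoked.

```python
class WaveformDataset:
    """Minimal map-style dataset (hypothetical) that applies `transform` on access."""

    def __init__(self, waveforms, labels, transform=None):
        self.waveforms = waveforms
        self.labels = labels
        self.transform = transform

    def __len__(self):
        return len(self.waveforms)

    def __getitem__(self, idx):
        wave = self.waveforms[idx]
        if self.transform is not None:
            # The augmentation runs on every access, so each epoch sees new variants.
            wave = self.transform(wave)
        return wave, self.labels[idx]


# Any callable works as `transform`; a sign flip stands in for a Compose instance here.
dataset = WaveformDataset(
    waveforms=[[0.1, 0.2], [0.3, 0.4]],
    labels=[0, 1],
    transform=lambda w: [-x for x in w],
)
wave, label = dataset[1]
```

In real use the items would be tensors and the dataset would be wrapped in torch.utils.data.DataLoader, which accepts any object that implements `__getitem__` and `__len__`.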

Optional

Install WavAugment for reverberation / pitch shifting:

pip install git+https://github.com/facebookresearch/WavAugment
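
Since this dependency is optional, a script may want to check for it before adding the corresponding transforms. The snippet below is one possible sketch: `has_module` is a hypothetical helper, and the top-level module name "augment" is an assumption about how WavAugment installs itself, so verify it against your environment.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module with this name can be imported."""
    return importlib.util.find_spec(name) is not None


# Assumption: WavAugment installs a top-level module called "augment".
WAVAUGMENT_AVAILABLE = has_module("augment")

# Only schedule the WavAugment-backed transforms when the package is present.
optional_transforms = ["PitchShift", "Reverb"] if WAVAUGMENT_AVAILABLE else []
```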

Cite

You can cite this work with the following BibTeX:

@misc{spijkervet_torchaudio_augmentations,
  doi = {10.5281/ZENODO.4748582},
  url = {https://zenodo.org/record/4748582},
  author = {Spijkervet, Janne},
  title = {Spijkervet/torchaudio-augmentations},
  publisher = {Zenodo},
  year = {2021},
  copyright = {MIT License}
}

