A Python library for audio data augmentation. Inspired by albumentations. Useful for machine learning.
Audiomentations
A Python library for audio data augmentation. Inspired by albumentations. Useful for deep learning. Runs on CPU. Supports mono audio and, for most transforms, multichannel audio. Can be integrated into training pipelines in e.g. TensorFlow/Keras or PyTorch. Has helped people get world-class results in Kaggle competitions. Is used by companies making next-generation audio products.
Need a PyTorch alternative with GPU support? Check out torch-audiomentations!
Setup
pip install audiomentations
Optional requirements
Some features have extra dependencies. Extra python package dependencies can be installed by running
pip install audiomentations[extras]
| Feature | Extra dependencies |
|---|---|
| Load 24-bit wav files fast | wavio |
| LoudnessNormalization | pyloudnorm |
| Mp3Compression | ffmpeg and [pydub or lameenc] |
Note: ffmpeg can be installed via e.g. conda or from the official ffmpeg download page.
Usage example
from audiomentations import Compose, AddGaussianNoise, TimeStretch, PitchShift, Shift
import numpy as np
SAMPLE_RATE = 16000
augment = Compose([
    AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=0.5),
    TimeStretch(min_rate=0.8, max_rate=1.25, p=0.5),
    PitchShift(min_semitones=-4, max_semitones=4, p=0.5),
    Shift(min_fraction=-0.5, max_fraction=0.5, p=0.5),
])
# Generate 2 seconds of dummy audio for the sake of example
samples = np.random.uniform(low=-0.2, high=0.2, size=(32000,)).astype(np.float32)
# Augment/transform/perturb the audio data
augmented_samples = augment(samples=samples, sample_rate=SAMPLE_RATE)
Go to audiomentations/augmentations/transforms.py to see the waveform transforms you can apply, and what arguments they have.
See audiomentations/augmentations/spectrogram_transforms.py for spectrogram transforms.
Waveform transforms
Some of the following waveform transforms can be visualized with the audio-transformation-visualization GUI (made by phrasenmaeher), where you can upload your own input wav file.
AddBackgroundNoise
Added in v0.9.0
Mix in another sound, e.g. a background noise. Useful if your original sound is clean and you want to simulate an environment where background noise is present.
Can also be used for mixup, as in https://arxiv.org/pdf/1710.09412.pdf
A folder of (background noise) sounds to be mixed in must be specified. These sounds should ideally be at least as long as the input sounds to be transformed. Otherwise, the background sound will be repeated, which may sound unnatural.
Note that the gain of the added noise is relative to the amount of signal in the input. This implies that if the input is completely silent, no noise will be added.
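The two points above (a short background sound gets repeated, and the noise gain is relative to the input level) can be illustrated with a minimal NumPy sketch. This is not the library's implementation: `mix_background`, `rms` and the `snr_db` parameter are hypothetical names chosen for this example, and expressing the relative gain as a signal-to-noise ratio is an assumption.

```python
import numpy as np

def rms(x):
    """Root mean square level of a signal."""
    return float(np.sqrt(np.mean(np.square(x, dtype=np.float64))))

def mix_background(samples, noise, snr_db):
    """Hypothetical helper: mix `noise` into `samples` at a given SNR (in dB)."""
    # Repeat the noise if it is shorter than the input, then cut it to length
    repeats = int(np.ceil(len(samples) / len(noise)))
    noise = np.tile(noise, repeats)[: len(samples)]
    signal_rms, noise_rms = rms(samples), rms(noise)
    if signal_rms == 0.0 or noise_rms == 0.0:
        # The gain is relative to the input level: a silent input gets no noise
        return samples.copy()
    desired_noise_rms = signal_rms / (10 ** (snr_db / 20))
    return samples + noise * (desired_noise_rms / noise_rms)

rng = np.random.default_rng(0)
samples = rng.uniform(-0.2, 0.2, size=16000).astype(np.float32)
noise = rng.uniform(-1.0, 1.0, size=4000).astype(np.float32)  # shorter than the input

mixed = mix_background(samples, noise, snr_db=10.0)
silent_mixed = mix_background(np.zeros(16000, dtype=np.float32), noise, snr_db=10.0)
```

Here `silent_mixed` stays all zeros, which mirrors the note about silent inputs.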
Here are some examples of datasets that can be downloaded and used as background noise:
AddGaussianNoise
Added in v0.1.0
Add Gaussian noise to the samples.
AddGaussianSNR
Added in v0.7.0
Add Gaussian noise to the samples with a random signal-to-noise ratio (SNR).
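As a rough illustration of what adding noise at a given SNR means, here is a minimal NumPy sketch. It is not the library's code: `add_gaussian_noise_at_snr` is a hypothetical helper, and the 5 to 40 dB range is an arbitrary example.

```python
import numpy as np

def add_gaussian_noise_at_snr(samples, snr_db, rng):
    """Hypothetical helper: add white Gaussian noise at a target SNR (in dB)."""
    signal_rms = float(np.sqrt(np.mean(np.square(samples, dtype=np.float64))))
    # SNR (dB) = 20 * log10(signal_rms / noise_rms), solved for noise_rms
    noise_rms = signal_rms / (10 ** (snr_db / 20))
    noise = rng.normal(0.0, noise_rms, size=samples.shape).astype(np.float32)
    return samples + noise

rng = np.random.default_rng(42)
samples = rng.uniform(-0.2, 0.2, size=32000).astype(np.float32)

# Draw the SNR uniformly on the decibel scale (example range, not a library default)
snr_db = float(rng.uniform(5.0, 40.0))
noisy = add_gaussian_noise_at_snr(samples, snr_db, rng)
```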
AddImpulseResponse
Added in v0.7.0
Convolve the audio with a random impulse response. Impulse responses can be created using e.g. http://tulrich.com/recording/ir_capture/
Some datasets of impulse responses are publicly available:
- EchoThief containing 115 impulse responses acquired in a wide range of locations.
- The MIT McDermott dataset containing 271 impulse responses acquired in everyday places.
Impulse responses are represented as wav files in the given ir_path.
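To make the convolution step concrete, here is a minimal NumPy sketch with a synthetic impulse response. It is only an illustration under stated assumptions: `apply_impulse_response` is a hypothetical helper, the impulse response is made up rather than loaded from a wav file, and cutting the output to the input length is a choice made for this example.

```python
import numpy as np

def apply_impulse_response(samples, ir):
    """Hypothetical helper: convolve `samples` with an impulse response."""
    wet = np.convolve(samples, ir)  # length: len(samples) + len(ir) - 1
    # Drop the tail so that the output has the same length as the input
    return wet[: len(samples)].astype(np.float32)

# Synthetic impulse response: a direct path followed by two decaying echoes
ir = np.zeros(2000, dtype=np.float32)
ir[0], ir[800], ir[1600] = 1.0, 0.5, 0.25

# A single click as input makes the echoes easy to spot in the output
samples = np.zeros(4000, dtype=np.float32)
samples[100] = 0.8
wet = apply_impulse_response(samples, ir)
```

The output contains the click at index 100 plus quieter copies 800 and 1600 samples later, which is the "room" described by the impulse response.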
AddShortNoises
Added in v0.9.0
Mix in various (bursts of overlapping) sounds with random pauses between. Useful if your original sound is clean and you want to simulate an environment where short noises sometimes occur.
A folder of (noise) sounds to be mixed in must be specified.
BandPassFilter
Added in v0.18.0
Apply band-pass filtering to the input audio. The filter steepness is 6 dB per octave.
Clip
Added in v0.17.0
Clip audio to specified values. For example, set a_min=-1.0 and a_max=1.0 to ensure that no samples in the audio exceed that range. This can be relevant for avoiding integer overflow or underflow (which results in unintended wrap distortion that can sound horrible) when exporting to e.g. 16-bit PCM wav.
Another way of ensuring that all values stay between -1.0 and 1.0 is to apply
PeakNormalization.
This transform is different from ClippingDistortion in that it takes fixed values
for clipping instead of clipping a random percentile of the samples. Arguably, this
transform is not very useful for data augmentation. Instead, think of it as a very
cheap and harsh limiter (for samples that exceed the allotted extent) that can
sometimes be useful at the end of a data augmentation pipeline.
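The wrap distortion mentioned above can be shown with plain NumPy. This sketch does not use the library; it only contrasts a naive 16-bit conversion with one that is clipped to [a_min, a_max] first. The sample values are made up for the example.

```python
import numpy as np

samples = np.array([-1.5, -0.5, 0.0, 0.75, 1.2], dtype=np.float32)

# Hard-limit every sample to the range [a_min, a_max]
a_min, a_max = -1.0, 1.0
clipped = np.clip(samples, a_min, a_max)

# Naive conversion to 16-bit PCM: values outside the int16 range wrap around.
# The detour via int64 keeps the wrap-around well defined in NumPy.
scaled = np.round(samples * 32767).astype(np.int64)
wrapped = scaled.astype(np.int16)

# Clipping first keeps every value inside the int16 range
safe = np.round(clipped * 32767).astype(np.int16)
```

In `wrapped`, the sample that was too negative comes out positive and the one that was too positive comes out negative, while `safe` saturates at the limits instead.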
ClippingDistortion
Added in v0.8.0
Distort signal by clipping a random percentage of points
The percentage of points that will be clipped is drawn from a uniform distribution between the two input parameters min_percentile_threshold and max_percentile_threshold. If for instance 30% is drawn, the samples are clipped if they're below the 15th or above the 85th percentile.
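The percentile logic described above can be sketched as follows. This is an illustration, not the library's code: `clip_percentile` is a hypothetical helper, and the 0 to 40 bounds are example values.

```python
import numpy as np

def clip_percentile(samples, percentile_threshold):
    """Hypothetical helper: clip the given percentage of points, split evenly
    between the lowest and the highest values."""
    lower_percentile = percentile_threshold / 2
    lower = np.percentile(samples, lower_percentile)
    upper = np.percentile(samples, 100 - lower_percentile)
    return np.clip(samples, lower, upper).astype(np.float32)

rng = np.random.default_rng(0)
samples = rng.uniform(-0.2, 0.2, size=10000).astype(np.float32)

# Draw the percentage from a uniform distribution between the two bounds
min_percentile_threshold, max_percentile_threshold = 0, 40
threshold = rng.integers(min_percentile_threshold, max_percentile_threshold + 1)
distorted = clip_percentile(samples, threshold)

# With a fixed 30%, values below the 15th or above the 85th percentile are clipped
distorted_30 = clip_percentile(samples, 30)
fraction_changed = float(np.mean(distorted_30 != samples))
```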
FrequencyMask
Added in v0.7.0
Mask some frequency band on the spectrogram. Inspired by https://arxiv.org/pdf/1904.08779.pdf
Gain
Added in v0.11.0
Multiply the audio by a random amplitude factor to reduce or increase the volume. This technique can help a model become somewhat invariant to the overall gain of the input audio.
Warning: This transform can return samples outside the [-1, 1] range, which may lead to clipping or wrap distortion, depending on what you do with the audio in a later stage. See also https://en.wikipedia.org/wiki/Clipping_(audio)#Digital_clipping
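A small NumPy sketch shows why a gain can push samples outside [-1, 1]. It is not the library's implementation: `apply_gain` is a hypothetical helper, and expressing the gain in decibels with a -12 to +12 dB range is an assumption made for this example.

```python
import numpy as np

def apply_gain(samples, gain_db):
    """Hypothetical helper: scale the amplitude by a gain given in decibels."""
    amplitude_ratio = 10 ** (gain_db / 20)
    return samples * amplitude_ratio

rng = np.random.default_rng(0)
samples = rng.uniform(-0.5, 0.5, size=16000).astype(np.float32)

# Draw a random gain (example range, not a library default)
gain_db = float(rng.uniform(-12.0, 12.0))
randomly_scaled = apply_gain(samples, gain_db)

# +12 dB roughly quadruples the amplitude, so peaks near 0.5 end up far above 1.0
boosted = apply_gain(samples, 12.0)
exceeds_range = bool(np.max(np.abs(boosted)) > 1.0)
```

If the boosted audio is later written to an integer format, a clipping or peak normalization step at the end of the pipeline avoids the wrap distortion described in the warning.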
HighPassFilter
Added in v0.18.0
Apply high-pass filtering to the input audio. The signal will be reduced by 6 dB per octave below the cutoff frequency, so this filter is fairly gentle.
LowPassFilter
Added in v0.18.0
Apply low-pass filtering to the input audio. The signal will be reduced by 6 dB per octave above the cutoff frequency, so this filter is fairly gentle.
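A slope of 6 dB per octave is what a first-order filter gives. The sketch below uses a simple one-pole low-pass filter to show this; it is an assumption for illustration and may differ from how the library implements its filters. `one_pole_low_pass` and `level_db` are hypothetical helpers.

```python
import numpy as np

def one_pole_low_pass(samples, cutoff_hz, sample_rate):
    """Hypothetical helper: first-order (one-pole) low-pass filter."""
    alpha = 1.0 - np.exp(-2.0 * np.pi * cutoff_hz / sample_rate)
    out = np.empty(len(samples), dtype=np.float64)
    state = 0.0
    for i, x in enumerate(samples):
        # Move the output a fraction `alpha` of the way towards the input
        state += alpha * (float(x) - state)
        out[i] = state
    return out.astype(np.float32)

def level_db(x):
    """RMS level in decibels."""
    return 20 * np.log10(np.sqrt(np.mean(x.astype(np.float64) ** 2)))

sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
cutoff_hz = 100.0

# Compare the attenuation of two sine waves one octave apart, well above the cutoff
attenuation_db = {}
for freq in (1000.0, 2000.0):
    sine = np.sin(2 * np.pi * freq * t).astype(np.float32)
    filtered = one_pole_low_pass(sine, cutoff_hz, sample_rate)
    # Skip the first 1000 samples to ignore the filter's start-up transient
    attenuation_db[freq] = level_db(sine[1000:]) - level_db(filtered[1000:])

slope_db_per_octave = attenuation_db[2000.0] - attenuation_db[1000.0]
```

The measured slope comes out close to 6 dB per octave, which is why such a filter is described as gentle.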
Mp3Compression
Added in v0.12.0
Compress the audio using an MP3 encoder to lower the audio quality. This may help machine learning models deal with compressed, low-quality audio.
This transform depends on either lameenc or pydub/ffmpeg.
Note that bitrates below 32 kbps are only supported for low sample rates (up to 24000 Hz).
Note: When using the lameenc backend, the output may be slightly longer than the input due to the fact that the LAME encoder inserts some silence at the beginning of the audio.
LoudnessNormalization
Added in v0.14.0
Apply a constant amount of gain to match a specific loudness. This is an implementation of ITU-R BS.1770-4.
Warning: This transform can return samples outside the [-1, 1] range, which may lead to clipping or wrap distortion, depending on what you do with the audio in a later stage. See also https://en.wikipedia.org/wiki/Clipping_(audio)#Digital_clipping
Normalize
Added in v0.6.0
Apply a constant amount of gain, so that the highest signal level present in the sound becomes 0 dBFS, i.e. the loudest level allowed if all samples must be between -1 and 1. Also known as peak normalization.
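Peak normalization boils down to dividing by the largest absolute sample value. Here is a minimal NumPy sketch; `peak_normalize` is a hypothetical helper, not the library's function, and the guard for silent input is an assumption of this example.

```python
import numpy as np

def peak_normalize(samples):
    """Hypothetical helper: scale so that the largest absolute sample is 1.0."""
    peak = np.max(np.abs(samples))
    if peak == 0:
        # Digital silence has no peak to scale; avoid dividing by zero
        return samples.copy()
    return samples / peak

samples = np.array([0.1, -0.25, 0.05, 0.2], dtype=np.float32)
normalized = peak_normalize(samples)

# A peak of 1.0 corresponds to 0 dBFS
peak_dbfs = 20 * np.log10(np.max(np.abs(normalized)))
```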
PitchShift
Added in v0.4.0
Pitch shift the sound up or down without changing the tempo
PolarityInversion
Added in v0.11.0
Flip the audio samples upside-down, reversing their polarity. In other words, multiply the waveform by -1, so negative values become positive, and vice versa. The result will sound the same compared to the original when played back in isolation. However, when mixed with other audio sources, the result may be different. This waveform inversion technique is sometimes used for audio cancellation or obtaining the difference between two waveforms. However, in the context of audio data augmentation, this transform can be useful when training phase-aware machine learning models.
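The cancellation behavior described above is easy to verify with NumPy. This sketch uses random dummy signals and does not involve the library.

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.uniform(-0.5, 0.5, size=8000).astype(np.float32)
other = rng.uniform(-0.5, 0.5, size=8000).astype(np.float32)

inverted = -samples  # same as multiplying the waveform by -1

# In isolation, the level is unchanged
same_level = bool(
    np.isclose(np.sqrt(np.mean(samples**2)), np.sqrt(np.mean(inverted**2)))
)

# Mixed with the original, the inverted copy cancels it out completely
cancelled = samples + inverted

# Adding the inverted copy to a mix leaves only the other source
mix = samples + other
residual = mix + inverted
```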
Resample
Added in v0.8.0
Resample signal using librosa.core.resample
To do downsampling only, set both the minimum and maximum sampling rate lower than the original sampling rate. To do upsampling only, set both higher than the original sampling rate.
Reverse
Added in v0.18.0
Reverse the audio. Also known as time inversion. Inversion of an audio track along its time axis relates to the random flip of an image, which is an augmentation technique that is widely used in the visual domain. This can be relevant in the context of audio classification. It was successfully applied in the paper AudioCLIP: Extending CLIP to Image, Text and Audio.
Shift
Added in v0.5.0
Shift the samples forwards or backwards, with or without rollover
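The difference between shifting with and without rollover can be sketched with `np.roll`. The `shift` function below is a hypothetical helper written for this example, not the library's transform.

```python
import numpy as np

def shift(samples, num_places, rollover=True):
    """Hypothetical helper: shift forwards (positive) or backwards (negative)."""
    shifted = np.roll(samples, num_places)
    if not rollover:
        # Replace the wrapped-around part with silence
        if num_places > 0:
            shifted[:num_places] = 0.0
        elif num_places < 0:
            shifted[num_places:] = 0.0
    return shifted

samples = np.array([0.1, 0.2, 0.3, 0.4, 0.5], dtype=np.float32)
with_rollover = shift(samples, 2)                     # [0.4, 0.5, 0.1, 0.2, 0.3]
without_rollover = shift(samples, 2, rollover=False)  # [0.0, 0.0, 0.1, 0.2, 0.3]
backwards = shift(samples, -1, rollover=False)        # [0.2, 0.3, 0.4, 0.5, 0.0]
```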
TanhDistortion
Added in v0.19.0
Apply tanh (hyperbolic tangent) distortion to the audio. This technique is sometimes used for adding distortion to guitar recordings. The tanh() function can give a rounded "soft clipping" kind of distortion, and the distortion amount is proportional to the loudness of the input and the pre-gain. Tanh is symmetric, so the positive and negative parts of the signal are squashed in the same way. This transform can be useful as data augmentation because it adds harmonics. In other words, it changes the timbre of the sound.
See this page for examples: http://gdsp.hf.ntnu.no/lessons/3/17/
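The claim that tanh distortion adds harmonics can be checked with a pure tone. This is a minimal NumPy sketch, not the library's implementation; `tanh_distortion` and the `pre_gain` value are assumptions made for illustration.

```python
import numpy as np

def tanh_distortion(samples, pre_gain):
    """Hypothetical helper: soft-clip the signal with tanh after a pre-gain."""
    return np.tanh(samples * pre_gain)

sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
sine = (0.5 * np.sin(2 * np.pi * 1000 * t)).astype(np.float32)  # pure 1000 Hz tone

distorted = tanh_distortion(sine, pre_gain=8.0)

# Compare the magnitude at the third harmonic (3000 Hz) before and after.
# With a 1-second signal, FFT bin k corresponds to k Hz.
clean_3rd = np.abs(np.fft.rfft(sine))[3000]
distorted_3rd = np.abs(np.fft.rfft(distorted))[3000]
```

The clean tone has practically no energy at 3000 Hz, while the distorted one has a strong third harmonic. The output also never exceeds the [-1, 1] range, because tanh saturates smoothly.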
TimeMask
Added in v0.7.0
Make a randomly chosen part of the audio silent. Inspired by https://arxiv.org/pdf/1904.08779.pdf
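A time mask can be sketched in a few lines of NumPy. `time_mask` and its `band_fraction` parameter are hypothetical names for this example and are not taken from the library.

```python
import numpy as np

def time_mask(samples, band_fraction, rng):
    """Hypothetical helper: silence a randomly placed segment of the audio."""
    num_samples = len(samples)
    mask_length = int(num_samples * band_fraction)
    start = int(rng.integers(0, num_samples - mask_length + 1))
    masked = samples.copy()
    masked[start : start + mask_length] = 0.0
    return masked, start, mask_length

rng = np.random.default_rng(0)
# Dummy input without any zeros, so that the masked part is easy to find
samples = rng.uniform(0.1, 0.2, size=16000).astype(np.float32)
masked, start, mask_length = time_mask(samples, band_fraction=0.25, rng=rng)
```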
TimeStretch
Added in v0.2.0
Time stretch the signal without changing the pitch
Trim
Added in v0.7.0
Trim leading and trailing silence from an audio signal using librosa.effects.trim
Spectrogram transforms
SpecChannelShuffle
Added in v0.13.0
Shuffle the channels of a multichannel spectrogram. This can help combat positional bias.
SpecFrequencyMask
Added in v0.13.0
Mask a set of frequencies in a spectrogram, à la Google AI SpecAugment. This type of data augmentation has proved to make speech recognition models more robust.
The masked frequencies can be replaced with either the mean of the original values or a given constant (e.g. zero).
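The two fill options can be sketched as follows. This is an illustration under assumptions: `mask_frequencies` and its parameters are hypothetical, the spectrogram layout (frequency bins, time frames) is assumed, and "mean" is interpreted here as the mean of the masked band, which may differ from the library's behavior.

```python
import numpy as np

def mask_frequencies(spectrogram, first_bin, num_bins, fill_mode="mean", fill_constant=0.0):
    """Hypothetical helper: mask a band of frequency bins in a spectrogram
    laid out as (num_frequency_bins, num_time_frames)."""
    masked = spectrogram.copy()
    band = slice(first_bin, first_bin + num_bins)
    if fill_mode == "mean":
        # Replace the band with the mean of the values it originally held
        masked[band, :] = spectrogram[band, :].mean()
    else:
        masked[band, :] = fill_constant
    return masked

rng = np.random.default_rng(0)
# Dummy magnitude spectrogram: 257 frequency bins x 100 time frames
spectrogram = rng.uniform(0.0, 1.0, size=(257, 100)).astype(np.float32)

masked_mean = mask_frequencies(spectrogram, first_bin=40, num_bins=20, fill_mode="mean")
masked_zero = mask_frequencies(spectrogram, first_bin=40, num_bins=20, fill_mode="constant")
```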
Known limitations
- Some transforms do not support multichannel audio yet. See Multichannel audio
- Expects the input dtype to be float32, and have values between -1 and 1.
- The code runs on CPU, not GPU. For a GPU-compatible version, check out torch-audiomentations
- Multiprocessing is not officially supported yet. See also #46
Contributions are welcome!
Multichannel audio
Most transforms, but not all, support 2D numpy arrays with shapes like (num_channels, num_samples)
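Audio loaded from disk often arrives as integers in a (num_samples, num_channels) layout, for example from scipy.io.wavfile.read. A minimal NumPy sketch of converting such data to float32 in the -1 to 1 range with shape (num_channels, num_samples) could look like this. The sample values are made up for the example.

```python
import numpy as np

# Dummy 16-bit stereo audio laid out as (num_samples, num_channels)
int16_audio = np.array(
    [[0, 16384], [-32768, 8192], [32767, -16384]], dtype=np.int16
)

# Scale to float32 in the range [-1, 1) ...
float_audio = int16_audio.astype(np.float32) / 32768.0
# ... and transpose to (num_channels, num_samples)
channels_first = np.ascontiguousarray(float_audio.T)

print(channels_first.shape)  # (2, 3)
print(channels_first.dtype)  # float32
```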
The following table is valid for new versions of audiomentations, like >=0.18.0
| Transform | Supports multichannel audio? |
|---|---|
| AddBackgroundNoise | No, 1D only |
| AddGaussianNoise | Yes |
| AddGaussianSNR | Yes |
| AddImpulseResponse | No, 1D only |
| AddShortNoises | No, 1D only |
| BandPassFilter | No, 1D only |
| Clip | Yes |
| ClippingDistortion | Yes |
| FrequencyMask | Yes |
| Gain | Yes |
| HighPassFilter | No, 1D only |
| LoudnessNormalization | Yes, up to 5 channels |
| LowPassFilter | No, 1D only |
| Mp3Compression | No, 1D only |
| Normalize | Yes |
| PitchShift | Yes |
| PolarityInversion | Yes |
| Resample | No, 1D only |
| Reverse | Yes |
| Shift | Yes |
| SpecChannelShuffle | Yes |
| SpecFrequencyMask | Yes |
| TanhDistortion | Yes |
| TimeMask | Yes |
| TimeStretch | Yes |
| Trim | No, 1D only |
Changelog
Unreleased
v0.19.0 (2021-10-18)
Added
- Implement TanhDistortion. Thanks to atamazian and iver56.
- Add a noise_rms parameter to AddShortNoises. It defaults to "relative", which is the old behavior. "absolute" allows for adding loud noises to parts that are relatively silent in the input.
v0.18.0 (2021-08-05)
Added
- Implement BandPassFilter, HighPassFilter, LowPassFilter and Reverse. Thanks to atamazian.
v0.17.0 (2021-06-25)
Added
- Add a fade option in Shift for eliminating unwanted clicks
- Add support for 32-bit int wav loading with scipy>=1.6
- Add support for float64 wav files. However, the use of this format is discouraged, since float32 is more than enough for audio in most cases.
- Implement Clip. Thanks to atamazian.
- Add some parameter sanity checks in AddGaussianNoise
- Officially support librosa 0.8.1
Changed
- Rename AddImpulseResponse to ApplyImpulseResponse. The former will still work for now, but give a warning.
- When looking for audio files in AddImpulseResponse, AddBackgroundNoise and AddShortNoises, follow symlinks by default.
- When using the new parameters min_snr_in_db and max_snr_in_db in AddGaussianSNR, SNRs will be picked uniformly in the decibel scale instead of in the linear amplitude ratio scale. The new behavior aligns more with human hearing, which is not linear.
Fixed
- Avoid division by zero in AddImpulseResponse when input is digital silence (all zeros)
- Fix inverse SNR characteristics in AddGaussianSNR. It will continue working as before unless you switch to the new parameters min_snr_in_db and max_snr_in_db. If you use the old parameters, you'll get a warning.
v0.16.0 (2021-02-11)
Added
- Implement SpecCompose for applying a pipeline of spectrogram transforms. Thanks to omerferhatt.
Fixed
- Fix a bug in SpecChannelShuffle where it did not support more than 3 audio channels. Thanks to omerferhatt.
- Limit scipy version range to >=1.0,<1.6 to avoid issues with loading 24-bit wav files. Support for scipy>=1.6 will be added later.
v0.15.0 (2020-12-10)
Added
- Add an option leave_length_unchanged to AddImpulseResponse
Fixed
- Fix picklability of instances of AddImpulseResponse, AddBackgroundNoise and AddShortNoises
v0.14.0 (2020-12-06)
Added
- Implement LoudnessNormalization
- Implement randomize_parameters in Compose. Thanks to SolomidHero.
- Add multichannel support to AddGaussianNoise, AddGaussianSNR, ClippingDistortion, FrequencyMask, PitchShift, Shift, TimeMask and TimeStretch
v0.13.0 (2020-11-10)
Added
- Lay the foundation for spectrogram transforms. Implement SpecChannelShuffle and SpecFrequencyMask.
- Configurable LRU cache for transforms that use external sound files. Thanks to alumae.
- Officially add multichannel support to Normalize
Changed
- Show a warning if a waveform had to be resampled after loading it. This is because resampling is slow. Ideally, files on disk should already have the desired sample rate.
Fixed
- Correctly find audio files with upper case filename extensions.
- Fix a bug where AddBackgroundNoise crashed when trying to add digital silence to an input. Thanks to juheeuu.
v0.12.1 (2020-09-28)
Changed
- Speed up AddBackgroundNoise, AddShortNoises and AddImpulseResponse by loading wav files with scipy or wavio instead of librosa.
v0.12.0 (2020-09-23)
Added
- Implement Mp3Compression
- Officially support multichannel audio in Gain and PolarityInversion
- Add m4a and opus to the list of recognized audio filename extensions
Changed
- Expand range of supported librosa versions
Removed
- Python <= 3.5 is no longer officially supported, since Python 3.5 has reached end-of-life
- Breaking change: Internal util functions are no longer exposed directly. If you were doing e.g. from audiomentations import calculate_rms, now you have to do from audiomentations.core.utils import calculate_rms
v0.11.0 (2020-08-27)
Added
- Implement Gain and PolarityInversion. Thanks to Spijkervet for the inspiration.
v0.10.1 (2020-07-27)
Changed
- Improve the performance of AddBackgroundNoise and AddShortNoises by optimizing the implementation of calculate_rms.
Fixed
- Improve compatibility of output files written by the demo script. Thanks to xwJohn.
- Fix division by zero bug in Normalize. Thanks to ZFTurbo.
v0.10.0 (2020-05-05)
Added
- AddImpulseResponse, AddBackgroundNoise and AddShortNoises now support aiff files in addition to flac, mp3, ogg and wav
Changed
- Breaking change: AddImpulseResponse, AddBackgroundNoise and AddShortNoises now include subfolders when searching for files. This is useful when your sound files are organized in subfolders.
Fixed
- Fix filter instability bug in FrequencyMask. Thanks to kvilouras.
v0.9.0 (2020-02-20)
Added
- Remember randomized/chosen effect parameters. This allows for freezing the parameters and applying the same effect to multiple sounds. Use transform.freeze_parameters() and transform.unfreeze_parameters() for this.
- Implement transform.serialize_parameters(). Useful for when you want to store metadata on how a sound was perturbed.
- Add a rollover parameter to Shift. This allows for introducing silence instead of a wrapped part of the sound.
- Add support for flac in AddImpulseResponse
- Implement AddBackgroundNoise transform. Useful for when you want to add background noise to all of your sound. You need to give it a folder of background noises to choose from.
- Implement AddShortNoises. Useful for when you want to add (bursts of) short noise sounds to your input audio.
Changed
- Disregard non-audio files when looking for impulse response files
- Switch to a faster convolve implementation. This makes AddImpulseResponse significantly faster.
- Expand supported range of librosa versions
Fixed
- Fix a bug in ClippingDistortion where the min_percentile_threshold was not respected as expected.
- Improve handling of empty input
v0.8.0 (2020-01-28)
Added
- Add shuffle parameter in Compose
- Add Resample transformation
- Add ClippingDistortion transformation
- Add fade parameter to TimeMask
Thanks to askskro
v0.7.0 (2020-01-14)
Added
- AddGaussianSNR
- AddImpulseResponse
- FrequencyMask
- TimeMask
- Trim
Thanks to karpnv
v0.6.0 (2019-05-27)
Added
- Implement peak normalization
v0.5.0 (2019-02-23)
Added
- Implement Shift transform
Changed
- Ensure p is within bounds
v0.4.0 (2019-02-19)
Added
- Implement PitchShift transform
Fixed
- Fix output dtype of AddGaussianNoise
v0.3.0 (2019-02-19)
Added
- Implement leave_length_unchanged in TimeStretch
v0.2.0 (2019-02-18)
Added
- Add TimeStretch transform
- Parametrize AddGaussianNoise
v0.1.0 (2019-02-15)
Added
- Initial release. Includes only one transform: AddGaussianNoise
Development
Install the dependencies specified in requirements.txt
Code style
Format the code with black
Run tests and measure code coverage
pytest
Generate demo sounds for empirical evaluation
python -m demo.demo
Alternatives
Audiomentations isn't the only python library that can do various types of audio data augmentation/degradation! Here's an overview:
| Name | Github stars | License | Last commit | GPU support? |
|---|---|---|---|---|
| audio-degradation-toolbox | ||||
| audio_degrader | ||||
| audiomentations | ||||
| AugLy | ||||
| kapre | ||||
| muda | ||||
| nlpaug | ||||
| pedalboard | ||||
| pydiogment | ||||
| python-audio-effects | ||||
| sigment | ||||
| SpecAugment | ||||
| spec_augment | ||||
| torch-audiomentations | ||||
| torchaudio-augmentations | ||||
| WavAugment | ||||
Acknowledgements
Thanks to Nomono for backing audiomentations.
Thanks to all contributors who help improve audiomentations.