Audiomentations

A Python library for audio data augmentation. Inspired by albumentations. Useful for deep learning. Runs on CPU. Supports mono audio and multichannel audio. Can be integrated into training pipelines in e.g. TensorFlow/Keras or PyTorch. Has helped people get world-class results in Kaggle competitions. Is used by companies making next-generation audio products.

Need a PyTorch-specific alternative with GPU support? Check out torch-audiomentations!

Setup

pip install audiomentations

Optional requirements

Some features have extra dependencies. The extra Python package dependencies can be installed by running:

pip install audiomentations[extras]

Feature	Extra dependencies
`LoudnessNormalization`	`pyloudnorm`
`Mp3Compression`	`ffmpeg` and [`pydub` or `lameenc`]
`RoomSimulator`	`pyroomacoustics`

Note: ffmpeg can be installed via e.g. conda or from the official ffmpeg download page.

Usage example

Waveform

from audiomentations import Compose, AddGaussianNoise, TimeStretch, PitchShift, Shift
import numpy as np

augment = Compose([
    AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=0.5),
    TimeStretch(min_rate=0.8, max_rate=1.25, p=0.5),
    PitchShift(min_semitones=-4, max_semitones=4, p=0.5),
    Shift(min_fraction=-0.5, max_fraction=0.5, p=0.5),
])

# Generate 2 seconds of dummy audio for the sake of example
samples = np.random.uniform(low=-0.2, high=0.2, size=(32000,)).astype(np.float32)

# Augment/transform/perturb the audio data
augmented_samples = augment(samples=samples, sample_rate=16000)

Check out the source code at audiomentations/augmentations/ to see the waveform transforms you can apply, and what arguments they have.

Spectrogram

from audiomentations import SpecCompose, SpecChannelShuffle, SpecFrequencyMask
import numpy as np

augment = SpecCompose(
    [
        SpecChannelShuffle(p=0.5),
        SpecFrequencyMask(p=0.5),
    ]
)

# Example spectrogram with 1025 frequency bins, 256 time steps and 2 audio channels
spectrogram = np.random.random((1025, 256, 2))

# Augment/transform/perturb the spectrogram
augmented_spectrogram = augment(spectrogram)

See audiomentations/spec_augmentations/spectrogram_transforms.py for spectrogram transforms.

Waveform transforms

Some of the following waveform transforms can be visualized with the audio-transformation-visualization GUI (made by phrasenmaeher), where you can upload your own input wav file.

`AddBackgroundNoise`

Added in v0.9.0

Mix in another sound, e.g. a background noise. Useful if your original sound is clean and you want to simulate an environment where background noise is present.

Can also be used for mixup, as in https://arxiv.org/pdf/1710.09412.pdf

A folder of (background noise) sounds to be mixed in must be specified. These sounds should ideally be at least as long as the input sounds to be transformed. Otherwise, the background sound will be repeated, which may sound unnatural.

Note that the gain of the added noise is relative to the amount of signal in the input. This implies that if the input is completely silent, no noise will be added.
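The relative-gain behavior described above can be sketched in plain NumPy. This is not the library's implementation; `rms` and `mix_at_snr` are hypothetical helper names chosen for illustration. The noise is looped when it is shorter than the input, then scaled relative to the input level, so a silent input receives no noise.

```python
import numpy as np


def rms(x: np.ndarray) -> float:
    """Root mean square level of a signal."""
    return float(np.sqrt(np.mean(np.square(x))))


def mix_at_snr(signal: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix `noise` into `signal` at the requested signal-to-noise ratio (hypothetical helper)."""
    # Loop the noise if it is shorter than the signal, then cut it to length.
    repeats = int(np.ceil(len(signal) / len(noise)))
    noise = np.tile(noise, repeats)[: len(signal)]

    signal_rms, noise_rms = rms(signal), rms(noise)
    if signal_rms == 0.0 or noise_rms == 0.0:
        # The gain is relative to the signal level: silence in, silence out.
        return signal.copy()

    # The target noise level follows from SNR_dB = 20 * log10(signal_rms / noise_rms).
    target_noise_rms = signal_rms / (10 ** (snr_db / 20))
    return signal + noise * (target_noise_rms / noise_rms)


rng = np.random.default_rng(0)
speech = rng.uniform(-0.2, 0.2, 16000).astype(np.float32)
hum = rng.uniform(-1.0, 1.0, 4000).astype(np.float32)  # shorter than the input
mixed = mix_at_snr(speech, hum, snr_db=10.0)
silent_out = mix_at_snr(np.zeros(16000, dtype=np.float32), hum, snr_db=10.0)
```

Looping the noise keeps the output the same length as the input, but as noted above, a repeated background can sound unnatural.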

Here are some examples of datasets that can be downloaded and used as background noise:

`AddGaussianNoise`

Added in v0.1.0

Add Gaussian noise to the samples.

`AddGaussianSNR`

Added in v0.7.0

Add gaussian noise to the input. A random Signal to Noise Ratio (SNR) will be picked uniformly in the decibel scale. This aligns with human hearing, which is more logarithmic than linear.
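As a minimal sketch of the idea (not the library's code), the SNR can be drawn uniformly in decibels and then converted to a noise standard deviation. The function name and its parameters below are assumptions made for this example.

```python
import numpy as np


def add_gaussian_noise_at_random_snr(samples, min_snr_db, max_snr_db, rng):
    """Add white Gaussian noise at an SNR drawn uniformly on the decibel scale."""
    snr_db = float(rng.uniform(min_snr_db, max_snr_db))
    signal_rms = np.sqrt(np.mean(np.square(samples)))
    # For zero-mean Gaussian noise, the standard deviation equals the RMS level.
    noise_std = signal_rms / (10 ** (snr_db / 20))
    noise = rng.normal(0.0, noise_std, size=samples.shape).astype(samples.dtype)
    return samples + noise, snr_db


rng = np.random.default_rng(42)
clean = rng.uniform(-0.2, 0.2, 32000).astype(np.float32)
noisy, drawn_snr_db = add_gaussian_noise_at_random_snr(clean, 5.0, 40.0, rng)
```

Drawing in decibels means that the step from 5 dB to 10 dB is as likely as the step from 35 dB to 40 dB, whereas a uniform draw on the linear amplitude ratio would concentrate most draws at high SNRs.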

`ApplyImpulseResponse`

Added in v0.7.0

Convolve the audio with a random impulse response. Impulse responses can be created using e.g. http://tulrich.com/recording/ir_capture/

Some datasets of impulse responses are publicly available:

EchoThief containing 115 impulse responses acquired in a wide range of locations.
The MIT McDermott dataset containing 271 impulse responses acquired in everyday places.

Impulse responses are represented as wav files in the given ir_path.

`AddShortNoises`

Added in v0.9.0

Mix in various (bursts of overlapping) sounds with random pauses between. Useful if your original sound is clean and you want to simulate an environment where short noises sometimes occur.

A folder of (noise) sounds to be mixed in must be specified.

`AirAbsorption`

Added in v0.25.0

Apply a lowpass-like filter bank with variable octave attenuation that simulates the attenuation of higher frequencies due to air absorption. Only a limited range of conditions is covered: temperatures of 10-20 degrees Celsius and humidity of 30-90%.

`BandPassFilter`

Added in v0.18.0, updated in v0.21.0

Apply band-pass filtering to the input audio. Filter steepness (6/12/18... dB / octave) is parametrized. Can also be set for zero-phase filtering (will result in a 6 dB drop at cutoffs).

`BandStopFilter`

Added in v0.21.0

Apply band-stop filtering to the input audio. Also known as a notch filter or band-reject filter. It relates to the frequency mask idea in the SpecAugment paper. This transform is similar to FrequencyMask, but has overhauled default parameters and parameter randomization: the center frequency is picked in mel space, so it is more aligned with human hearing, which is not linear. Filter steepness (6/12/18... dB / octave) is parametrized. Can also be set for zero-phase filtering (will result in a 6 dB drop at cutoffs).

`Clip`

Added in v0.17.0

Clip audio by specified values. e.g. set a_min=-1.0 and a_max=1.0 to ensure that no samples in the audio exceed that extent. This can be relevant for avoiding integer overflow or underflow (which results in unintended wrap distortion that can sound horrible) when exporting to e.g. 16-bit PCM wav.

Another way of ensuring that all values stay between -1.0 and 1.0 is to apply peak normalization (see Normalize).

This transform is different from ClippingDistortion in that it takes fixed values for clipping instead of clipping a random percentile of the samples. Arguably, this transform is not very useful for data augmentation. Instead, think of it as a very cheap and harsh limiter (for samples that exceed the allotted extent) that can sometimes be useful at the end of a data augmentation pipeline.
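The wrap distortion mentioned above can be reproduced with a few lines of NumPy. This is an illustrative sketch only; `to_int16_without_guard` is a hypothetical helper that mimics a naive 16-bit export, not a function from the library.

```python
import numpy as np

samples = np.array([-1.2, -0.5, 0.0, 0.5, 1.2], dtype=np.float32)


def to_int16_without_guard(x: np.ndarray) -> np.ndarray:
    """Scale to the 16-bit range and let out-of-range values wrap around."""
    scaled = np.round(x * 32767).astype(np.int32)
    return scaled.astype(np.int16)  # narrowing the integer type wraps silently


# Without clipping, the two out-of-range samples flip their sign.
wrapped = to_int16_without_guard(samples)

# Clipping to [-1.0, 1.0] first keeps them at full scale instead.
clipped = to_int16_without_guard(np.clip(samples, -1.0, 1.0))
```

A sample that should be the loudest positive value ends up strongly negative after wrapping, which is why a hard clip at the end of a pipeline is often the lesser evil.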

`ClippingDistortion`

Added in v0.8.0

Distort the signal by clipping a random percentage of the samples.

The percentage of points that will be clipped is drawn from a uniform distribution between the two input parameters min_percentile_threshold and max_percentile_threshold. If for instance 30% is drawn, the samples are clipped if they're below the 15th or above the 85th percentile.
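The percentile arithmetic above can be sketched as follows. This is a simplified illustration under the assumption that the drawn percentage is split evenly between both tails, as in the 30% example; `clip_percentage` is a hypothetical name.

```python
import numpy as np


def clip_percentage(samples: np.ndarray, percent: float) -> np.ndarray:
    """Clip `percent` % of the samples, split evenly between both tails."""
    lower = np.percentile(samples, percent / 2)
    upper = np.percentile(samples, 100 - percent / 2)
    return np.clip(samples, lower, upper).astype(samples.dtype)


rng = np.random.default_rng(1)
audio = rng.normal(0.0, 0.1, 10000).astype(np.float32)

# 30 % drawn: values below the 15th or above the 85th percentile are clipped.
distorted = clip_percentage(audio, 30.0)
changed_fraction = float(np.mean(distorted != audio))
```

Here `changed_fraction` comes out close to 0.3, matching the drawn percentage.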

`FrequencyMask`

Added in v0.7.0

Mask some frequency band on the spectrogram. Inspired by https://arxiv.org/pdf/1904.08779.pdf

`Gain`

Added in v0.11.0

Multiply the audio by a random amplitude factor to reduce or increase the volume. This technique can help a model become somewhat invariant to the overall gain of the input audio.

Warning: This transform can return samples outside the [-1, 1] range, which may lead to clipping or wrap distortion, depending on what you do with the audio in a later stage. See also https://en.wikipedia.org/wiki/Clipping_(audio)#Digital_clipping
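For reference, a gain in decibels maps to a linear amplitude factor via 10^(dB/20). The sketch below uses a hypothetical helper name and also shows how a positive gain can push samples outside [-1, 1].

```python
import numpy as np


def db_to_amplitude_ratio(gain_db: float) -> float:
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10 ** (gain_db / 20)


samples = np.array([0.0, 0.25, -0.5, 0.9], dtype=np.float32)
louder = samples * db_to_amplitude_ratio(6.0)     # roughly doubles the amplitude
quieter = samples * db_to_amplitude_ratio(-20.0)  # one tenth of the amplitude

# A positive gain can exceed full scale, which is what the warning is about.
exceeds_full_scale = bool(np.max(np.abs(louder)) > 1.0)
```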

`GainTransition`

Added in v0.22.0

Gradually change the volume up or down over a random time span. Also known as fade in and fade out. The fade works on a logarithmic scale, which is natural to human hearing.

The way this works is that it picks two gains: a first gain and a second gain. Then it picks a time range for the transition between those two gains. Note that this transition can start before the audio starts and/or end after the audio ends, so the output audio can start or end in the middle of a transition. The gain starts at the first gain and is held constant until the transition start. Then it transitions to the second gain. Then that gain is held constant until the end of the sound.
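The mechanism described above can be sketched with `np.interp`, which holds the first and last value constant outside the transition range. This is an illustration under the assumption that the fade is linear in decibels; `gain_transition` and its parameters are hypothetical names, not the library's API.

```python
import numpy as np


def gain_transition(samples, first_gain_db, second_gain_db, start, end):
    """Fade from one gain to another, linearly in decibels (hypothetical helper).

    `start` and `end` are sample indices (start < end) and may lie outside the audio.
    """
    positions = np.arange(len(samples))
    # Before `start` the first gain is held, after `end` the second gain is held.
    gain_db = np.interp(positions, [start, end], [first_gain_db, second_gain_db])
    return (samples * 10 ** (gain_db / 20)).astype(samples.dtype)


tone = np.ones(100, dtype=np.float32)
faded_in = gain_transition(tone, first_gain_db=-20.0, second_gain_db=0.0, start=25, end=75)

# The transition may also begin before the audio starts.
mid_fade = gain_transition(tone, first_gain_db=-20.0, second_gain_db=0.0, start=-50, end=50)
```

In `mid_fade`, the first sample is already halfway through the fade, so the output starts in the middle of a transition.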

`HighPassFilter`

Added in v0.18.0, updated in v0.21.0

Apply high-pass filtering to the input audio, with parametrized filter steepness (6/12/18... dB / octave). Can also be set for zero-phase filtering (will result in a 6 dB drop at cutoff).

`HighShelfFilter`

Added in v0.21.0

A high shelf filter is a filter that either boosts (increases amplitude) or cuts (decreases amplitude) frequencies above a certain center frequency. This transform applies a high-shelf filter at a specific center frequency in hertz. The gain at the Nyquist frequency is controlled by {min,max}_gain_db (note: can be positive or negative!). Filter coefficients are taken from the W3 Audio EQ Cookbook.

`LowPassFilter`

Added in v0.18.0, updated in v0.21.0

Apply low-pass filtering to the input audio, with parametrized filter steepness (6/12/18... dB / octave). Can also be set for zero-phase filtering (will result in a 6 dB drop at cutoff).

`LowShelfFilter`

Added in v0.21.0

A low shelf filter is a filter that either boosts (increases amplitude) or cuts (decreases amplitude) frequencies below a certain center frequency. This transform applies a low-shelf filter at a specific center frequency in hertz. The gain at DC frequency is controlled by {min,max}_gain_db (note: can be positive or negative!). Filter coefficients are taken from the W3 Audio EQ Cookbook.

`Mp3Compression`

Added in v0.12.0

Compress the audio using an MP3 encoder to lower the audio quality. This may help machine learning models deal with compressed, low-quality audio.

This transform depends on either lameenc or pydub/ffmpeg.

Note that bitrates below 32 kbps are only supported for low sample rates (up to 24000 Hz).

Note: When using the lameenc backend, the output may be slightly longer than the input due to the fact that the LAME encoder inserts some silence at the beginning of the audio.

`Lambda`

Added in v0.26.0

Apply a user-defined transform (callable) to the signal.

`Limiter`

Added in v0.26.0

A simple audio limiter (dynamic range compression). Note: This transform also delays the signal by a fraction of the attack time.

`LoudnessNormalization`

Added in v0.14.0

Apply a constant amount of gain to match a specific loudness. This is an implementation of ITU-R BS.1770-4.

`Normalize`

Added in v0.6.0

Apply a constant amount of gain, so that the highest signal level present in the sound becomes 0 dBFS, i.e. the loudest level allowed if all samples must be between -1 and 1. Also known as peak normalization.
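Peak normalization amounts to dividing by the largest absolute sample value. The following is a minimal NumPy sketch with a hypothetical function name; it includes a guard for digital silence, where the peak is zero.

```python
import numpy as np


def peak_normalize(samples: np.ndarray) -> np.ndarray:
    """Scale the signal so that its largest absolute sample value becomes 1.0 (0 dBFS)."""
    peak = np.max(np.abs(samples))
    if peak == 0:
        # Digital silence: there is nothing to scale, and dividing would fail.
        return samples.copy()
    return samples / peak


quiet = np.array([0.05, -0.2, 0.1], dtype=np.float32)
normalized = peak_normalize(quiet)
still_silent = peak_normalize(np.zeros(4, dtype=np.float32))
```

The gain is constant across the whole signal, so the relative levels between samples are preserved.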

`Padding`

Added in v0.23.0

Apply padding to the audio signal - take a fraction of the end or the start of the audio and replace that part with padding. This can be useful for preparing ML models with constant input length for padded inputs.

`PeakingFilter`

Added in v0.21.0

Apply a biquad peaking filter to the input audio.

`PitchShift`

Added in v0.4.0

Pitch shift the sound up or down without changing the tempo

`PolarityInversion`

Added in v0.11.0

Flip the audio samples upside-down, reversing their polarity. In other words, multiply the waveform by -1, so negative values become positive, and vice versa. The result will sound the same compared to the original when played back in isolation. However, when mixed with other audio sources, the result may be different. This waveform inversion technique is sometimes used for audio cancellation or obtaining the difference between two waveforms. However, in the context of audio data augmentation, this transform can be useful when training phase-aware machine learning models.
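A short NumPy illustration of both points, using dummy audio: the inverted signal has the same magnitude at every sample, yet it cancels the original when the two are mixed.

```python
import numpy as np

rng = np.random.default_rng(7)
original = rng.uniform(-0.5, 0.5, 1000).astype(np.float32)
inverted = -original  # same as multiplying the waveform by -1

# Played alone, both have the same magnitude at every sample ...
same_magnitude = bool(np.array_equal(np.abs(original), np.abs(inverted)))

# ... but mixing the inverted copy with the original cancels it completely.
residual = original + inverted
```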

`Resample`

Added in v0.8.0

Resample the signal using librosa.core.resample.

To downsample only, set both the minimum and the maximum sampling rate lower than the original sampling rate. To upsample only, set both higher than the original sampling rate.

`Reverse`

Added in v0.18.0

Reverse the audio. Also known as time inversion. Inversion of an audio track along its time axis relates to the random flip of an image, which is an augmentation technique that is widely used in the visual domain. This can be relevant in the context of audio classification. It was successfully applied in the paper AudioCLIP: Extending CLIP to Image, Text and Audio.

`RoomSimulator`

Added in v0.23.0

A ShoeBox Room Simulator. Simulates a cuboid of parametrized size and average surface absorption coefficient. It also includes a source and microphones in parametrized locations.

Use it when you want a ton of synthetic room impulse responses with specific configurations or characteristics, or simply to quickly add reverb for augmentation purposes.

`SevenBandParametricEQ`

Added in v0.24.0

Adjust the volume of different frequency bands. This transform is a 7-band parametric equalizer - a combination of one low shelf filter, five peaking filters and one high shelf filter, all with randomized gains, Q values and center frequencies.

Because this transform changes the timbre, but keeps the overall "class" of the sound the same (depending on application), it can be used for data augmentation to make ML models more robust to various frequency spectrums. Many things can affect the spectrum, like room acoustics, any objects between the microphone and the sound source, microphone type/model and the distance between the sound source and the microphone.

The seven bands have center frequencies picked in the following ranges (min-max):

42-95 Hz
91-204 Hz
196-441 Hz
421-948 Hz
909-2045 Hz
1957-4404 Hz
4216-9486 Hz

`Shift`

Added in v0.5.0

Shift the samples forwards or backwards, with or without rollover
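The difference between shifting with and without rollover can be sketched with `np.roll`. The helper below is hypothetical and takes the shift as a number of samples, whereas the usage example earlier expresses it as a fraction of the audio length.

```python
import numpy as np


def shift(samples: np.ndarray, num_places: int, rollover: bool = True) -> np.ndarray:
    """Shift samples along the time axis (the last axis)."""
    shifted = np.roll(samples, num_places, axis=-1)
    if not rollover and num_places != 0:
        # Replace the wrapped-around part with silence.
        if num_places > 0:
            shifted[..., :num_places] = 0.0
        else:
            shifted[..., num_places:] = 0.0
    return shifted


ramp = np.arange(1, 7, dtype=np.float32)     # [1, 2, 3, 4, 5, 6]
wrapped = shift(ramp, 2, rollover=True)      # [5, 6, 1, 2, 3, 4]
silenced = shift(ramp, 2, rollover=False)    # [0, 0, 1, 2, 3, 4]
backwards = shift(ramp, -2, rollover=False)  # [3, 4, 5, 6, 0, 0]
```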

`TanhDistortion`

Added in v0.19.0

Apply tanh (hyperbolic tangent) distortion to the audio. This technique is sometimes used for adding distortion to guitar recordings. The tanh() function can give a rounded "soft clipping" kind of distortion, and the distortion amount is proportional to the loudness of the input and the pre-gain. Tanh is symmetric, so the positive and negative parts of the signal are squashed in the same way. This transform can be useful as data augmentation because it adds harmonics. In other words, it changes the timbre of the sound.
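The soft-clipping effect can be sketched as a pre-gain followed by `np.tanh`. This is a bare-bones illustration with a hypothetical helper; how the library chooses the gain and whether it rescales the output afterwards may differ.

```python
import numpy as np


def tanh_soft_clip(samples: np.ndarray, pre_gain: float) -> np.ndarray:
    """Apply gain, then squash the result with tanh (hypothetical helper)."""
    return np.tanh(pre_gain * samples)


t = np.arange(16000) / 16000
sine = (0.8 * np.sin(2 * np.pi * 440 * t)).astype(np.float32)

mild = tanh_soft_clip(sine, pre_gain=1.0)    # gently rounded peaks
heavy = tanh_soft_clip(sine, pre_gain=10.0)  # close to a square wave
```

A higher pre-gain or a louder input drives more of the waveform into the flat part of the tanh curve, which is what adds the harmonics.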

See this page for examples: http://gdsp.hf.ntnu.no/lessons/3/17/

`TimeMask`

Added in v0.7.0

Make a randomly chosen part of the audio silent. Inspired by https://arxiv.org/pdf/1904.08779.pdf
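A minimal sketch of time masking in NumPy follows. The helper name and the 10-50 % range used for the mask length are assumptions for this example, not the library's defaults.

```python
import numpy as np


def time_mask(samples: np.ndarray, start: int, length: int) -> np.ndarray:
    """Return a copy with `length` samples silenced, starting at index `start`."""
    masked = samples.copy()
    masked[..., start : start + length] = 0.0
    return masked


rng = np.random.default_rng(3)
audio = rng.uniform(-0.2, 0.2, 16000).astype(np.float32)

mask_length = int(rng.integers(1600, 8000))  # 10-50 % of the audio (assumed range)
mask_start = int(rng.integers(0, len(audio) - mask_length))
masked_audio = time_mask(audio, mask_start, mask_length)
```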

`TimeStretch`

Added in v0.2.0

Time stretch the signal without changing the pitch

`Trim`

Added in v0.7.0

Trim leading and trailing silence from an audio signal using librosa.effects.trim

Spectrogram transforms

`SpecChannelShuffle`

Added in v0.13.0

Shuffle the channels of a multichannel spectrogram. This can help combat positional bias.

`SpecFrequencyMask`

Added in v0.13.0

Mask a set of frequencies in a spectrogram, à la Google AI SpecAugment. This type of data augmentation has proved to make speech recognition models more robust.

The masked frequencies can be replaced with either the mean of the original values or a given constant (e.g. zero).
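The two fill strategies can be sketched as follows. This is an illustration only: `mask_frequency_band` is a hypothetical helper, and it assumes the mean is taken over the whole spectrogram, which may not match how the library computes it.

```python
import numpy as np


def mask_frequency_band(spectrogram, first_bin, num_bins, fill="mean"):
    """Replace a band of frequency bins (axis 0) with the mean or a constant."""
    masked = spectrogram.copy()
    fill_value = spectrogram.mean() if fill == "mean" else float(fill)
    masked[first_bin : first_bin + num_bins] = fill_value
    return masked


rng = np.random.default_rng(5)
# Same layout as the usage example: (frequency bins, time steps, channels)
spec = rng.random((1025, 256, 2))

mean_filled = mask_frequency_band(spec, first_bin=100, num_bins=40, fill="mean")
zero_filled = mask_frequency_band(spec, first_bin=100, num_bins=40, fill=0.0)
```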

Composition classes

`Compose`

Compose applies the given sequence of transforms when called, optionally shuffling the sequence for every call.

`SpecCompose`

Same as Compose, but for spectrogram transforms

`OneOf`

OneOf randomly picks one of the given transforms when called, and applies that transform.

`SomeOf`

SomeOf randomly picks several of the given transforms when called, and applies those transforms.
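To make the difference between these classes concrete, here is a simplified pure-Python sketch. `MiniCompose` and `MiniOneOf` are hypothetical stand-ins written for this example; they leave out the probability `p`, parameter freezing and everything else the real classes offer.

```python
import random


class MiniCompose:
    """Apply every transform in order (hypothetical stand-in, not the library class)."""

    def __init__(self, transforms, shuffle=False):
        self.transforms = list(transforms)
        self.shuffle = shuffle

    def __call__(self, samples, sample_rate):
        order = list(self.transforms)
        if self.shuffle:
            random.shuffle(order)  # new order on every call
        for transform in order:
            samples = transform(samples, sample_rate)
        return samples


class MiniOneOf:
    """Pick exactly one transform at random and apply it."""

    def __init__(self, transforms):
        self.transforms = list(transforms)

    def __call__(self, samples, sample_rate):
        return random.choice(self.transforms)(samples, sample_rate)


def halve(samples, sample_rate):
    return [s * 0.5 for s in samples]


def invert(samples, sample_rate):
    return [-s for s in samples]


pipeline = MiniCompose([halve, invert])
composed = pipeline([0.5, -1.0], sample_rate=16000)      # both transforms applied
picked = MiniOneOf([halve, invert])([0.5, -1.0], 16000)  # only one of them applied
```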

Known limitations

A few transforms do not support multichannel audio yet. See Multichannel audio
Expects the input dtype to be float32, and have values between -1 and 1.
The code runs on CPU, not GPU. For a GPU-compatible version, check out torch-audiomentations
Multiprocessing probably works but is not officially supported yet
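Regarding the expected input format: audio stored as 16-bit PCM integers has to be converted before it is passed to a transform. A common way to do that is sketched below; the helper name is hypothetical, and dividing by 32768 is one convention among several.

```python
import numpy as np


def int16_to_float32(pcm: np.ndarray) -> np.ndarray:
    """Convert 16-bit PCM samples to float32 in the range [-1, 1)."""
    return pcm.astype(np.float32) / 32768.0


pcm = np.array([-32768, -16384, 0, 16384, 32767], dtype=np.int16)
samples = int16_to_float32(pcm)
```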

Contributions are welcome!

Multichannel audio

As of v0.22.0, all transforms except AddBackgroundNoise and AddShortNoises support not only mono audio (1-dimensional NumPy arrays), but also stereo audio, i.e. 2D arrays with a shape like (num_channels, num_samples).
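If your audio arrives in a different layout, it needs to be rearranged first. The sketch below assumes a loader that returns arrays shaped (num_samples, num_channels); that is an assumption about your loader, so check what yours actually returns.

```python
import numpy as np

# Assumed loader output: (num_samples, num_channels)
samples_first = np.zeros((32000, 2), dtype=np.float32)

# Reorder to (num_channels, num_samples), the layout described above.
channels_first = np.ascontiguousarray(samples_first.T)

# Downmixing to mono gives a 1-dimensional array again.
mono = channels_first.mean(axis=0)
```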

Changelog

Unreleased

v0.26.0 (2022-08-19)

Added

Add new transform Lambda. Thanks to Thanatoz-1.
Add new transform Limiter. Thanks to pzelasko.

Fixed

Fix incorrect type hints in RoomSimulator
Make Shift robust to different sample rate inputs when parameters are frozen

v0.25.1 (2022-06-15)

Fixed

Fix a bug where RoomSimulator would treat an x value as if it was y, and vice versa

v0.25.0 (2022-05-30)

Added

Add AirAbsorption transform
Add mp4 to the list of recognized audio filename extensions

Changed

Guard against invalid params in TimeMask
Emit FutureWarning instead of UserWarning in Trim and ApplyImpulseResponse
Allow specifying a file path, a folder path, a list of files or a list of folders to ApplyImpulseResponse, AddBackgroundNoise and AddShortNoises. Previously only a path to a folder was allowed.

Fixed

Fix a bug with noise_transform in AddBackgroundNoise where some SNR calculations were done before the noise_transform was applied. This has sometimes led to incorrect SNR in the output. This changes the behavior of AddBackgroundNoise (when noise_transform is used).

Removed

Remove support for Python 3.6, as it is past its end of life already. RIP.

v0.24.0 (2022-03-18)

Added

Add SevenBandParametricEQ transform
Add optional noise_transform in AddShortNoises
Add .aac and .aif to the list of recognized audio filename endings

Changed

Show warning if top_db and/or p in Trim are not specified because their default values will change in a future version

Fixed

Fix filter instability bug related to center freq above nyquist freq in LowShelfFilter and HighShelfFilter

v0.23.0 (2022-03-07)

Added

Add Padding transform
Add RoomSimulator transform for simulating shoebox rooms using pyroomacoustics
Add parameter signal_gain_in_db_during_noise in AddShortNoises

Changed

Not specifying a value for leave_length_unchanged in AddImpulseResponse now emits a warning, as the default value will change from False to True in a future version.

Removed

Remove the deprecated AddImpulseResponse alias. Use ApplyImpulseResponse instead.
Remove support for the legacy parameters min_SNR and max_SNR in AddGaussianSNR
Remove useless default path value in AddBackgroundNoise, AddShortNoises and ApplyImpulseResponse

v0.22.0 (2022-02-18)

Added

Implement GainTransition
Add support for librosa 0.9
Add support for stereo audio in Mp3Compression, Resample and Trim
Add "relative_to_whole_input" option for noise_rms parameter in AddShortNoises
Add optional noise_transform in AddBackgroundNoise

Changed

Improve speed of PitchShift by 6-18% when the input audio is stereo

Removed

Remove support for librosa<=0.7.2

v0.21.0 (2022-02-10)

Added

Add support for multichannel audio in ApplyImpulseResponse, BandPassFilter, HighPassFilter and LowPassFilter
Add BandStopFilter (similar to FrequencyMask, but with overhauled defaults and parameter randomization behavior), PeakingFilter, LowShelfFilter and HighShelfFilter
Add parameter add_all_noises_with_same_level in AddShortNoises

Changed

Change BandPassFilter, LowPassFilter and HighPassFilter to use scipy's butterworth filters instead of pydub. Now they have parametrized roll-off. Filters are now steeper than before by default - set min_rolloff=6, max_rolloff=6 to get the old behavior. They also support zero-phase filtering now. And they're at least ~25x faster than before!

Removed

Remove optional wavio dependency for audio loading

v0.20.0 (2021-11-18)

Added

Implement OneOf and SomeOf for applying one of or some of many transforms. Transforms are randomly chosen every call. Inspired by augly. Thanks to Cangonin and iver56.
Add a new argument apply_to_children (bool) in randomize_parameters, freeze_parameters and unfreeze_parameters in Compose and SpecCompose.

Changed

Insert three new parameters in AddBackgroundNoise: noise_rms (defaults to "relative", which is the old behavior), min_absolute_rms_in_db and max_absolute_rms_in_db. This may be a breaking change if you used AddBackgroundNoise with positional arguments in earlier versions of audiomentations! Please use keyword arguments to be on the safe side - it should be backwards compatible then.

Fixed

Remove global pydub import which was accidentally introduced in v0.18.0. pydub is considered an optional dependency and is imported only on demand now.

v0.19.0 (2021-10-18)

Added

Implement TanhDistortion. Thanks to atamazian and iver56.
Add a noise_rms parameter to AddShortNoises. It defaults to relative, which is the old behavior. absolute allows for adding loud noises to parts that are relatively silent in the input.

v0.18.0 (2021-08-05)

Added

Implement BandPassFilter, HighPassFilter, LowPassFilter and Reverse. Thanks to atamazian.

v0.17.0 (2021-06-25)

Added

Add a fade option in Shift for eliminating unwanted clicks
Add support for 32-bit int wav loading with scipy>=1.6
Add support for float64 wav files. However, the use of this format is discouraged, since float32 is more than enough for audio in most cases.
Implement Clip. Thanks to atamazian.
Add some parameter sanity checks in AddGaussianNoise
Officially support librosa 0.8.1

Changed

Rename AddImpulseResponse to ApplyImpulseResponse. The former will still work for now, but give a warning.
When looking for audio files in AddImpulseResponse, AddBackgroundNoise and AddShortNoises, follow symlinks by default.
When using the new parameters min_snr_in_db and max_snr_in_db in AddGaussianSNR, SNRs will be picked uniformly in the decibel scale instead of in the linear amplitude ratio scale. The new behavior aligns more with human hearing, which is not linear.

Fixed

Avoid division by zero in AddImpulseResponse when input is digital silence (all zeros)
Fix inverse SNR characteristics in AddGaussianSNR. It will continue working as before unless you switch to the new parameters min_snr_in_db and max_snr_in_db. If you use the old parameters, you'll get a warning.

v0.16.0 (2021-02-11)

Added

Implement SpecCompose for applying a pipeline of spectrogram transforms. Thanks to omerferhatt.

Fixed

Fix a bug in SpecChannelShuffle where it did not support more than 3 audio channels. Thanks to omerferhatt.
Limit scipy version range to >=1.0,<1.6 to avoid issues with loading 24-bit wav files. Support for scipy>=1.6 will be added later.

v0.15.0 (2020-12-10)

Added

Add an option leave_length_unchanged to AddImpulseResponse

Fixed

Fix picklability of instances of AddImpulseResponse, AddBackgroundNoise and AddShortNoises

v0.14.0 (2020-12-06)

Added

Implement LoudnessNormalization
Implement randomize_parameters in Compose. Thanks to SolomidHero.
Add multichannel support to AddGaussianNoise, AddGaussianSNR, ClippingDistortion, FrequencyMask, PitchShift, Shift, TimeMask and TimeStretch

v0.13.0 (2020-11-10)

Added

Lay the foundation for spectrogram transforms. Implement SpecChannelShuffle and SpecFrequencyMask.
Configurable LRU cache for transforms that use external sound files. Thanks to alumae.
Officially add multichannel support to Normalize

Changed

Show a warning if a waveform had to be resampled after loading it. This is because resampling is slow. Ideally, files on disk should already have the desired sample rate.

Fixed

Correctly find audio files with upper case filename extensions.
Fix a bug where AddBackgroundNoise crashed when trying to add digital silence to an input. Thanks to juheeuu.

v0.12.1 (2020-09-28)

Changed

Speed up AddBackgroundNoise, AddShortNoises and AddImpulseResponse by loading wav files with scipy or wavio instead of librosa.

v0.12.0 (2020-09-23)

Added

Implement Mp3Compression
Officially support multichannel audio in Gain and PolarityInversion
Add m4a and opus to the list of recognized audio filename extensions

Changed

Expand range of supported librosa versions

Removed

Python <= 3.5 is no longer officially supported, since Python 3.5 has reached end-of-life
Breaking change: Internal util functions are no longer exposed directly. If you were doing e.g. from audiomentations import calculate_rms, now you have to do from audiomentations.core.utils import calculate_rms

v0.11.0 (2020-08-27)

Added

Implement Gain and PolarityInversion. Thanks to Spijkervet for the inspiration.

v0.10.1 (2020-07-27)

Changed

Improve the performance of AddBackgroundNoise and AddShortNoises by optimizing the implementation of calculate_rms.

Fixed

Improve compatibility of output files written by the demo script. Thanks to xwJohn.
Fix division by zero bug in Normalize. Thanks to ZFTurbo.

v0.10.0 (2020-05-05)

Added

AddImpulseResponse, AddBackgroundNoise and AddShortNoises now support aiff files in addition to flac, mp3, ogg and wav

Changed

Breaking change: AddImpulseResponse, AddBackgroundNoise and AddShortNoises now include subfolders when searching for files. This is useful when your sound files are organized in subfolders.

Fixed

Fix filter instability bug in FrequencyMask. Thanks to kvilouras.

v0.9.0 (2020-02-20)

Added

Remember randomized/chosen effect parameters. This allows for freezing the parameters and applying the same effect to multiple sounds. Use transform.freeze_parameters() and transform.unfreeze_parameters() for this.
Implement transform.serialize_parameters(). Useful for when you want to store metadata on how a sound was perturbed.
Add a rollover parameter to Shift. This allows for introducing silence instead of a wrapped part of the sound.
Add support for flac in AddImpulseResponse
Implement AddBackgroundNoise transform. Useful for when you want to add background noise to all of your sound. You need to give it a folder of background noises to choose from.
Implement AddShortNoises. Useful for when you want to add (bursts of) short noise sounds to your input audio.

Changed

Disregard non-audio files when looking for impulse response files
Switch to a faster convolve implementation. This makes AddImpulseResponse significantly faster.
Expand supported range of librosa versions

Fixed

Fix a bug in ClippingDistortion where the min_percentile_threshold was not respected as expected.
Improve handling of empty input

v0.8.0 (2020-01-28)

Added

Add shuffle parameter in Compose
Add Resample transformation
Add ClippingDistortion transformation
Add fade parameter to TimeMask

Thanks to askskro

v0.7.0 (2020-01-14)

Added

AddGaussianSNR
AddImpulseResponse
FrequencyMask
TimeMask
Trim

Thanks to karpnv

v0.6.0 (2019-05-27)

Added

Implement peak normalization

v0.5.0 (2019-02-23)

Added

Implement Shift transform

Changed

Ensure p is within bounds

v0.4.0 (2019-02-19)

Added

Implement PitchShift transform

Fixed

Fix output dtype of AddGaussianNoise

v0.3.0 (2019-02-19)

Added

Implement leave_length_unchanged in TimeStretch

v0.2.0 (2019-02-18)

Added

Add TimeStretch transform
Parametrize AddGaussianNoise

v0.1.0 (2019-02-15)

Added

Initial release. Includes only one transform: AddGaussianNoise

Development

Install the dependencies specified in requirements.txt

Code style

Format the code with black

Run tests and measure code coverage

pytest

Generate demo sounds for empirical evaluation

python -m demo.demo

Alternatives

Audiomentations isn't the only Python library that can do various types of audio data augmentation/degradation! Here's an overview:

Name
audio-degradation-toolbox
audio_degrader
audiomentations
AugLy
kapre
muda
nlpaug
pedalboard
pydiogment
python-audio-effects
sigment
SpecAugment
spec_augment
teal
torch-audiomentations
torchaudio-augmentations
WavAugment

Acknowledgements

Thanks to Nomono for backing audiomentations.

Thanks to all contributors who help improve audiomentations.


