A Python library for audio data augmentation. Inspired by albumentations. Useful for machine learning.

Audiomentations

Code style: Black. Licence: MIT.

Audiomentations is a Python library for audio data augmentation, built to be fast and easy to use. Its API is inspired by albumentations. It's useful for making audio deep learning models work well in the real world, not just in the lab. Audiomentations runs on CPU, supports mono and multichannel audio, and integrates well into training pipelines, such as those built with TensorFlow/Keras or PyTorch. It has helped users achieve world-class results in Kaggle competitions and is trusted by companies building next-generation audio products with AI.

Need a PyTorch-specific alternative with GPU support? Check out torch-audiomentations!

Setup

Supported operating systems: Linux, macOS, Windows

pip install audiomentations

Usage example

from audiomentations import Compose, AddGaussianNoise, TimeStretch, PitchShift, Shift
import numpy as np

augment = Compose([
    AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=0.5),
    TimeStretch(min_rate=0.8, max_rate=1.25, p=0.5),
    PitchShift(min_semitones=-4, max_semitones=4, p=0.5),
    Shift(p=0.5),
])

# Generate 2 seconds of dummy audio for the sake of example
samples = np.random.uniform(low=-0.2, high=0.2, size=(32000,)).astype(np.float32)

# Augment/transform/perturb the audio data
augmented_samples = augment(samples=samples, sample_rate=16000)
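In the example above, each transform carries its own probability p, so any given call applies only a random subset of the chain. The sketch below illustrates that idea in plain Python with no dependencies. MiniCompose, halve_gain and reverse are hypothetical names made up for this illustration and are not part of audiomentations; the real Compose works on NumPy arrays and does more than this.

```python
import random


class MiniCompose:
    """Hypothetical stand-in that chains (function, p) pairs.

    This is NOT the library's Compose, only an illustration of
    per-transform probabilities.
    """

    def __init__(self, steps, seed=None):
        self.steps = steps
        self.rng = random.Random(seed)

    def __call__(self, samples, sample_rate):
        for transform, p in self.steps:
            # Each step is applied independently with probability p
            if self.rng.random() < p:
                samples = transform(samples, sample_rate)
        return samples


def halve_gain(samples, sample_rate):
    """Multiply every sample by 0.5."""
    return [0.5 * s for s in samples]


def reverse(samples, sample_rate):
    """Reverse the samples along the time axis."""
    return samples[::-1]


# p=1.0 always applies the step, p=0.0 never does
augment_sketch = MiniCompose([(halve_gain, 1.0), (reverse, 0.0)])
result = augment_sketch([0.2, -0.4, 0.6], sample_rate=16000)
```

With these probabilities the gain step always runs and the reverse step never does, so result holds the input at half amplitude in its original order.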

Documentation

The API documentation, along with guides, example code, illustrations and example sounds, is available at https://iver56.github.io/audiomentations/

Transforms

  • AddBackgroundNoise: Mixes in another sound to add background noise
  • AddColorNoise: Adds noise with specific color
  • AddGaussianNoise: Adds Gaussian noise to the audio samples
  • AddGaussianSNR: Injects Gaussian noise using a randomly chosen signal-to-noise ratio
  • AddShortNoises: Mixes in various short noise sounds
  • AdjustDuration: Trims or pads the audio to fit a target duration
  • AirAbsorption: Applies frequency-dependent attenuation simulating air absorption
  • Aliasing: Produces aliasing artifacts by downsampling without low-pass filtering and then upsampling
  • ApplyImpulseResponse: Convolves the audio with a randomly chosen impulse response
  • BandPassFilter: Applies band-pass filtering within randomized parameters
  • BandStopFilter: Applies band-stop (notch) filtering within randomized parameters
  • BitCrush: Applies bit reduction without dithering
  • Clip: Clips audio samples to specified minimum and maximum values
  • ClippingDistortion: Distorts the signal by clipping a random percentage of samples
  • Gain: Multiplies the audio by a random gain factor
  • GainTransition: Gradually changes the gain over a random time span
  • HighPassFilter: Applies high-pass filtering within randomized parameters
  • HighShelfFilter: Applies a high shelf filter with randomized parameters
  • Lambda: Applies a user-defined transform
  • Limiter: Applies dynamic range compression limiting the audio signal
  • LoudnessNormalization: Applies gain to match a target loudness
  • LowPassFilter: Applies low-pass filtering within randomized parameters
  • LowShelfFilter: Applies a low shelf filter with randomized parameters
  • Mp3Compression: Compresses the audio to lower the quality
  • Normalize: Applies gain so that the highest signal level becomes 0 dBFS
  • Padding: Replaces a random part of the beginning or end with padding
  • PeakingFilter: Applies a peaking filter with randomized parameters
  • PitchShift: Shifts the pitch up or down without changing the tempo
  • PolarityInversion: Flips the audio samples upside down, reversing their polarity
  • RepeatPart: Repeats a subsection of the audio a number of times
  • Resample: Resamples the signal to a randomly chosen sampling rate
  • Reverse: Reverses the audio along its time axis
  • RoomSimulator: Simulates the effect of a room on an audio source
  • SevenBandParametricEQ: Adjusts the volume of 7 frequency bands
  • Shift: Shifts the samples forwards or backwards
  • TanhDistortion: Applies tanh distortion to distort the signal
  • TimeMask: Makes a random part of the audio silent
  • TimeStretch: Changes the speed without changing the pitch
  • Trim: Trims leading and trailing silence from the audio

Changelog

[0.41.0] - 2025-05-05

Added

  • Add support for NumPy 2.x
  • Add weights parameter to OneOf. This lets you guide the probability of each transform being chosen.
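
The effect of weights on the selection can be pictured with a small standard-library sketch. This is an assumption about the general idea (weighted random choice), not the library's code, and pick_one is a hypothetical helper name.

```python
import random


def pick_one(transforms, weights, rng):
    """Return one item, chosen with probability proportional to its weight."""
    return rng.choices(transforms, weights=weights, k=1)[0]


rng = random.Random(42)
names = ["gain", "reverse", "time_mask"]  # placeholders for transform objects
weights = [0.7, 0.2, 0.1]

# Count how often each placeholder is picked over many draws
counts = {name: 0 for name in names}
for _ in range(10_000):
    counts[pick_one(names, weights, rng)] += 1
```

Over many draws the counts approach the 70/20/10 split given by the weights, whereas without weights every transform would be equally likely.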

Changed

  • Improve type hints

Warning: The TimeMask transform has been changed significantly:

  • Breaking change: Remove fade parameter. fade_duration=0.0 now denotes disabled fading.
  • Enable fading by default
  • Apply a smooth fade curve instead of a linear one
  • Add mask_location parameter
  • Change the default value of min_band_part from 0.0 to 0.01
  • Change the default value of max_band_part from 0.5 to 0.2
  • ~50% faster
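
As a rough picture of what masking with a smooth fade means, here is a plain-Python sketch that silences a span and uses a raised-cosine ramp at both edges of the mask. The raised-cosine shape, the function name time_mask_sketch and the use of sample counts for the fade length are all assumptions made for illustration; the changelog says only that the curve is smooth rather than linear.

```python
import math


def time_mask_sketch(samples, start, length, fade_len):
    """Silence samples[start:start+length], fading in and out of the mask.

    fade_len is the number of samples over which the gain drops from
    near 1 to 0 at each edge; fade_len=0 gives a hard mask.
    """
    out = list(samples)
    end = min(start + length, len(out))
    # The two fades must fit inside the mask
    fade_len = min(fade_len, (end - start) // 2)
    for i in range(start, end):
        # Distance (in samples) from the nearest mask edge
        d = min(i - start, end - 1 - i)
        if d < fade_len:
            # Raised-cosine ramp that reaches exactly 0 after fade_len samples
            gain = 0.5 * (1.0 + math.cos(math.pi * (d + 1) / fade_len))
        else:
            gain = 0.0
        out[i] *= gain
    return out


signal = [1.0] * 20
faded = time_mask_sketch(signal, start=5, length=10, fade_len=3)
hard = time_mask_sketch(signal, start=5, length=10, fade_len=0)
```

The faded version avoids the abrupt jump to silence that a hard mask produces, which is the usual motivation for fading.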

The following examples show how you can adapt your code when upgrading from <=v0.40.0 to >=v0.41.0:

| <= 0.40.0 | >= 0.41.0 |
| TimeMask(min_band_part=0.1, max_band_part=0.15, fade=True) | TimeMask(min_band_part=0.1, max_band_part=0.15, fade_duration=0.01) |
| TimeMask() | TimeMask(min_band_part=0.0, max_band_part=0.5, fade_duration=0.0) |
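
The upgrade examples above can also be written as a small keyword-argument translation. The helper below is hypothetical and not part of audiomentations. It assumes that the old default was fade=False (implied by the second example) and that fade_duration=0.01 is an acceptable replacement for fade=True, as in the first example.

```python
def migrate_time_mask_kwargs(old_kwargs):
    """Translate <=0.40.0 TimeMask keyword arguments into >=0.41.0 ones.

    Hypothetical helper: it pins the old defaults explicitly so that
    behavior does not change silently after upgrading.
    """
    new_kwargs = dict(old_kwargs)
    # Old defaults, stated explicitly because they changed in 0.41.0
    new_kwargs.setdefault("min_band_part", 0.0)
    new_kwargs.setdefault("max_band_part", 0.5)
    # fade was removed; fade_duration=0.0 now means no fading
    fade = new_kwargs.pop("fade", False)
    new_kwargs["fade_duration"] = 0.01 if fade else 0.0
    return new_kwargs


with_fade = migrate_time_mask_kwargs(
    {"min_band_part": 0.1, "max_band_part": 0.15, "fade": True}
)
defaults_only = migrate_time_mask_kwargs({})
```

The resulting dictionaries can then be passed on as TimeMask(**with_fade) in code that targets >=0.41.0.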

Removed

  • SpecCompose, SpecChannelShuffle and SpecFrequencyMask have been removed. You can read more about this here: #391

For the full changelog, including older versions, see https://iver56.github.io/audiomentations/changelog/

Acknowledgements

Thanks to Nomono for backing audiomentations.

Thanks to all contributors who help improve audiomentations.
