Audiomentations

Code style: Black. Licence: MIT.

A Python library for audio data augmentation. Inspired by albumentations. Useful for deep learning. Runs on CPU. Supports mono audio and multichannel audio. Can be integrated into training pipelines in, for example, TensorFlow/Keras or PyTorch. Has helped people get world-class results in Kaggle competitions. Is used by companies making next-generation audio products.

Need a PyTorch-specific alternative with GPU support? Check out torch-audiomentations!

Setup

Supported operating systems: Linux, macOS, Windows

pip install audiomentations

Usage example

from audiomentations import Compose, AddGaussianNoise, TimeStretch, PitchShift, Shift
import numpy as np

augment = Compose([
    AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=0.5),
    TimeStretch(min_rate=0.8, max_rate=1.25, p=0.5),
    PitchShift(min_semitones=-4, max_semitones=4, p=0.5),
    Shift(p=0.5),
])

# Generate 2 seconds of dummy audio for the sake of example
samples = np.random.uniform(low=-0.2, high=0.2, size=(32000,)).astype(np.float32)

# Augment/transform/perturb the audio data
augmented_samples = augment(samples=samples, sample_rate=16000)
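
The `p` argument on each transform above is the probability that the transform is applied on a given call. The following is a minimal standard-library sketch of that idea, assuming each step is decided independently. `MiniCompose`, `invert_polarity` and `halve_gain` are hypothetical names made up for illustration; this is not how audiomentations itself is implemented, and it uses plain lists instead of NumPy arrays.

```python
import random


class MiniCompose:
    """Hypothetical stand-in for a chain of transforms (illustration only)."""

    def __init__(self, steps, seed=None):
        # steps: list of (function, probability) pairs
        self.steps = steps
        self.rng = random.Random(seed)

    def __call__(self, samples, sample_rate):
        for transform, p in self.steps:
            # Each step is applied independently with probability p
            if self.rng.random() < p:
                samples = transform(samples, sample_rate)
        return samples


def invert_polarity(samples, sample_rate):
    # Flip the sign of every sample
    return [-x for x in samples]


def halve_gain(samples, sample_rate):
    # Multiply every sample by 0.5
    return [0.5 * x for x in samples]


signal = [0.1, -0.2, 0.3]

# With p=0.5 on two independent steps, roughly a quarter of the calls
# should leave the input untouched.
chain = MiniCompose([(invert_polarity, 0.5), (halve_gain, 0.5)], seed=0)
outputs = [chain(signal, 16000) for _ in range(1000)]
unchanged_fraction = sum(out == signal for out in outputs) / len(outputs)

# p=1.0 always applies the step, p=0.0 never does
always = MiniCompose([(invert_polarity, 1.0)])(signal, 16000)
never = MiniCompose([(invert_polarity, 0.0)])(signal, 16000)
```

Because every call draws fresh random decisions, calling the chain repeatedly on the same input yields differently augmented outputs, which is the point of using it inside a training loop.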

Documentation

The API documentation, along with guides, example code, illustrations and example sounds, is available at https://iver56.github.io/audiomentations/

Transforms

  • AddBackgroundNoise: Mixes in another sound to add background noise
  • AddColorNoise: Adds noise with specific color
  • AddGaussianNoise: Adds Gaussian noise to the audio samples
  • AddGaussianSNR: Injects Gaussian noise using a randomly chosen signal-to-noise ratio
  • AddShortNoises: Mixes in various short noise sounds
  • AdjustDuration: Trims or pads the audio to fit a target duration
  • AirAbsorption: Applies frequency-dependent attenuation simulating air absorption
  • Aliasing: Produces aliasing artifacts by downsampling without low-pass filtering and then upsampling
  • ApplyImpulseResponse: Convolves the audio with a randomly chosen impulse response
  • BandPassFilter: Applies band-pass filtering within randomized parameters
  • BandStopFilter: Applies band-stop (notch) filtering within randomized parameters
  • BitCrush: Applies bit reduction without dithering
  • Clip: Clips audio samples to specified minimum and maximum values
  • ClippingDistortion: Distorts the signal by clipping a random percentage of samples
  • Gain: Multiplies the audio by a random gain factor
  • GainTransition: Gradually changes the gain over a random time span
  • HighPassFilter: Applies high-pass filtering within randomized parameters
  • HighShelfFilter: Applies a high shelf filter with randomized parameters
  • Lambda: Applies a user-defined transform
  • Limiter: Applies dynamic range compression limiting the audio signal
  • LoudnessNormalization: Applies gain to match a target loudness
  • LowPassFilter: Applies low-pass filtering within randomized parameters
  • LowShelfFilter: Applies a low shelf filter with randomized parameters
  • Mp3Compression: Compresses the audio to lower the quality
  • Normalize: Applies gain so that the highest signal level becomes 0 dBFS
  • Padding: Replaces a random part of the beginning or end with padding
  • PeakingFilter: Applies a peaking filter with randomized parameters
  • PitchShift: Shifts the pitch up or down without changing the tempo
  • PolarityInversion: Flips the audio samples upside down, reversing their polarity
  • RepeatPart: Repeats a subsection of the audio a number of times
  • Resample: Resamples the signal to a randomly chosen sampling rate
  • Reverse: Reverses the audio along its time axis
  • RoomSimulator: Simulates the effect of a room on an audio source
  • SevenBandParametricEQ: Adjusts the volume of 7 frequency bands
  • Shift: Shifts the samples forwards or backwards
  • SpecChannelShuffle: Shuffles channels in the spectrogram
  • SpecFrequencyMask: Applies a frequency mask to the spectrogram
  • TanhDistortion: Applies tanh distortion to distort the signal
  • TimeMask: Makes a random part of the audio silent
  • TimeStretch: Changes the speed without changing the pitch
  • Trim: Trims leading and trailing silence from the audio
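
To make two of the descriptions above more concrete, here is a rough standard-library sketch of what "noise at a given signal-to-noise ratio" and "gain so that the highest signal level becomes 0 dBFS" can mean numerically. It assumes floating-point audio where full scale is 1.0. The function names `rms`, `add_gaussian_noise_at_snr` and `peak_normalize` are hypothetical, and the code is not taken from audiomentations, whose transforms operate on NumPy arrays and pick their parameters randomly within configured ranges.

```python
import math
import random


def rms(samples):
    # Root mean square level of a list of samples
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def add_gaussian_noise_at_snr(samples, snr_db, rng):
    # Choose the noise standard deviation so that
    # 20 * log10(signal_rms / noise_rms) is approximately snr_db
    noise_std = rms(samples) / (10 ** (snr_db / 20))
    return [x + rng.gauss(0.0, noise_std) for x in samples]


def peak_normalize(samples):
    # Scale so that the largest absolute sample value becomes 1.0 (0 dBFS)
    peak = max(abs(x) for x in samples)
    if peak == 0:
        return list(samples)  # silent input: nothing to scale
    return [x / peak for x in samples]


sample_rate = 16000
# One second of a 440 Hz sine wave at half of full scale
clean = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]

rng = random.Random(0)
noisy = add_gaussian_noise_at_snr(clean, snr_db=20.0, rng=rng)

# Measure the SNR that was actually achieved
noise = [b - a for a, b in zip(clean, noisy)]
measured_snr_db = 20 * math.log10(rms(clean) / rms(noise))

normalized = peak_normalize(noisy)
normalized_peak = max(abs(x) for x in normalized)
```

The measured SNR only approximates the requested value, since the noise is a finite random draw.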

Changelog

[0.38.0] - 2024-12-06

Added

  • Add/improve parameter validation in AddGaussianSNR, GainTransition, LoudnessNormalization and AddShortNoises
  • Add/update type hints for consistency
  • Add human-readable string representation of audiomentations class instances

Changed

  • Improve documentation with respect to consistency, clarity and grammar
  • Adjust Python version compatibility range, so all patches of Python 3.12 are supported

Removed

  • Remove deprecated _in_db args in Gain, AddBackgroundNoise, AddGaussianSNR, GainTransition, LoudnessNormalization and AddShortNoises

Fixed

  • Fix a bug where AirAbsorption often chose the wrong humidity bucket
  • Fix wrong logic in validation check of relation between crossfade_duration and min_part_duration in RepeatPart
  • Fix default value of max_absolute_rms_db in AddBackgroundNoise. It was incorrectly set to -45.0, but is now -15.0. This bug was introduced in 0.31.0.
  • Fix various errors in the documentation of AddShortNoises and AirAbsorption
  • Fix a bug where AddShortNoises sometimes raised a ValueError because of an empty array. This bug was introduced in 0.36.1.

For the full changelog, including older versions, see https://iver56.github.io/audiomentations/changelog/

Acknowledgements

Thanks to Nomono for backing audiomentations.

Thanks to all contributors who help improve audiomentations.
