A Python library for audio data augmentation. Inspired by albumentations. Useful for machine learning.
Audiomentations
A Python library for audio data augmentation. Inspired by albumentations. Useful for deep learning. Runs on CPU. Supports mono audio and multichannel audio. Can be integrated into training pipelines in, e.g., TensorFlow/Keras or PyTorch. Has helped people get world-class results in Kaggle competitions. Is used by companies making next-generation audio products.
Need a PyTorch-specific alternative with GPU support? Check out torch-audiomentations!
Setup
```
pip install audiomentations
```
Usage example
```python
from audiomentations import Compose, AddGaussianNoise, TimeStretch, PitchShift, Shift
import numpy as np

augment = Compose([
    AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=0.5),
    TimeStretch(min_rate=0.8, max_rate=1.25, p=0.5),
    PitchShift(min_semitones=-4, max_semitones=4, p=0.5),
    Shift(min_fraction=-0.5, max_fraction=0.5, p=0.5),
])

# Generate 2 seconds of dummy audio for the sake of example (32000 samples at 16 kHz)
samples = np.random.uniform(low=-0.2, high=0.2, size=(32000,)).astype(np.float32)

# Augment/transform/perturb the audio data
augmented_samples = augment(samples=samples, sample_rate=16000)
```
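In the pipeline above, each transform has `p=0.5`: on every call it is applied independently with a 50% chance, in the order listed. The sketch below mimics that behavior in plain NumPy to show what `p` controls. `SketchCompose`, `invert_polarity` and `half_gain` are hypothetical names made up for this illustration; they are not part of audiomentations and do not reproduce its implementation.

```python
import random
from typing import Callable, List, Optional, Tuple

import numpy as np

# Hypothetical transform signature: (samples, sample_rate) -> augmented samples
TransformFn = Callable[[np.ndarray, int], np.ndarray]


class SketchCompose:
    """Hypothetical Compose-like pipeline, for illustration only.

    Each step is a (transform, p) pair; every call decides independently,
    per step, whether to apply that transform.
    """

    def __init__(self, steps: List[Tuple[TransformFn, float]], seed: Optional[int] = None):
        self.steps = steps
        self.rng = random.Random(seed)

    def __call__(self, samples: np.ndarray, sample_rate: int) -> np.ndarray:
        for transform, p in self.steps:
            # random() returns a value in [0, 1), so p=1.0 always applies and p=0.0 never does
            if self.rng.random() < p:
                samples = transform(samples, sample_rate)
        return samples


def invert_polarity(samples: np.ndarray, sample_rate: int) -> np.ndarray:
    return -samples


def half_gain(samples: np.ndarray, sample_rate: int) -> np.ndarray:
    return samples * 0.5


# p=1.0 forces the first step on, p=0.0 forces the second step off
pipeline = SketchCompose([(invert_polarity, 1.0), (half_gain, 0.0)], seed=0)
x = np.array([0.2, -0.4], dtype=np.float32)
y = pipeline(samples=x, sample_rate=16000)  # polarity inverted, gain step skipped
```

With `p=0.5` on every step instead, each step runs on roughly half of the calls, so a single pipeline produces many different combinations of augmentations over a training epoch.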
Documentation
The API documentation, along with guides, example code, illustrations and example sounds, is available at https://iver56.github.io/audiomentations/
Transforms
- AddBackgroundNoise: Mixes in another sound to add background noise
- AddGaussianNoise: Adds Gaussian noise to the audio samples
- AddGaussianSNR: Injects Gaussian noise using a randomly chosen signal-to-noise ratio
- AddShortNoises: Mixes in various short noise sounds
- AdjustDuration: Trims or pads the audio to fit a target duration
- AirAbsorption: Applies frequency-dependent attenuation simulating air absorption
- ApplyImpulseResponse: Convolves the audio with a randomly chosen impulse response
- BandPassFilter: Applies band-pass filtering within randomized parameters
- BandStopFilter: Applies band-stop (notch) filtering within randomized parameters
- Clip: Clips audio samples to specified minimum and maximum values
- ClippingDistortion: Distorts the signal by clipping a random percentage of samples
- Gain: Multiplies the audio by a random gain factor
- GainTransition: Gradually changes the gain over a random time span
- HighPassFilter: Applies high-pass filtering within randomized parameters
- HighShelfFilter: Applies a high shelf filter with randomized parameters
- Lambda: Applies a user-defined transform
- Limiter: Applies dynamic range compression that limits the audio signal
- LoudnessNormalization: Applies gain to match a target loudness
- LowPassFilter: Applies low-pass filtering within randomized parameters
- LowShelfFilter: Applies a low shelf filter with randomized parameters
- Mp3Compression: Compresses the audio to lower the quality
- Normalize: Applies gain so that the highest signal level becomes 0 dBFS
- Padding: Replaces a random part of the beginning or end with padding
- PeakingFilter: Applies a peaking filter with randomized parameters
- PitchShift: Shifts the pitch up or down without changing the tempo
- PolarityInversion: Flips the audio samples upside down, reversing their polarity
- RepeatPart: Repeats a subsection of the audio a number of times
- Resample: Resamples the signal to a randomly chosen sampling rate
- Reverse: Reverses the audio along its time axis
- RoomSimulator: Simulates the effect of a room on an audio source
- SevenBandParametricEQ: Adjusts the volume of 7 frequency bands
- Shift: Shifts the samples forwards or backwards
- SpecChannelShuffle: Shuffles channels in the spectrogram
- SpecFrequencyMask: Applies a frequency mask to the spectrogram
- TanhDistortion: Applies tanh distortion to distort the signal
- TimeMask: Makes a random part of the audio silent
- TimeStretch: Changes the speed without changing the pitch
- Trim: Trims leading and trailing silence from the audio
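Several transforms in the list above reduce to short formulas. As a rough illustration, here is how a decibel gain (as in Gain), peak normalization to 0 dBFS (as in Normalize) and Gaussian noise at a chosen signal-to-noise ratio (as in AddGaussianSNR) can be computed with NumPy. This is a simplified sketch, not audiomentations' actual code: the function names are hypothetical, and the SNR is assumed to be defined as the ratio of signal RMS to noise RMS, expressed in dB.

```python
import numpy as np


def apply_gain_db(samples: np.ndarray, gain_db: float) -> np.ndarray:
    # Convert decibels to a linear amplitude factor: gain_db = 20 * log10(factor)
    return samples * (10.0 ** (gain_db / 20.0))


def peak_normalize(samples: np.ndarray) -> np.ndarray:
    # Scale so the largest absolute sample becomes 1.0 (0 dBFS); leave silence untouched
    peak = np.max(np.abs(samples))
    return samples if peak == 0 else samples / peak


def add_gaussian_noise_at_snr(
    samples: np.ndarray, snr_db: float, rng: np.random.Generator
) -> np.ndarray:
    # Assumed convention: snr_db = 20 * log10(signal_rms / noise_rms)
    signal_rms = np.sqrt(np.mean(samples ** 2))
    noise_rms = signal_rms / (10.0 ** (snr_db / 20.0))
    noise = rng.normal(0.0, noise_rms, size=samples.shape)
    return (samples + noise).astype(samples.dtype)


rng = np.random.default_rng(0)
t = np.arange(16000) / 16000  # 1 second at 16 kHz
tone = (0.3 * np.sin(2 * np.pi * 440 * t)).astype(np.float32)

snr_db = rng.uniform(5.0, 40.0)  # randomly chosen SNR for this call
noisy = add_gaussian_noise_at_snr(tone, snr_db, rng)
normalized = peak_normalize(tone)  # peak is now 1.0
quieter = apply_gain_db(tone, -6.0)  # roughly halves the amplitude
```

Drawing `snr_db` per call, rather than a fixed noise amplitude, keeps the noise level proportional to the loudness of each input.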
Changelog
[0.32.0] - 2023-08-15
Added
- Add new `RepeatPart` transform
Changed
- Bump min version of numpy dependency from 1.13 to 1.16
- If a transform is in "frozen parameters" mode, but has no parameters yet, the first transform call will randomize/set parameters
- Increase the threshold for raising `WrongMultichannelAudioShape`. This allows some rare use cases where the number of channels slightly exceeds the number of samples.
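The "frozen parameters" change above can be pictured with a toy transform. In this hypothetical sketch (the class and its method names are illustrative, not audiomentations' implementation), a frozen transform normally reuses its previous parameters, but if it has none yet, the first call still draws them, which is the behavior the changelog entry describes.

```python
import random
from typing import Optional

import numpy as np


class FrozenGainSketch:
    """Hypothetical random-gain transform illustrating "frozen parameters" mode."""

    def __init__(self, min_gain_db: float, max_gain_db: float, seed: Optional[int] = None):
        self.min_gain_db = min_gain_db
        self.max_gain_db = max_gain_db
        self.rng = random.Random(seed)
        self.parameters: dict = {}
        self.are_parameters_frozen = False

    def freeze_parameters(self) -> None:
        self.are_parameters_frozen = True

    def unfreeze_parameters(self) -> None:
        self.are_parameters_frozen = False

    def randomize_parameters(self) -> None:
        self.parameters["gain_db"] = self.rng.uniform(self.min_gain_db, self.max_gain_db)

    def __call__(self, samples: np.ndarray) -> np.ndarray:
        # Draw new parameters unless frozen; a frozen transform with no parameters yet still draws once
        if not self.are_parameters_frozen or not self.parameters:
            self.randomize_parameters()
        return samples * (10.0 ** (self.parameters["gain_db"] / 20.0))


gain = FrozenGainSketch(-12.0, 12.0, seed=1)
gain.freeze_parameters()  # frozen before any parameters exist
x = np.ones(4, dtype=np.float32)
first = gain(x)   # first call draws gain_db
second = gain(x)  # later calls reuse the same gain
```

Freezing is useful when the same random augmentation must be applied to several related signals, for example two sounds that need an identical gain.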
Fixed
- Fix some type hints that were `np.array` instead of `np.ndarray`
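The reason behind this fix: `np.array` is a factory function, while `np.ndarray` is the class of the arrays it returns, so only the latter is meaningful in a type hint or an `isinstance` check. A short illustration; `peak_level` is a hypothetical helper.

```python
import numpy as np


def peak_level(samples: np.ndarray) -> float:
    # np.ndarray is the array type, so it is the correct annotation here
    return float(np.max(np.abs(samples)))


x = np.array([0.1, -0.5, 0.25])  # np.array is a function that builds an ndarray
print(isinstance(x, np.ndarray))  # True

try:
    isinstance(x, np.array)  # np.array is not a type, so this raises
except TypeError:
    print("np.array is a function, not a type")
```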
For the full changelog, including older versions, see https://iver56.github.io/audiomentations/changelog/
Acknowledgements
Thanks to Nomono for backing audiomentations.
Thanks to all contributors who help improve audiomentations.
Download files
File details
Details for the file audiomentations-0.32.0.tar.gz.
File metadata
- Download URL: audiomentations-0.32.0.tar.gz
- Upload date:
- Size: 50.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | 08a3886b06cce091bde12713917d69e57401632afead4d1653f69d68d33392de
MD5 | 085276e94b00807adb1fd4f91374520a
BLAKE2b-256 | bdf304b44a2f66972be9d373f71bb8a16ea60c7022755d88635262421791133e
File details
Details for the file audiomentations-0.32.0-py3-none-any.whl.
File metadata
- Download URL: audiomentations-0.32.0-py3-none-any.whl
- Upload date:
- Size: 76.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5912308b4d9b1d71bb58c10593117a6af0895c2bfa7b2363a4dab45890f7a805
MD5 | ed49f458dc6c5677d07f5154b228a903
BLAKE2b-256 | 158a6953e64a2f703a173efa09e1411f711224a3651df2c309c9156c3767b46c
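To check a downloaded file against the digests listed above, Python's standard-library `hashlib` is enough. This is a minimal sketch: `file_digests` and `verify_sha256` are hypothetical helper names, the path is whichever file you downloaded, and BLAKE2b-256 is taken to mean BLAKE2b with a 32-byte digest.

```python
import hashlib
from pathlib import Path
from typing import Dict, Union


def file_digests(path: Union[str, Path], chunk_size: int = 1 << 16) -> Dict[str, str]:
    """Compute SHA256, MD5 and BLAKE2b-256 hex digests of a file in one pass."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),  # 256-bit BLAKE2b
    }
    with open(path, "rb") as f:
        # Read in chunks so large files are not loaded into memory at once
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def verify_sha256(path: Union[str, Path], expected_hex: str) -> bool:
    # Hex digests are case-insensitive, so normalize before comparing
    return file_digests(path)["SHA256"] == expected_hex.strip().lower()
```

For example, `verify_sha256("audiomentations-0.32.0-py3-none-any.whl", "5912308b4d9b1d71bb58c10593117a6af0895c2bfa7b2363a4dab45890f7a805")` should return `True` for an intact copy of the wheel.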