Python package for simple application of a wide range of audio augmentations.
AudioAugmentor
Python library for augmenting audio data
This library is designed to augment audio data for machine learning purposes. It combines several tools and libraries for audio data augmentation and provides a unified interface that can be used to apply a large set of audio augmentations in one place.
The library is designed to be used with the PyTorch machine learning framework, but it can also operate directly on plain audio waveforms, and it can augment local audio datasets as well.
This library specifically combines these libraries and tools:
Available augmentations
The table below shows which library is used to apply each audio augmentation/codec.
Usage
For a more complex example, see the example Colab notebook above, or the Jupyter notebook AudioAugmentor_Usage_Example.ipynb in the examples directory within the public repository.
Note: AudioAugmentor was mainly tested using Python 3.11.8 and Fedora 38 (Google Colab uses Python 3.10 and Ubuntu)
0. You need to install the library and necessary packages first
!!!You may need to run the following commands with sudo!!!
If so, install these packages manually in a terminal.
pip install -U pip
pip install AudioAugmentor
dnf install -y sox # FEDORA
dnf install -y sox-devel # FEDORA
dnf install -y ffmpeg # FEDORA
# apt-get install -y sox # UBUNTU
# apt-get install -y libsox-dev # UBUNTU
# apt-get install -y ffmpeg # UBUNTU
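After installing, you can run a quick sanity check (a minimal sketch; it only verifies that the package imports and that the sox/ffmpeg binaries are on PATH):
import shutil
import torchaudio
from AudioAugmentor import core, transf_gen  # should import without errors

print('torchaudio version:', torchaudio.__version__)
print('sox found:', shutil.which('sox') is not None)        # needed for SoX effects
print('ffmpeg found:', shutil.which('ffmpeg') is not None)  # needed for codecs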
1. Import necessary libraries
import torch
import torchaudio
import numpy as np
import audiomentations as AA
from IPython.display import Audio, display
from AudioAugmentor import transf_gen
from AudioAugmentor import sox_parser
from AudioAugmentor import core
from AudioAugmentor import rir_setup
from AudioAugmentor import torchaudio_transf_wrapper as TTW

sampling_rate = 16000  # sampling rate used by the examples below (LibriSpeech audio is 16 kHz)
2. Define the augmentations you want to apply to your audio data.
You have three options for defining the augmentations:
a) Use the transf_gen.transf_gen
function to generate a list of transformations.
See the supported-transformations table and the examples of every augmentation below, so you know which parameters each augmentation method needs.
You can enter augmentation parameters as a string or as a dictionary.
PitchShift='sample_rate=16000, n_steps=[1, 1.5, 0.1], p=1.0'
PitchShift={'sample_rate': 16000, 'n_steps': [1, 1.5, 0.1], 'p': 1.0}
transformations = transf_gen.transf_gen(verbose=True,
PitchShift='sample_rate=16000, n_steps=[1, 1.5, 0.1], p=1.0',
Speed={'orig_freq': 16000, 'factor': [0.9, 1.5, 0.1], 'p': 1},
LowPassFilter={'min_cutoff_freq': 700, 'max_cutoff_freq': 800, 'sample_rate': sampling_rate, 'p': 1},
)
b) Use a pseudo SoX command. The SoX command must be in this format:
--sox="norm gain 0 highpass 1000 phaser 0.5 0.6 1 0.45 0.6 -s"
(when you don't want to apply a codec after the SoX effects)
OR
--sox="norm gain 20 highpass 300 phaser 0.5 0.6 1 0.45 0.6 -s" amr audio_bitrate 4.75k
(when you want to apply a codec after the SoX effects -> the codec is entered in the form codec_name codec_parameter_name codec_parameter_value directly after the SoX effects command)
example_sox = '--sox="norm gain 20 highpass 300 phaser 0.5 0.6 1 0.45 0.6 -s" amr audio_bitrate 4.75k'
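Such a command string is then passed via the sox_effects argument in step 3, e.g. to core.AugmentWaveform (a minimal sketch of what follows below; ../data/test.wav is a hypothetical example path):
import torchaudio
from AudioAugmentor import core

augment = core.AugmentWaveform(
    transformations=None, device='cpu', sox_effects=example_sox, sample_rate=16000, verbose=False,
)
signal, fs = torchaudio.load('../data/test.wav')
waveform = augment(signal.numpy()[0])  # SoX effects + amr codec applied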
c) Use a file with multiple pseudo SoX commands. A random SoX command from this file will be chosen and applied to your data.
The file must be loaded using the sox_parser.load_sox_file
function.
sox_file_content_to_write = '''--sox="norm gain 0 highpass 1000 phaser 0.5 0.6 1 0.45 0.6 -s"
#--sox="norm gain 20 highpass 300 phaser 0.5 0.6 1 0.45 0.6 -s"
--sox="norm gain 20 highpass 300 phaser 0.5 0.6 1 0.45 0.6 -s" gsm
--sox="norm gain 20 highpass 300 phaser 0.5 0.6 1 0.45 0.6 -s" amr audio_bitrate 4.75k
'''
with open('sox_file_example.txt', 'w') as f:
f.write(sox_file_content_to_write)
sox_file_content = sox_parser.load_sox_file('sox_file_example.txt')
print('SOX FILE LOADED:', sox_file_content, type(sox_file_content))
3. Apply augmentations
a) Use the generated transformations
list, a single SoX command,
or the loaded SoX file content
when initializing the Collator
class.
Pass the initialized instance as the collate_fn
argument of PyTorch's DataLoader.
collate_fn = core.Collator(
transformations=transformations, device='cpu', sox_effects=None, sample_rate=sampling_rate, verbose=True,
#transformations=None, device='cpu', sox_effects='--sox="norm gain 20 highpass 300 phaser 0.5 0.6 1 0.45 0.6 -s" amr audio_bitrate 4.75k', sample_rate=sampling_rate, verbose=False,
#transformations=None, device='cpu', sox_effects=sox_file_content, sample_rate=sampling_rate, verbose=False,
)
dataset = torchaudio.datasets.LIBRISPEECH("../data", url="train-clean-100", download=True)
aug_dataloader = torch.utils.data.DataLoader(
dataset,
batch_size=1,
num_workers=0,
collate_fn=collate_fn,
)
augmented_record_from_dataset = next(iter(aug_dataloader))
display(Audio(augmented_record_from_dataset[0].squeeze(0).squeeze(0).squeeze(0).cpu(), rate=sampling_rate))
OR
b) Use the generated transformations
list, a single SoX command,
or the loaded SoX file content
when initializing the AugmentWaveform
class, then apply the augmentations to an audio signal.
augment = core.AugmentWaveform(
transformations=transformations, device='cpu', sox_effects=None, sample_rate=16000, verbose=False,
#transformations=None, device='cpu', sox_effects='--sox="norm gain 20 highpass 300 phaser 0.5 0.6 1 0.45 0.6 -s" amr audio_bitrate 4.75k', sample_rate=16000, verbose=False,
#transformations=None, device='cpu', sox_effects=sox_file_content, sample_rate=16000, verbose=False,
)
# Load test wav file
signal, fs = torchaudio.load('../data/test.wav')
# Apply transformations
waveform = augment(signal.numpy()[0])
display(Audio(waveform, rate=fs))
c) Use the generated transformations
list, a single SoX command,
or the loaded SoX file content
when initializing the AugmentLocalAudioDataset
class, then apply the augmentations to a local audio dataset.
augment = core.AugmentLocalAudioDataset(
transformations=transformations, device='cpu', sox_effects=None, sample_rate=16000, verbose=False,
#transformations=None, device='cpu', sox_effects='--sox="norm gain 20 highpass 300 phaser 0.5 0.6 1 0.45 0.6 -s" amr audio_bitrate 4.75k', sample_rate=16000, verbose=False,
#transformations=None, device='cpu', sox_effects=sox_file_content, sample_rate=16000, verbose=False,
)
augment(input_dir='../data/test-input-folder', output_dir='../data/test-output-folder')
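To quickly check the result, you can list the files written to the output directory (a minimal sketch using the example paths above):
from pathlib import Path

# AugmentLocalAudioDataset writes the augmented copies to output_dir
for path in sorted(Path('../data/test-output-folder').iterdir()):
    print(path.name)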
EXAMPLES OF AVAILABLE AUGMENTATIONS
!!!Pass the following examples as arguments to the transf_gen.transf_gen
function to generate a list of transformations!!!
Like this:
transformations = transf_gen.transf_gen(verbose=True,
AddBackgroundNoise=f'background_paths="../data/musan/noise/free-sound", min_snr_in_db=10, max_snr_in_db=20, p=1, sample_rate={sampling_rate}',
AddColoredNoise=f'min_snr_in_db=9, max_snr_in_db=10, p=1, sample_rate={sampling_rate}',
)
You can enter augmentation parameters as a string or as a dictionary.
PitchShift='sample_rate=16000, n_steps=[1, 1.5, 0.1], p=1.0'
PitchShift={'sample_rate': 16000, 'n_steps': [1, 1.5, 0.1], 'p': 1.0}
⬆️ AddBackgroundNoise docs
AddBackgroundNoise=f'''background_paths="../data/musan/noise/free-sound",
min_snr_in_db=10,
max_snr_in_db=20,
p=1,
sample_rate={sampling_rate}''',
⬆️ AddColoredNoise docs
AddColoredNoise=f'''min_snr_in_db=9,
max_snr_in_db=10,
p=1,
sample_rate={sampling_rate}''',
⬆️ AddGaussianNoise docs
AddGaussianNoise={'min_amplitude': 0.001,
'max_amplitude': 0.015,
'p': 1},
⬆️ AddShortNoises docs
AddShortNoises={'sounds_path': "../data/musan/noise/free-sound",
'min_snr_in_db': 3.0,
'max_snr_in_db': 30.0,
'noise_rms': "relative_to_whole_input",
'min_time_between_sounds': 2.0,
'max_time_between_sounds': 8.0,
'noise_transform': AA.PolarityInversion(),
'p': 1.0},
⬆️ AdjustDuration docs
AdjustDuration={'duration_seconds': 3.5,
'padding_mode': 'silence',
'p': 1},
⬆️ AirAbsorption docs
AirAbsorption={'min_distance': 10.0,
'max_distance': 50.0,
'min_humidity': 80.0,
'max_humidity': 90.0,
'min_temperature': 10.0,
'max_temperature': 20.0,
'p': 1.0},
⬆️ ApplyImpulseResponse docs
ApplyImpulseResponse=f'''ir_paths="../data/Rir.wav",
p=1,
sample_rate={sampling_rate}''',
⬆️ BandPassFilter docs
BandPassFilter=f'''min_center_frequency=200,
max_center_frequency=4000,
min_bandwidth_fraction=0.5,
max_bandwidth_fraction=1.99,
sample_rate={sampling_rate},
p=1''',
⬆️ BandStopFilter docs
BandStopFilter=f'''min_center_frequency=200,
max_center_frequency=4000,
min_bandwidth_fraction=0.5,
max_bandwidth_fraction=1.99,
sample_rate={sampling_rate},
p=1''',
⬆️ ClippingDistortion docs
ClippingDistortion={'min_percentile_threshold': 10,
'max_percentile_threshold': 30,
'p': 1},
⬆️ FrequencyMasking docs
FrequencyMasking={'freq_mask_param': 80},
⬆️ Volume / Gain docs
Vol={'gain': [2.5, 3, 0.1],
'p': 1.0},
⬆️ GainTransition docs
GainTransition={'min_gain_db': 30,
'max_gain_db': 40,
'min_duration': 5,
'max_duration': 16,
'duration_unit': 'seconds',
'p': 1},
⬆️ HighPassFilter docs
HighPassFilter=f'''min_cutoff_freq=700,
max_cutoff_freq=800,
sample_rate={sampling_rate},
p=1''',
⬆️ HighShelfFilter docs
HighShelfFilter={'min_center_freq': 2000,
'max_center_freq': 5000,
'min_gain_db': 10.0,
'max_gain_db': 16.0,
'min_q': 0.5,
'max_q': 1.0,
'p': 1},
⬆️ Limiter docs
Limiter='''min_threshold_db=-24,
max_threshold_db=-2,
min_attack=0.0005,
max_attack=0.025,
min_release=0.05,
max_release=0.7,
threshold_mode="relative_to_signal_peak",
p=1''',
⬆️ LoudnessNormalization docs
LoudnessNormalization={'min_lufs': -31,
'max_lufs': -13,
'p': 1},
⬆️ LowPassFilter docs
LowPassFilter={'min_cutoff_freq': 700,
'max_cutoff_freq': 800,
'sample_rate': sampling_rate,
'p': 1},
⬆️ LowShelfFilter docs
LowShelfFilter={'min_center_freq': 20,
'max_center_freq': 600,
'min_gain_db': -16.0,
'max_gain_db': 16.0,
'min_q': 0.5,
'max_q': 1.0,
'p': 1},
⬆️ Mp3Compression docs
Mp3Compression={'min_bitrate': 8,
'max_bitrate': 8,
'backend': 'pydub',
'p': 1},
⬆️ MelSpectrogram docs
MelSpectrogram={'sample_rate': 16000},
⬆️ Normalize docs
Normalize={'p': 1},
⬆️ Padding docs
Padding={'mode': 'silence',
'min_fraction': 0.02,
'max_fraction': 0.8,
'pad_section': 'start',
'p': 1},
⬆️ PeakNormalization docs
PeakNormalization={'p': 1,
'sample_rate': sampling_rate},
⬆️ PeakingFilter docs
PeakingFilter={'min_center_freq': 51,
'max_center_freq': 7400,
'min_gain_db': -22,
'max_gain_db': 22,
'min_q': 0.5,
'max_q': 1.0,
'p': 1},
⬆️ PitchShift docs
PitchShift={'sample_rate': 16000,
'n_steps': [1, 1.5, 0.1],
'bins_per_octave': 12,
'n_fft': 512,
'win_length':512,
'hop_length': 512//4,
'p': 1.0},
⬆️ PolarityInversion docs
PolarityInversion={'p': 1,
'sample_rate': sampling_rate},
⬆️ Time inversion docs
TimeInversion={'p': 1,
'sample_rate': sampling_rate},
⬆️ ApplyRIR
# Use this to see available materials you can use as walls_mat, floor_mat and ceiling_mat argument
# from AudioAugmentor import rir_setup
# rir_setup.get_all_materials_info()
# This way you set up parameters when you want to generate random room parameter
rir_kwargs = {
'audio_sample_rate': 16000,
'x_range': (0, 100),
'y_range': (0, 100),
'num_vertices_range': (3, 6),
'mic_height': 1.5,
'source_height': 1.5,
'walls_mat': 'curtains_cotton_0.5',
'room_height': 2.0,
'max_order': 3,
'floor_mat': 'carpet_cotton',
'ceiling_mat': 'hard_surface',
'ray_tracing': True,
'air_absorption': True,
}
# This way you set up parameters when you want to generate specific room
rir_kwargs = {
'audio_sample_rate': 16000,
'corners_coord': [[0, 0], [0, 3], [5, 3], [5, 1], [3, 1], [3, 0]],
'walls_mat': 'curtains_cotton_0.5',
'room_height': 2.0,
'max_order': 3,
'floor_mat': 'carpet_cotton',
'ceiling_mat': 'hard_surface',
'ray_tracing': True,
'air_absorption': True,
'source_coord': [[1.0], [1.0], [0.5]],
'microphones_coord': [[3.5], [2.0], [0.5]],
}
transformations = transf_gen.transf_gen(verbose=True,
ApplyRIR=rir_kwargs,
)
⬆️ SevenBandParametricEQ docs
SevenBandParametricEQ={'min_gain_db': -10,
'max_gain_db': 10,
'p': 1},
⬆️ Shift docs
Shift={'min_shift': 1,
'max_shift': 2,
'p': 1,
'sample_rate': sampling_rate},
⬆️ Speed docs
Speed={'orig_freq': 16000,
'factor': [0.9, 1.5, 0.1],
'p': 1},
⬆️ Spectrogram docs
Spectrogram={'sample_rate': 16000},
⬆️ TanhDistortion docs
TanhDistortion={'min_distortion': 0.1,
'max_distortion': 0.8,
'p': 1},
⬆️ TimeMasking docs
TimeMasking={'time_mask_param': 80},
⬆️ TimeStretch docs
TimeStretch='''min_rate=0.9,
max_rate=1.1,
p=0.2,
leave_length_unchanged=False''',
⬆️ Codecs using torchaudio
You can select just one. No need to use them all. :)
transformations = transf_gen.transf_gen(verbose=True,
ac3=True,
adpcm_ima_wav=True,
adpcm_ms=True,
adpcm_yamaha=True,
eac3=True,
flac=True,
libmp3lame=True,
mp2=True,
pcm_alaw=True,
pcm_f32le=True,
pcm_mulaw=True,
pcm_s16le=True,
pcm_s24le=True,
pcm_s32le=True,
pcm_u8=True,
wmav1=True,
wmav2=True,
)
⬆️ g726
g726={'audio_bitrate': '40k'},
⬆️ gsm
gsm=True,
⬆️ amr
amr={'audio_bitrate': '4.75k'},