A general speech augmentation policy
🎛️ Wav2Aug: Toward Universal Time-Domain Speech Augmentation
A minimalistic PyTorch-based library for speech and audio augmentation. Its goal is to provide a general-purpose speech augmentation policy that can be used on any task and performs well without tuning augmentation hyperparameters. Just install and start augmenting. Each call applies two random augmentations.
⚙️ Features
- Minimal dependencies: we only rely on PyTorch, torchcodec, and torchaudio.
- 9 core augmentations: amplitude scaling, amplitude clipping, noise addition, frequency dropout, polarity inversion, chunk swapping, speed perturbation, time dropout, and babble noise.
- Simplicity: just install and start augmenting!
- Randomness: all stochastic ops use PyTorch RNGs. Set a single seed and be done, e.g. torch.manual_seed(0); torch.cuda.manual_seed_all(0)
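As a quick illustration of the seeding note above, the sketch below shows that reseeding PyTorch's global RNG reproduces the same random draws. It uses only plain PyTorch; `seed_everything` is a hypothetical helper name introduced here for illustration and is not part of wav2aug.

```python
import random

import torch


def seed_everything(seed: int) -> None:
    """Hypothetical helper (not part of wav2aug): seed the RNGs a script usually touches."""
    random.seed(seed)
    torch.manual_seed(seed)
    # Silently ignored when CUDA is not available.
    torch.cuda.manual_seed_all(seed)


seed_everything(0)
first = torch.rand(4)

seed_everything(0)
second = torch.rand(4)

# Both draws come from the same RNG state, so they match exactly.
print(torch.equal(first, second))  # True
```

Since the library's stochastic ops use PyTorch RNGs, the same reseeding should make augmentation runs repeatable, provided nothing else consumes random numbers in between.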
📦 Installation
pip
pip install wav2aug
uv
uv add wav2aug
🚀 Quick Start
import torch
from wav2aug.gpu import Wav2Aug
# Initialize the augmenter once
augmenter = Wav2Aug(sample_rate=16000)
# In the forward pass: a batch of 3 waveforms with 50,000 samples each
wavs = torch.randn(3, 50000)
# One length entry per waveform
lens = torch.ones(wavs.size(0))
aug_wavs, aug_lens = augmenter(wavs, lens)
That's it!
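Real batches usually contain waveforms of different lengths. The sketch below pads such a batch and builds a matching `lens` tensor using only plain PyTorch. It assumes that `lens` holds relative lengths (each waveform's length divided by the padded length), which the all-ones tensor in the Quick Start suggests but this README does not confirm. The name `pad_batch` is hypothetical.

```python
import torch
from torch.nn.utils.rnn import pad_sequence


def pad_batch(waveforms):
    """Hypothetical helper: zero-pad 1-D waveforms and return (batch, relative lengths).

    Assumption: lengths are expressed as fractions of the padded length.
    """
    batch = pad_sequence(waveforms, batch_first=True)  # shape: (batch, max_len)
    lengths = torch.tensor([w.size(0) for w in waveforms], dtype=torch.float32)
    return batch, lengths / batch.size(1)


waveforms = [torch.randn(16000), torch.randn(12000), torch.randn(8000)]
wavs, lens = pad_batch(waveforms)

print(wavs.shape)  # torch.Size([3, 16000])
print(lens)        # tensor([1.0000, 0.7500, 0.5000])
```

If the relative-length assumption holds, `wavs` and `lens` could be passed to `augmenter(wavs, lens)` exactly as in the Quick Start.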
🧪 Augmentation Types
- 🔊 Amplitude Scaling/Clipping: Random gain and peak limiting
- 🌫️ Noise Addition: Environmental noise with SNR control
- 📶 Frequency Dropout: Spectral masking with random notch filters
- 🔄 Polarity Inversion: Random phase flip
- 🧩 Chunk Swapping: Temporal segment reordering
- ⏱️ Speed Perturbation: Time-scale modification
- 🕳️ Time Dropout: Random silence insertion
- 👥 Babble Noise: Multi-speaker background (auto-enabled with sufficient buffer)
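To make a few of the augmentation types above concrete, here is a minimal sketch in plain PyTorch of polarity inversion, amplitude scaling, and time dropout, plus a policy that applies two randomly chosen ops per call. This is an illustration only, not wav2aug's implementation: the function names, probabilities, gain range, and chunk sizes are all assumptions picked for the example.

```python
import torch


def polarity_inversion(wavs: torch.Tensor) -> torch.Tensor:
    """Flip the sign of each waveform with probability 0.5 (assumed value)."""
    flip = torch.rand(wavs.size(0), 1, device=wavs.device) < 0.5
    return torch.where(flip, -wavs, wavs)


def amplitude_scaling(wavs: torch.Tensor, low: float = 0.5, high: float = 1.5) -> torch.Tensor:
    """Multiply each waveform by one random gain drawn from [low, high]."""
    gain = torch.empty(wavs.size(0), 1, device=wavs.device, dtype=wavs.dtype)
    return wavs * gain.uniform_(low, high)


def time_dropout(wavs: torch.Tensor, max_chunk: int = 2000) -> torch.Tensor:
    """Zero out one random chunk of at most max_chunk samples per waveform."""
    out = wavs.clone()
    num_samples = wavs.size(1)
    for i in range(wavs.size(0)):
        chunk = min(int(torch.randint(1, max_chunk + 1, (1,)).item()), num_samples)
        start = int(torch.randint(0, num_samples - chunk + 1, (1,)).item())
        out[i, start:start + chunk] = 0.0
    return out


def apply_two_random(wavs: torch.Tensor, ops) -> torch.Tensor:
    """Apply two distinct, randomly chosen ops in sequence."""
    chosen = torch.randperm(len(ops))[:2].tolist()
    for index in chosen:
        wavs = ops[index](wavs)
    return wavs


torch.manual_seed(0)
batch = torch.randn(3, 50000)
augmented = apply_two_random(batch, [polarity_inversion, amplitude_scaling, time_dropout])
print(augmented.shape)  # torch.Size([3, 50000])
```

All randomness here goes through PyTorch's global RNG, so a single `torch.manual_seed` call makes the sketch repeatable.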
🛠️ Development Installation
uv
git clone https://github.com/gfdb/wav2aug
cd wav2aug
# create venv and pin Python
uv venv
source .venv/bin/activate
uv python pin 3.10 # or 3.11/3.12
# runtime only
uv sync
# extras
uv sync --extra dev
uv sync --extra test
pip
git clone https://github.com/gfdb/wav2aug
cd wav2aug
# create venv
python -m venv .venv
source .venv/bin/activate
# runtime only
python -m pip install .
# editable + extras for development
python -m pip install -e '.[dev,test]'
✅ Tests
uv
uv run pytest -q tests/
pip
pytest -q tests/
🤝 Contributing
- Issues and PRs are welcome and encouraged!
- Bug reports: please open an issue with a minimal repro (env, torch/torchaudio/torchcodec versions, code snippet, expected vs. actual, traceback).
- Feature requests: please open an issue with use-case and proposed feature.
- PRs: keep them focused. Add tests when behavior changes. Don't forget to run formatters and tests before submitting!