A general speech augmentation policy

🎛️ Wav2Aug: Toward Universal Time-Domain Speech Augmentation

A minimalistic PyTorch-based library for speech and audio augmentation. Its goal is to provide a general-purpose speech augmentation policy that can be used on any task and performs well without tuning augmentation hyperparameters. Just install and start augmenting. Each call applies two randomly chosen augmentations.
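The "two random augmentations per call" policy can be pictured with a small sketch. This is not wav2aug's implementation: `apply_random_ops` and the toy operations below are hypothetical names, used only to illustrate drawing k distinct operations from a pool and applying them in sequence.

```python
import torch
from typing import Callable, Sequence

Op = Callable[[torch.Tensor], torch.Tensor]


def apply_random_ops(wavs: torch.Tensor, ops: Sequence[Op], k: int = 2) -> torch.Tensor:
    """Apply k distinct, randomly chosen ops to a batch (hypothetical helper)."""
    # randperm draws from the global PyTorch RNG, so torch.manual_seed controls it
    chosen = torch.randperm(len(ops))[:k].tolist()
    for i in chosen:
        wavs = ops[i](wavs)
    return wavs


# Toy stand-ins for real augmentations: each one scales by a distinct prime,
# so the product reveals which two ops were applied.
toy_ops = [lambda x, g=g: x * g for g in (2.0, 3.0, 5.0, 7.0)]
out = apply_random_ops(torch.ones(3, 8), toy_ops, k=2)
print(out[0, 0].item())
```

Because the selection uses the global PyTorch RNG, seeding once before the call fixes which operations are drawn.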

⚙️ Features

  • Minimal dependencies: we only rely on PyTorch, torchcodec, and torchaudio.
  • 9 core augmentations: amplitude scaling, amplitude clipping, noise addition, frequency dropout, polarity inversion, chunk swapping, speed perturbation, time dropout, and babble noise.
  • Simplicity: just install and start augmenting!
  • Randomness: all stochastic ops use PyTorch RNGs. Set a single seed and be done, e.g. torch.manual_seed(0); torch.cuda.manual_seed_all(0)
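A minimal sketch of what seeding buys you, using only PyTorch. The `noisy_op` function is a hypothetical stand-in for any stochastic operation that draws from the PyTorch RNG; it is not part of wav2aug.

```python
import torch


def seed_everything(seed: int) -> None:
    """Seed the CPU and (if present) CUDA generators."""
    torch.manual_seed(seed)
    # Safe to call without CUDA: it is silently ignored in that case
    torch.cuda.manual_seed_all(seed)


def noisy_op(x: torch.Tensor) -> torch.Tensor:
    """Stand-in for a stochastic op that uses the global PyTorch RNG."""
    return x + 0.1 * torch.randn_like(x)


seed_everything(0)
first = noisy_op(torch.zeros(2, 8))

seed_everything(0)
second = noisy_op(torch.zeros(2, 8))

# Same seed, same random draws
print(torch.equal(first, second))
```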

📦 Installation

pip

pip install wav2aug

uv

uv add wav2aug

🚀 Quick Start

import torch
from wav2aug.gpu import Wav2Aug

# Initialize the augmenter once
augmenter = Wav2Aug(sample_rate=16000)

# In the forward pass
wavs = torch.randn(3, 50000)     # batch of 3 waveforms, 50000 samples each
lens = torch.ones(wavs.size(0))  # one length entry per waveform

aug_wavs, aug_lens = augmenter(wavs, lens)

That's it!
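Real batches usually contain utterances of different durations. The sketch below pads them into one tensor and computes a matching `lens` vector. It assumes that `lens` holds each waveform's length as a fraction of the padded width, which the all-ones example above suggests but this README does not state; `collate_waveforms` is a hypothetical helper, not part of wav2aug.

```python
import torch
from torch.nn.utils.rnn import pad_sequence


def collate_waveforms(utterances):
    """Zero-pad 1-D waveforms into a batch and return relative lengths.

    ASSUMPTION: lengths are expressed as fractions of the padded width
    (1.0 = no padding). Check the library's docs before relying on this.
    """
    num_samples = torch.tensor([u.numel() for u in utterances], dtype=torch.float32)
    wavs = pad_sequence(utterances, batch_first=True)  # shape: (batch, max_samples)
    lens = num_samples / wavs.size(1)
    return wavs, lens


utterances = [torch.randn(16000), torch.randn(12000), torch.randn(8000)]
wavs, lens = collate_waveforms(utterances)
print(wavs.shape, lens)

# The batch could then be passed on as in the quick start:
# aug_wavs, aug_lens = augmenter(wavs, lens)
```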

🧪 Augmentation Types

  • 🔊 Amplitude Scaling/Clipping: Random gain and peak limiting
  • 🌫️ Noise Addition: Environmental noise with SNR control
  • 📶 Frequency Dropout: Spectral masking with random notch filters
  • 🔄 Polarity Inversion: Random phase flip
  • 🧩 Chunk Swapping: Temporal segment reordering
  • ⏱️ Speed Perturbation: Time-scale modification
  • 🕳️ Time Dropout: Random silence insertion
  • 👥 Babble Noise: Multi-speaker background (auto-enabled with sufficient buffer)
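To make a few of these operations concrete, here is an illustrative sketch of three of them in plain PyTorch. These are textbook versions written for this page, not wav2aug's implementations; the function names, default values, and the `(batch, samples)` layout are assumptions.

```python
import torch


def invert_polarity(wavs: torch.Tensor, p: float = 0.5) -> torch.Tensor:
    """Flip the sign of each waveform independently with probability p."""
    flip = torch.rand(wavs.size(0), 1, device=wavs.device) < p
    return torch.where(flip, -wavs, wavs)


def add_noise_at_snr(wavs: torch.Tensor, noise: torch.Tensor, snr_db: float) -> torch.Tensor:
    """Scale noise so that signal power / noise power matches snr_db, then add it."""
    sig_power = wavs.pow(2).mean(dim=1, keepdim=True)
    noise_power = noise.pow(2).mean(dim=1, keepdim=True).clamp_min(1e-10)
    scale = torch.sqrt(sig_power / (noise_power * 10 ** (snr_db / 10)))
    return wavs + scale * noise


def time_dropout(wavs: torch.Tensor, max_len: int = 1600) -> torch.Tensor:
    """Zero out one random contiguous span of up to max_len samples per waveform."""
    out = wavs.clone()
    total = out.size(1)
    for i in range(out.size(0)):
        n = min(int(torch.randint(1, max_len + 1, (1,))), total)
        start = int(torch.randint(0, total - n + 1, (1,)))
        out[i, start:start + n] = 0.0
    return out


wavs = torch.randn(2, 16000)
noisy = add_noise_at_snr(wavs, torch.randn(2, 16000), snr_db=10.0)
dropped = time_dropout(wavs)
flipped = invert_polarity(wavs, p=1.0)  # p=1.0 always flips
```

Polarity inversion leaves the magnitude spectrum untouched, which is why it is among the cheapest augmentations; the SNR helper shows the usual power-ratio scaling behind "noise with SNR control".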

🛠️ Development Installation

uv

git clone https://github.com/gfdb/wav2aug
cd wav2aug

# pin Python, then create and activate the venv
uv python pin 3.10  # or 3.11/3.12
uv venv
source .venv/bin/activate

# runtime only
uv sync

# extras (pass both flags in one sync so dev and test are installed together)
uv sync --extra dev --extra test

pip

git clone https://github.com/gfdb/wav2aug
cd wav2aug

# create venv
python -m venv .venv
source .venv/bin/activate

# runtime only
python -m pip install .

# editable + extras for development
python -m pip install -e '.[dev,test]'

✅ Tests

uv

uv run pytest -q tests/

pip

pytest -q tests/

🤝 Contributing

  • Issues and PRs are welcome and encouraged!

  • Bug reports: please open an issue with a minimal repro (env, torch/torchaudio/torchcodec versions, code snippet, expected vs. actual, traceback).

  • Feature requests: please open an issue with use-case and proposed feature.

  • PRs: keep them focused. Add tests when behavior changes. Don't forget to run formatters and tests before submitting!
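For the environment part of a bug report, a small standard-library script can collect the relevant versions. This is a convenience sketch, not a tool shipped with wav2aug; `report_versions` is a hypothetical name.

```python
import platform
from importlib import metadata


def report_versions(packages=("wav2aug", "torch", "torchaudio", "torchcodec")) -> str:
    """Return one line per package with its installed version, if any."""
    lines = [f"python {platform.python_version()}"]
    for name in packages:
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            version = "not installed"
        lines.append(f"{name} {version}")
    return "\n".join(lines)


print(report_versions())
```

Paste the output into the issue alongside the code snippet and traceback.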
