
AutoWave - Automatic Audio Classification Library

The simplest way to classify audio in Python.



Powered by pretrained transformer models (AST, Wav2Vec2, HuBERT, WavLM) via HuggingFace: fine-tune a state-of-the-art audio classifier on your own dataset in a few lines of code.

from autowave import AudioClassifier

# 1. Load and train
model = AudioClassifier()
model.fit("data/train/")

# 2. Predict
result = model.predict("test.wav")
print(result)  # {"label": "dog_bark", "confidence": 0.94}

# 3. Evaluate
metrics = model.evaluate("data/test/")
print(f"Accuracy: {metrics['accuracy']:.2%}")

# 4. Save & reload
model.save("my_model/")
loaded = AudioClassifier.load("my_model/")

Installation

pip install AutoWave

Requirements: Python ≥ 3.10, PyTorch ≥ 2.0


Quick Start

1. Prepare your dataset

Organize audio files into class subfolders:

data/
  train/
    dog/     bark1.wav  bark2.wav  ...
    cat/     meow1.wav  meow2.wav  ...
    bird/    chirp1.wav chirp2.wav ...
  test/
    dog/     ...
    cat/     ...
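Before calling fit, it can help to sanity-check the folder layout. A stdlib-only sketch (the count_dataset helper below is illustrative, not part of AutoWave's API):

```python
from pathlib import Path

def count_dataset(root):
    """Count audio files per class subfolder under root (e.g. data/train/)."""
    exts = {".wav", ".mp3", ".flac", ".ogg", ".m4a", ".aiff"}
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1 for f in class_dir.iterdir() if f.suffix.lower() in exts
            )
    return counts
```

Running it on data/train/ should print one entry per class; empty or missing classes show up immediately.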

2. Train and predict

from autowave import AudioClassifier

model = AudioClassifier()
model.fit("data/train/")
model.predict("data/test/dog/bark_test.wav")
# → {"label": "dog", "confidence": 0.97}

3. Evaluate

results = model.evaluate("data/test/")
print(f"Accuracy: {results['accuracy']:.2%}")
print(results["report"])
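The accuracy reported by evaluate is presumably the fraction of test clips whose predicted label matches the folder label. A minimal sketch of that computation (the label lists here are made up for illustration):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    assert len(y_true) == len(y_pred), "label lists must align"
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

score = accuracy(["dog", "cat", "dog", "bird"], ["dog", "cat", "cat", "bird"])
print(f"Accuracy: {score:.2%}")  # → Accuracy: 75.00%
```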

4. Save and reload

model.save("my_model/")
loaded = AudioClassifier.load("my_model/")
loaded.predict("new_audio.wav")

Zero-Shot Classification (no training)

Classify audio against any text labels — no dataset or fine-tuning required:

from autowave import ZeroShotClassifier

clf = ZeroShotClassifier()
clf.predict("audio.wav", labels=["dog barking", "cat meowing", "rain", "music"])
# → [{"label": "dog barking", "confidence": 0.91}, ...]
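Zero-shot audio classifiers of this kind typically score the audio clip against each text label and normalize the scores with a softmax. A toy sketch of that final step, with made-up similarity scores (not AutoWave's internals):

```python
import math

def softmax_scores(labels, scores):
    """Turn raw audio-text similarity scores into ranked label confidences."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    ranked = sorted(zip(labels, exps), key=lambda pair: -pair[1])
    return [{"label": label, "confidence": e / total} for label, e in ranked]

results = softmax_scores(["dog barking", "rain", "music"], [2.1, 0.3, -0.5])
```

The confidences sum to 1, and the highest-similarity label comes first, matching the output shape shown above.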

Advanced Options

model = AudioClassifier(
    model_name="ast",          # "ast" | "wav2vec2" | "hubert" | "wavlm" | any HF model ID
    epochs=10,
    batch_size=8,
    learning_rate=1e-4,
    augment=True,              # noise, pitch shift, time stretch, shift
    device="auto",             # "auto" | "cuda" | "mps" | "cpu"
    output_dir="checkpoints/",
    max_duration_s=10.0,
)
model.fit("data/train/", val_folder="data/val/")
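augment=True applies waveform-level perturbations such as added noise and time shift. A rough pure-Python sketch of two of them, operating on a list of samples (AutoWave's actual implementation may differ):

```python
import random

def add_noise(samples, sigma=0.005):
    """Mix low-amplitude Gaussian noise into the waveform."""
    return [s + random.gauss(0.0, sigma) for s in samples]

def time_shift(samples, shift):
    """Circularly shift the waveform by `shift` samples."""
    shift %= len(samples)
    return samples[-shift:] + samples[:-shift]
```

Both transforms preserve the clip length, so the augmented waveform feeds into the same feature pipeline as the original.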

Available models

Short name      HuggingFace model                          Best for
ast (default)   MIT/ast-finetuned-audioset-10-10-0.4593    All audio types
wav2vec2        facebook/wav2vec2-base                     Speech tasks
hubert          facebook/hubert-base-ls960                 Speech tasks
wavlm           microsoft/wavlm-base                       Speech benchmarks

Any HuggingFace AutoModelForAudioClassification-compatible model ID also works.


Export to ONNX

model.export_onnx("model.onnx")

Visualization

from autowave.visualization import plots

plots.waveform("audio.wav")
plots.spectrogram("audio.wav")
plots.mfcc("audio.wav")
plots.spectral_centroid("audio.wav")
plots.time_freq_overview("audio.wav")

Audio Utilities

from autowave.utils.audio import read_properties, resample, convert_format

# Metadata
props = read_properties("audio.wav")
print(props.sample_rate, props.duration_s, props.channels)

# Resample to 16 kHz
resample("audio.mp3", target_sr=16000, output_path="audio_16k.wav")

# Convert format
convert_format("audio.wav", output_format="mp3")
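For plain WAV files, the same kind of metadata can be read with the standard library alone. A stdlib-only sketch (read_wav_properties is illustrative, not AutoWave's API, which also handles compressed formats):

```python
import wave

def read_wav_properties(path):
    """Read sample rate, channel count, and duration from a WAV header."""
    with wave.open(path, "rb") as w:
        sr = w.getframerate()
        return {
            "sample_rate": sr,
            "channels": w.getnchannels(),
            "duration_s": w.getnframes() / sr,
        }
```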

Supported Audio Formats

.wav · .mp3 · .flac · .ogg · .m4a · .aiff


Core Contributors

Nilesh Verma
Satyajit Pattnaik
Kalash Jindal

Citation

If you use AutoWave in your research or project, please cite:

@software{autowave2024,
  author       = {Verma, Nilesh and Pattnaik, Satyajit and Jindal, Kalash},
  title        = {{AutoWave}: Automatic Audio Classification with Pretrained Transformers},
  year         = {2024},
  version      = {2.0.0},
  url          = {https://github.com/TechyNilesh/Autowave},
  note         = {Python library for audio classification using AST, Wav2Vec2, HuBERT, and WavLM}
}

Developed with Love ❤️

Developed for ML researchers, data scientists, Python developers, speech engineers, and the open-source audio community.
