# AutoWave

The simplest way to classify audio in Python — pretrained transformers in 3 lines of code.

Powered by pretrained transformer models (AST, Wav2Vec2, HuBERT, and WavLM) via HuggingFace — fine-tune a state-of-the-art audio classifier on your own dataset in a few lines of code.
```python
from autowave import AudioClassifier

# 1. Load and train
model = AudioClassifier()
model.fit("data/train/")

# 2. Predict
result = model.predict("test.wav")
print(result)  # {"label": "dog_bark", "confidence": 0.94}

# 3. Evaluate
metrics = model.evaluate("data/test/")
print(f"Accuracy: {metrics['accuracy']:.2%}")

# 4. Save & reload
model.save("my_model/")
loaded = AudioClassifier.load("my_model/")
```
## Installation

```shell
pip install AutoWave
```

Requirements: Python ≥ 3.10, PyTorch ≥ 2.0
## Quick Start

### 1. Prepare your dataset

Organize audio files into class subfolders:
```
data/
    train/
        dog/   bark1.wav  bark2.wav  ...
        cat/   meow1.wav  meow2.wav  ...
        bird/  chirp1.wav chirp2.wav ...
    test/
        dog/   ...
        cat/   ...
```
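If you are assembling a dataset from scratch, the skeleton above can be created programmatically. A small stdlib sketch (the `make_dataset_skeleton` helper and the class names are illustrative, not part of AutoWave; drop your audio files into the leaf folders afterwards):

```python
from pathlib import Path

def make_dataset_skeleton(root, classes, splits=("train", "test")):
    """Create the split/class directory tree that fit() and evaluate() expect."""
    for split in splits:
        for cls in classes:
            Path(root, split, cls).mkdir(parents=True, exist_ok=True)

make_dataset_skeleton("data", ["dog", "cat", "bird"])
```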
### 2. Train and predict

```python
from autowave import AudioClassifier

model = AudioClassifier()
model.fit("data/train/")
model.predict("data/test/dog/bark_test.wav")
# → {"label": "dog", "confidence": 0.97}
```
### 3. Evaluate

```python
results = model.evaluate("data/test/")
print(f"Accuracy: {results['accuracy']:.2%}")
print(results["report"])
```
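For intuition, the accuracy in `results['accuracy']` is simply the fraction of predictions that match the reference labels. A library-independent illustration (the `accuracy` helper is ours, not AutoWave's):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    assert len(y_true) == len(y_pred)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

y_true = ["dog", "cat", "dog", "bird"]
y_pred = ["dog", "cat", "cat", "bird"]
print(f"Accuracy: {accuracy(y_true, y_pred):.2%}")  # Accuracy: 75.00%
```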
### 4. Save and reload

```python
model.save("my_model/")
loaded = AudioClassifier.load("my_model/")
loaded.predict("new_audio.wav")
```
## Zero-Shot Classification (no training)

Classify audio against any text labels — no dataset or fine-tuning required:

```python
from autowave import ZeroShotClassifier

clf = ZeroShotClassifier()
clf.predict("audio.wav", labels=["dog barking", "cat meowing", "rain", "music"])
# → [{"label": "dog barking", "confidence": 0.91}, ...]
```
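The model behind `ZeroShotClassifier` is not specified here; conceptually, CLAP-style zero-shot audio classification embeds the clip and each candidate label text into a shared space, then ranks labels by cosine similarity. A toy sketch with made-up embedding vectors standing in for real audio/text encoders:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def softmax(scores):
    """Turn similarity scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy embeddings; a real system would produce these with trained encoders.
audio_emb = [0.9, 0.1, 0.2]
label_embs = {
    "dog barking": [0.8, 0.2, 0.1],
    "rain":        [0.1, 0.9, 0.3],
    "music":       [0.2, 0.3, 0.9],
}

scores = {lbl: cosine(audio_emb, emb) for lbl, emb in label_embs.items()}
probs = softmax(list(scores.values()))
print(max(scores, key=scores.get))  # → dog barking
```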
## Advanced Options

```python
model = AudioClassifier(
    model_name="ast",        # "ast" | "wav2vec2" | "hubert" | "wavlm" | any HF model ID
    epochs=10,
    batch_size=8,
    learning_rate=1e-4,
    augment=True,            # noise, pitch shift, time stretch, shift
    device="auto",           # "auto" | "cuda" | "mps" | "cpu"
    output_dir="checkpoints/",
    max_duration_s=10.0,
)
model.fit("data/train/", val_folder="data/val/")
```
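AutoWave's augmentation internals are not shown here, but additive noise (the first item in the `augment` list) can be sketched in a few lines. The `add_noise` helper and its SNR parameter are illustrative, not part of the library:

```python
import random

def add_noise(samples, snr_db=20.0, seed=0):
    """Mix Gaussian noise into a waveform at a given signal-to-noise ratio (dB)."""
    rng = random.Random(seed)
    signal_power = sum(s * s for s in samples) / len(samples)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = noise_power ** 0.5
    return [s + rng.gauss(0.0, sigma) for s in samples]

clean = [0.5, -0.3, 0.8, -0.6, 0.1] * 100  # stand-in for decoded audio samples
noisy = add_noise(clean, snr_db=20.0)
```

At 20 dB SNR the noise power is 1% of the signal power, a mild perturbation that leaves the class audible while forcing the model not to memorize exact waveforms.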
### Available models

| Short name | HuggingFace model | Best for |
|---|---|---|
| `ast` (default) | `MIT/ast-finetuned-audioset-10-10-0.4593` | All audio types |
| `wav2vec2` | `facebook/wav2vec2-base` | Speech tasks |
| `hubert` | `facebook/hubert-base-ls960` | Speech tasks |
| `wavlm` | `microsoft/wavlm-base` | Speech benchmarks |

Any HuggingFace `AutoModelForAudioClassification`-compatible model ID also works.
## Export to ONNX

```python
model.export_onnx("model.onnx")
```
## Visualization

```python
from autowave.visualization import plots

plots.waveform("audio.wav")
plots.spectrogram("audio.wav")
plots.mfcc("audio.wav")
plots.spectral_centroid("audio.wav")
plots.time_freq_overview("audio.wav")
```
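As background for `plots.spectrogram`: a spectrogram is the magnitude of short-time Fourier transforms over overlapping, windowed frames. A minimal stdlib sketch, where a naive DFT stands in for the FFT a real plotting backend would use, and the frame/hop sizes are illustrative:

```python
import cmath
import math

def stft_magnitudes(samples, frame=64, hop=32):
    """Magnitude spectrogram: Hann-windowed frames -> DFT magnitude per frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame - 1)) for n in range(frame)]
    frames = []
    for start in range(0, len(samples) - frame + 1, hop):
        chunk = [samples[start + n] * window[n] for n in range(frame)]
        # Naive O(n^2) DFT over the non-negative frequency bins.
        mags = []
        for k in range(frame // 2 + 1):
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame)
                      for n in range(frame))
            mags.append(abs(acc))
        frames.append(mags)
    return frames  # shape: [num_frames][frame // 2 + 1]

# A pure tone completing 8 cycles per 64-sample frame peaks in DFT bin 8.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
spec = stft_magnitudes(tone)
```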
## Audio Utilities

```python
from autowave.utils.audio import read_properties, resample, convert_format

# Metadata
props = read_properties("audio.wav")
print(props.sample_rate, props.duration_s, props.channels)

# Resample to 16 kHz
resample("audio.mp3", target_sr=16000, output_path="audio_16k.wav")

# Convert format
convert_format("audio.wav", output_format="mp3")
```
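For WAV files specifically, the standard library's `wave` module reports the same kind of metadata as `read_properties`. A self-contained sketch that writes a short silent file and reads its properties back (the file name is a placeholder):

```python
import wave

# Write a 1-second silent mono WAV at 16 kHz with 16-bit samples.
with wave.open("probe.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(b"\x00\x00" * 16000)

# Read the properties back.
with wave.open("probe.wav", "rb") as r:
    sample_rate = r.getframerate()
    channels = r.getnchannels()
    duration_s = r.getnframes() / sample_rate

print(sample_rate, duration_s, channels)  # 16000 1.0 1
```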
## Supported Audio Formats

`.wav` · `.mp3` · `.flac` · `.ogg` · `.m4a` · `.aiff`
## Core Contributors

- Nilesh Verma
- Satyajit Pattnaik
- Kalash Jindal
## Citation

If you use AutoWave in your research or project, please cite:

```bibtex
@software{autowave2024,
  author  = {Verma, Nilesh and Pattnaik, Satyajit and Jindal, Kalash},
  title   = {{AutoWave}: Automatic Audio Classification with Pretrained Transformers},
  year    = {2024},
  version = {2.0.0},
  url     = {https://github.com/TechyNilesh/Autowave},
  note    = {Python library for audio classification using AST, Wav2Vec2, HuBERT, and WavLM}
}
```
Developed with ❤️ for ML researchers, data scientists, Python developers, speech engineers, and the open-source audio community.