
AutoWave - Automatic Audio Classification Library

The simplest way to classify audio in Python.



Powered by pretrained transformer models (AST, Wav2Vec2, HuBERT, WavLM) via HuggingFace: fine-tune a state-of-the-art audio classifier on your own dataset in a few lines of code.

from autowave import AudioClassifier

# 1. Load and train
model = AudioClassifier()
model.fit("data/train/")

# 2. Predict
result = model.predict("test.wav")
print(result)  # {"label": "dog_bark", "confidence": 0.94}

# 3. Evaluate
metrics = model.evaluate("data/test/")
print(f"Accuracy: {metrics['accuracy']:.2%}")

# 4. Save & reload
model.save("my_model/")
loaded = AudioClassifier.load("my_model/")

Installation

pip install AutoWave

Requirements: Python ≥ 3.10, PyTorch ≥ 2.0


Quick Start

1. Prepare your dataset

Organize audio files into class subfolders:

data/
  train/
    dog/     bark1.wav  bark2.wav  ...
    cat/     meow1.wav  meow2.wav  ...
    bird/    chirp1.wav chirp2.wav ...
  test/
    dog/     ...
    cat/     ...
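Before calling fit, it can help to sanity-check the folder layout. A stdlib-only sketch (the count_dataset helper below is illustrative, not part of AutoWave's API):

```python
from pathlib import Path

def count_dataset(root):
    """Count audio files per class subfolder under root (e.g. data/train/)."""
    exts = {".wav", ".mp3", ".flac", ".ogg", ".m4a", ".aiff"}
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1 for f in class_dir.iterdir() if f.suffix.lower() in exts
            )
    return counts
```

Running it on data/train/ should print one entry per class; empty or missing classes show up immediately.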

2. Train and predict

from autowave import AudioClassifier

model = AudioClassifier()
model.fit("data/train/")
model.predict("data/test/dog/bark_test.wav")
# → {"label": "dog", "confidence": 0.97}

3. Evaluate

results = model.evaluate("data/test/")
print(f"Accuracy: {results['accuracy']:.2%}")
print(results["report"])
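The accuracy reported by evaluate is presumably the fraction of test clips whose predicted label matches the folder label. A minimal sketch of that computation (the label lists here are made up for illustration):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    assert len(y_true) == len(y_pred), "label lists must align"
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

score = accuracy(["dog", "cat", "dog", "bird"], ["dog", "cat", "cat", "bird"])
print(f"Accuracy: {score:.2%}")  # → Accuracy: 75.00%
```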

4. Save and reload

model.save("my_model/")
loaded = AudioClassifier.load("my_model/")
loaded.predict("new_audio.wav")

Zero-Shot Classification (no training)

Classify audio against any text labels — no dataset or fine-tuning required:

from autowave import ZeroShotClassifier

clf = ZeroShotClassifier()
clf.predict("audio.wav", labels=["dog barking", "cat meowing", "rain", "music"])
# → [{"label": "dog barking", "confidence": 0.91}, ...]
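Zero-shot audio classifiers of this kind typically score the audio clip against each text label and normalize the scores with a softmax. A toy sketch of that final step, with made-up similarity scores (not AutoWave's internals):

```python
import math

def softmax_scores(labels, scores):
    """Turn raw audio-text similarity scores into ranked label confidences."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    ranked = sorted(zip(labels, exps), key=lambda pair: -pair[1])
    return [{"label": label, "confidence": e / total} for label, e in ranked]

results = softmax_scores(["dog barking", "rain", "music"], [2.1, 0.3, -0.5])
```

The confidences sum to 1, and the highest-similarity label comes first, matching the output shape shown above.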

Advanced Options

model = AudioClassifier(
    model_name="ast",          # "ast" | "wav2vec2" | "hubert" | "wavlm" | any HF model ID
    epochs=10,
    batch_size=8,
    learning_rate=1e-4,
    augment=True,              # noise, pitch shift, time stretch, shift
    device="auto",             # "auto" | "cuda" | "mps" | "cpu"
    output_dir="checkpoints/",
    max_duration_s=10.0,
)
model.fit("data/train/", val_folder="data/val/")
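augment=True applies waveform-level perturbations such as added noise and time shift. A rough pure-Python sketch of two of them, operating on a list of samples (AutoWave's actual implementation may differ):

```python
import random

def add_noise(samples, sigma=0.005):
    """Mix low-amplitude Gaussian noise into the waveform."""
    return [s + random.gauss(0.0, sigma) for s in samples]

def time_shift(samples, shift):
    """Circularly shift the waveform by `shift` samples."""
    shift %= len(samples)
    return samples[-shift:] + samples[:-shift]
```

Both transforms preserve the clip length, so the augmented waveform feeds into the same feature pipeline as the original.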

Available models

Short name      HuggingFace model                          Best for
ast (default)   MIT/ast-finetuned-audioset-10-10-0.4593    All audio types
wav2vec2        facebook/wav2vec2-base                     Speech tasks
hubert          facebook/hubert-base-ls960                 Speech tasks
wavlm           microsoft/wavlm-base                       Speech benchmarks

Any HuggingFace AutoModelForAudioClassification-compatible model ID also works.


Export to ONNX

model.export_onnx("model.onnx")

Visualization

from autowave.visualization import plots

plots.waveform("audio.wav")
plots.spectrogram("audio.wav")
plots.mfcc("audio.wav")
plots.spectral_centroid("audio.wav")
plots.time_freq_overview("audio.wav")

Audio Utilities

from autowave.utils.audio import read_properties, resample, convert_format

# Metadata
props = read_properties("audio.wav")
print(props.sample_rate, props.duration_s, props.channels)

# Resample to 16 kHz
resample("audio.mp3", target_sr=16000, output_path="audio_16k.wav")

# Convert format
convert_format("audio.wav", output_format="mp3")
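For plain WAV files, the same kind of metadata can be read with the standard library alone. A stdlib-only sketch (read_wav_properties is illustrative, not AutoWave's API, which also handles compressed formats):

```python
import wave

def read_wav_properties(path):
    """Read sample rate, channel count, and duration from a WAV header."""
    with wave.open(path, "rb") as w:
        sr = w.getframerate()
        return {
            "sample_rate": sr,
            "channels": w.getnchannels(),
            "duration_s": w.getnframes() / sr,
        }
```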

Supported Audio Formats

.wav · .mp3 · .flac · .ogg · .m4a · .aiff


Core Contributors

Nilesh Verma
Satyajit Pattnaik
Kalash Jindal

Citation

If you use AutoWave in your research or project, please cite:

@software{autowave2024,
  author       = {Verma, Nilesh and Pattnaik, Satyajit and Jindal, Kalash},
  title        = {{AutoWave}: Automatic Audio Classification with Pretrained Transformers},
  year         = {2024},
  version      = {2.0.0},
  url          = {https://github.com/TechyNilesh/Autowave},
  note         = {Python library for audio classification using AST, Wav2Vec2, HuBERT, and WavLM}
}

Developed with Love ❤️

Developed for ML researchers, data scientists, Python developers, speech engineers, and the open-source audio community.
