S-SONDO: Lightweight audio embeddings from self-supervised knowledge distillation
Up to 61x smaller than teacher models, retaining up to 96% performance.
ICASSP 2026
Install
pip install ssondo
Quick Start
from ssondo import get_ssondo
model = get_ssondo()
embeddings = model(audio) # (batch, n_segments, 960)
No preprocessing, no config files, no manual downloads. Pass raw mono audio at 32 kHz and get embeddings.
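The Quick Start snippet leaves `audio` undefined. As a minimal sketch, the block below builds a synthetic 10-second test tone at 32 kHz with NumPy. The `(batch, samples)` layout and the float32 dtype are assumptions, not something the package documentation above confirms, and the array would likely need converting to a tensor (for example with `torch.from_numpy`) before being passed to the model.

```python
import numpy as np

SAMPLE_RATE = 32_000  # Hz, the rate stated above
DURATION_S = 10       # seconds, an arbitrary choice for this example

# Time axis in seconds, then a 440 Hz sine at half amplitude.
t = np.arange(SAMPLE_RATE * DURATION_S, dtype=np.float32) / SAMPLE_RATE
tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)

# Assumed layout: (batch, samples), a single mono clip.
audio = tone[np.newaxis, :].astype(np.float32)
print(audio.shape)  # (1, 320000)
```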
Pretrained Classifiers
7 ready-to-use classifiers trained on standard audio benchmarks:
model = get_ssondo(head="esc50")
logits = model(audio) # (batch, 50)
| Head | Task | Classes |
|---|---|---|
| esc50 | Environmental sound | 50 |
| us8k | Urban sound | 10 |
| fsd50k | Sound events | 200 |
| gtzan | Music genre | 10 |
| openmic | Instrument recognition | 20 |
| nsynth | Instrument family | 11 |
| magna-tag-a-tune | Music auto-tagging | 50 |
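Logits are unnormalized scores, so they usually need one more step to become predictions. The sketch below uses made-up logits for a toy 4-class problem and shows two common conventions: softmax plus argmax for single-label tasks, and a per-class sigmoid threshold for multi-label tagging. Which convention each pretrained head expects is an assumption here and should be checked against the training setup.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Made-up logits for one clip (illustrative values, not model output).
logits = [0.2, 2.5, -1.0, 0.7]

# Single-label reading: pick the most probable class.
probs = softmax(logits)
top_class = max(range(len(probs)), key=probs.__getitem__)

# Multi-label reading: keep every class whose sigmoid score reaches 0.5.
active_tags = [i for i, x in enumerate(logits) if sigmoid(x) >= 0.5]

print(top_class)    # 1
print(active_tags)  # [0, 1, 3]
```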
Custom Heads
# Linear
model = get_ssondo(head="linear", n_classes=10)
# MLP
model = get_ssondo(head="mlp", n_classes=10, hidden_sizes=[512, 256])
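To get a feel for how much the two head types add on top of the 960-dimensional embedding, the following sketch counts weights and biases for each. It assumes plain fully connected layers with biases and no normalization layers, which may not match the package's actual head implementation; `dense_param_count` is a hypothetical helper written for this example.

```python
def dense_param_count(layer_sizes):
    """Weights plus biases for a stack of fully connected layers."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )

EMBEDDING_DIM = 960  # embedding size stated in this README
N_CLASSES = 10

# Linear head: 960 -> 10
linear_params = dense_param_count([EMBEDDING_DIM, N_CLASSES])

# MLP head with hidden_sizes=[512, 256]: 960 -> 512 -> 256 -> 10
mlp_params = dense_param_count([EMBEDDING_DIM, 512, 256, N_CLASSES])

print(linear_params)  # 9610
print(mlp_params)     # 625930
```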
Finetuning
# Linear probing (frozen backbone)
model = get_ssondo(head="linear", n_classes=10)
model.freeze_backbone()
model.train()
logits = model(audio)
loss = criterion(logits, labels)
loss.backward() # gradients reach only the head parameters
# Full finetuning
model.unfreeze_backbone()
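Linear probing can also be pictured without the package: with the backbone frozen, the embeddings are fixed features and only the head's weights change. The NumPy sketch below trains a softmax head with plain gradient descent on random stand-in embeddings and labels. It illustrates the idea only and is not the package's training loop; the data, learning rate, and step count are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for frozen-backbone outputs: 64 clips, 960 dims, 10 classes.
n_clips, emb_dim, n_classes = 64, 960, 10
embeddings = rng.standard_normal((n_clips, emb_dim)).astype(np.float32)
labels = rng.integers(0, n_classes, size=n_clips)

# Head parameters: the only values that change during linear probing.
W = np.zeros((emb_dim, n_classes), dtype=np.float32)
b = np.zeros(n_classes, dtype=np.float32)

def cross_entropy_and_grads(W, b):
    """Mean cross-entropy of a linear softmax head and its gradients."""
    logits = embeddings @ W + b
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    rows = np.arange(n_clips)
    loss = -np.log(probs[rows, labels]).mean()
    d_logits = probs.copy()
    d_logits[rows, labels] -= 1.0
    d_logits /= n_clips
    return loss, embeddings.T @ d_logits, d_logits.sum(axis=0)

losses = []
for _ in range(20):
    loss, grad_W, grad_b = cross_entropy_and_grads(W, b)
    losses.append(float(loss))
    W -= 0.01 * grad_W  # the embeddings themselves are never modified
    b -= 0.01 * grad_b

print(losses[0] > losses[-1])  # True
```

Because the embeddings never change, they can be computed once and cached, which is the main practical appeal of linear probing over full finetuning.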
API at a Glance
from ssondo import get_ssondo, list_models, list_heads
model = get_ssondo() # load model
model = get_ssondo(head="esc50") # pretrained classifier
model = get_ssondo(head="linear", n_classes=10) # custom head
model = get_ssondo(device="cuda") # GPU
model = get_ssondo("path/to/checkpoint.ckpt") # local checkpoint
embeddings = model(audio) # (batch, n_segments, 960)
emb = model.get_embeddings(audio) # (batch, 960) mean-pooled
model.embedding_dim # 960
model.backbone # raw nn.Module
list_heads() # available classifiers
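The two output shapes listed above differ only by a pooling step over the segment axis. As a minimal sketch, the block below reproduces that mean pooling on a random array with NumPy. The array is a stand-in for model output, and the claim that `get_embeddings` is exactly a mean over segments rests only on the "mean-pooled" comment above.

```python
import numpy as np

# Hypothetical segment-level output: 2 clips, 3 segments each, 960 dims.
segment_embeddings = np.random.default_rng(0).standard_normal((2, 3, 960))

# Average over the segment axis to get one vector per clip.
clip_embeddings = segment_embeddings.mean(axis=1)

print(segment_embeddings.shape)  # (2, 3, 960)
print(clip_embeddings.shape)     # (2, 960)
```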
Model
S-SONDO ships with matpac-mobilenetv3, a MobileNetV3 (2.9M parameters) distilled from MATPAC++. It achieves the best downstream performance across all 7 benchmarks, retaining 96.4% of teacher performance with 61x fewer parameters. Embeddings are 960-dimensional.
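The figures above imply a rough teacher size and memory footprint. The arithmetic below is a back-of-the-envelope estimate from the rounded numbers in this README (2.9M parameters, 61x), assuming 4-byte float32 weights. Real checkpoint sizes will differ because of buffers, metadata, and rounding in the published figures.

```python
STUDENT_PARAMS = 2.9e6   # from this README (rounded)
COMPRESSION = 61         # "61x fewer parameters"
BYTES_PER_PARAM = 4      # assumption: float32 weights

# Implied teacher size, given the rounded inputs.
teacher_params = STUDENT_PARAMS * COMPRESSION

student_mb = STUDENT_PARAMS * BYTES_PER_PARAM / 1e6
teacher_mb = teacher_params * BYTES_PER_PARAM / 1e6

print(f"teacher ~ {teacher_params / 1e6:.1f}M params")  # teacher ~ 176.9M params
print(f"student ~ {student_mb:.1f} MB")                 # student ~ 11.6 MB
print(f"teacher ~ {teacher_mb:.1f} MB")                 # teacher ~ 707.6 MB
```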
Input
- Mono audio, single channel
- Sample rate: 32,000 Hz
- Internally sliced into 10 s segments and converted to 128-band log-mel spectrograms
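The segmenting described above determines the `n_segments` dimension of the output. The sketch below estimates it from a clip length, assuming non-overlapping 10 s segments and that a final partial segment is kept (for example by padding) rather than dropped. That boundary behavior is an assumption, and `estimate_n_segments` is a hypothetical helper, not part of the package.

```python
import math

SAMPLE_RATE = 32_000     # Hz
SEGMENT_SECONDS = 10
SEGMENT_SAMPLES = SAMPLE_RATE * SEGMENT_SECONDS  # 320,000 samples

def estimate_n_segments(n_samples: int) -> int:
    """Estimate segment count, assuming a partial last segment is kept."""
    return max(1, math.ceil(n_samples / SEGMENT_SAMPLES))

for seconds in (4, 10, 25):
    print(seconds, estimate_n_segments(seconds * SAMPLE_RATE))
# 4 1
# 10 1
# 25 3
```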
Links
- Paper: arXiv
- Models: Hugging Face Hub
- Code & Training: GitHub
Citation
@inproceedings{eladlouni2026ssondo,
title={S-SONDO: Self-Supervised Knowledge Distillation for General Audio Foundation Models},
author={El Adlouni, Mohammed Ali and Quelennec, Aurian and Chouteau, Pierre and Peeters, Geoffroy and Essid, Slim},
booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
year={2026}
}
License
MIT