Simple Audio Embeddings
auditus
auditus gives you simple access to state-of-the-art audio embeddings.
Like SentenceTransformers for audio.
$ pip install auditus
Quickstart
The high-level object in auditus is the AudioPipeline, which takes a
path to an audio file and returns a pooled embedding.
from auditus.transform import AudioPipeline
pipe = AudioPipeline(
    # Default AST model
    model_name="MIT/ast-finetuned-audioset-10-10-0.4593",
    # PyTorch output
    return_tensors="pt",
    # Resampled to 16 kHz
    target_sr=16000,
    # Mean pooling to obtain single embedding vector
    pooling="mean",
)
output = pipe("../test_files/XC119042.ogg").squeeze(0)
print(output.shape)
output[:5]
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
/Users/clepelaars/miniconda3/envs/py312/lib/python3.12/site-packages/transformers/audio_utils.py:297: UserWarning: At least one mel filter has all zero values. The value for `num_mel_filters` (128) may be set too high. Or, the value for `num_frequency_bins` (256) may be set too low.
warnings.warn(
torch.Size([1214])
tensor([0.0139, 0.0156, 0.0410, 0.0316, 0.0380])
To see AudioPipeline in action on a practical use case, check out this
Kaggle Notebook for the BirdCLEF+ 2025 competition.
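Pooled embeddings like the one above are typically compared with cosine similarity, much as sentence embeddings are. The following is a minimal sketch, not part of auditus: the helper `cosine_similarity` is hypothetical, and random NumPy vectors stand in for real pipeline outputs (a PyTorch tensor would first need converting, for example with `.numpy()`).

```python
import numpy as np


def cosine_similarity(u, v) -> float:
    """Cosine of the angle between two 1-D vectors (hypothetical helper)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Random stand-ins for pooled embeddings of length 1214 (assumption:
# real embeddings would come from AudioPipeline instead).
rng = np.random.default_rng(0)
emb_a = rng.normal(size=1214)
emb_b = emb_a + 0.1 * rng.normal(size=1214)  # slightly perturbed copy of emb_a
emb_c = rng.normal(size=1214)                # unrelated vector

sim_close = cosine_similarity(emb_a, emb_b)
sim_far = cosine_similarity(emb_a, emb_c)
print(sim_close > sim_far)
```

A similarity near 1 suggests two clips are close in embedding space; whether that matches perceptual similarity depends on the model and the task.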
Individual steps
auditus offers a range of transforms to process audio for downstream
tasks.
Loading
Simply load audio with a given sampling rate.
from auditus.transform import AudioLoader
audio = AudioLoader(sr=32000)("../test_files/XC119042.ogg")
audio
auditus.core.AudioArray(a=array([-2.64216160e-05, -2.54259703e-05, 5.56615578e-06, ...,
-2.03555092e-01, -2.03390077e-01, -2.45199591e-01]), sr=32000)
The AudioArray object offers a convenient interface to inspect the
audio data. For example, you can listen to the audio in a Jupyter
Notebook with audio.audio().
audio.a[:5], audio.sr, len(audio)
(array([-2.64216160e-05, -2.54259703e-05, 5.56615578e-06, -5.17481631e-08,
-1.35020821e-06]),
32000,
632790)
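The sample count and sampling rate printed above are enough to derive the clip duration. A minimal sketch, using a hypothetical stand-in class `FakeAudioArray` (not the real auditus.core.AudioArray) that mirrors only the `a` and `sr` fields and the `len()` behaviour seen in the output:

```python
from dataclasses import dataclass


@dataclass
class FakeAudioArray:
    """Hypothetical stand-in mirroring the `a` and `sr` fields shown above."""
    a: list
    sr: int

    def __len__(self) -> int:
        return len(self.a)


def duration_seconds(audio) -> float:
    # Duration is the number of samples divided by samples per second.
    return len(audio) / audio.sr


# Same length and sampling rate as the example clip, filled with silence.
clip = FakeAudioArray(a=[0.0] * 632790, sr=32000)
print(duration_seconds(clip))  # 19.7746875
```

So the example recording is just under 20 seconds long.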
Resampling
Many audio Transformer models only work at a specific sampling rate.
With Resampling you can resample the audio to the desired sampling
rate. Here we go from 32 kHz to 16 kHz.
from auditus.transform import Resampling
resampled = Resampling(target_sr=16000)(audio)
resampled
auditus.core.AudioArray(a=array([-2.64216160e-05, 5.56613802e-06, -1.35020873e-06, ...,
-2.39605007e-01, -2.03555112e-01, -2.45199591e-01]), sr=16000)
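To make the effect of resampling concrete, here is a rough sketch that is not how auditus does it: it uses plain linear interpolation with NumPy, and the function `resample_linear` is hypothetical. Production resamplers also apply an anti-aliasing low-pass filter, which this sketch omits.

```python
import numpy as np


def resample_linear(a: np.ndarray, sr: int, target_sr: int) -> np.ndarray:
    """Resample a 1-D signal by linear interpolation (illustrative only)."""
    n_out = int(round(len(a) * target_sr / sr))
    # Timestamps (in seconds) of the original and the resampled samples.
    t_in = np.arange(len(a)) / sr
    t_out = np.arange(n_out) / target_sr
    return np.interp(t_out, t_in, a)


# One second of a 440 Hz sine at 32 kHz, resampled to 16 kHz.
sr, target_sr = 32000, 16000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 440 * t)
downsampled = resample_linear(signal, sr, target_sr)
print(len(signal), len(downsampled))  # 32000 16000
```

Halving the sampling rate halves the number of samples, so the 632790-sample clip above would come out at 316395 samples under this scheme.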
Embedding
The main transform in auditus is the AudioEmbedding transform. It takes
an AudioArray and returns a tensor. Check out the HuggingFace docs for
more information on the available parameters.
from auditus.transform import AudioEmbedding
emb = AudioEmbedding(return_tensors="pt")(resampled)
print(emb.shape)
emb[0][:5]
/Users/clepelaars/miniconda3/envs/py312/lib/python3.12/site-packages/transformers/audio_utils.py:297: UserWarning: At least one mel filter has all zero values. The value for `num_mel_filters` (128) may be set too high. Or, the value for `num_frequency_bins` (256) may be set too low.
warnings.warn(
torch.Size([1214, 768])
tensor([-0.5876, 0.2830, -0.7292, 0.7644, -1.1770])
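The first dimension of 1214 in the shape above can be traced back to how the Audio Spectrogram Transformer splits a spectrogram into patches. A short worked calculation, assuming the default configuration values of the Hugging Face AST implementation (128 mel bins, 1024 time frames, 16x16 patches, stride 10 in both directions); the function name `ast_sequence_length` is made up for illustration:

```python
def ast_sequence_length(
    num_mel_bins: int = 128,
    max_length: int = 1024,
    patch_size: int = 16,
    frequency_stride: int = 10,
    time_stride: int = 10,
) -> int:
    """Number of tokens an AST-style model produces per clip (assumed defaults)."""
    # Overlapping patches along the frequency and time axes.
    freq_patches = (num_mel_bins - patch_size) // frequency_stride + 1
    time_patches = (max_length - patch_size) // time_stride + 1
    # Two special tokens (classification and distillation) are prepended.
    return freq_patches * time_patches + 2


print(ast_sequence_length())  # 1214
```

That is 12 frequency patches times 101 time patches plus 2 special tokens, each represented by a 768-dimensional hidden vector.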
Pooling
After generating the embeddings, you often want to pool them into a
single vector. Pooling supports mean and max pooling.
from auditus.transform import Pooling
pooled = Pooling(pooling="max")(emb)
print(pooled.shape)
pooled[:5]
torch.Size([1214])
tensor([ 4.3941, 4.0540, 12.2640, 11.9167, 13.1519])
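Judging only by the shapes printed above ([1214, 768] in, [1214] out), pooling appears to reduce the last axis. The sketch below reproduces that behaviour with NumPy under that assumption; the function `pool` is hypothetical and the actual auditus implementation may differ.

```python
import numpy as np


def pool(emb: np.ndarray, pooling: str = "mean") -> np.ndarray:
    """Reduce the last axis of `emb` by mean or max (assumed behaviour)."""
    if pooling == "mean":
        return emb.mean(axis=-1)
    if pooling == "max":
        return emb.max(axis=-1)
    raise ValueError(f"Unsupported pooling: {pooling!r}")


# Random stand-in with the same shape as the embedding shown earlier.
rng = np.random.default_rng(0)
emb = rng.normal(size=(1214, 768))

mean_pooled = pool(emb, "mean")
max_pooled = pool(emb, "max")
print(mean_pooled.shape, max_pooled.shape)  # (1214,) (1214,)
```

Max pooling keeps the largest activation along the reduced axis, which is why the max-pooled values above are much larger than the mean-pooled ones in the Quickstart.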
Download files
File details
Details for the file auditus-0.0.5.tar.gz.
File metadata
- Download URL: auditus-0.0.5.tar.gz
- Upload date:
- Size: 12.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | db30afdf41ad989288458828e55d566d637cd2b8aeb87e8841dd49483fe6505f |
| MD5 | 721ce96bf24b68a93ff54d4e3bf2b4e5 |
| BLAKE2b-256 | 5d38e231ff9a106e3aeec25ec97cfa8097c7cbb39760e17b5a53bc2aeba94bdb |
File details
Details for the file auditus-0.0.5-py3-none-any.whl.
File metadata
- Download URL: auditus-0.0.5-py3-none-any.whl
- Upload date:
- Size: 10.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 86c7dcf2214ad9aeb06eb902c40eb571fe0def79f93ed39578eeca9d8400a439 |
| MD5 | 0034221fe2491f44012f19f974a3a7dd |
| BLAKE2b-256 | 3341e92e7bfb7e718e4c9f817cbae13b4b29fde562d63c2500954d2e9d8a3679 |