Simple Audio Embeddings

auditus

auditus gives you simple access to state-of-the-art audio embeddings. Think of it as SentenceTransformers for audio.

$ pip install auditus

Quickstart

The high-level object in auditus is AudioPipeline, which takes the path to an audio file and returns a pooled embedding.

from auditus.transform import AudioPipeline

pipe = AudioPipeline(
    # Default AST model
    model_name="MIT/ast-finetuned-audioset-10-10-0.4593", 
    # PyTorch output
    return_tensors="pt", 
    # Resample to 16 kHz
    target_sr=16000, 
    # Mean pooling to obtain single embedding vector
    pooling="mean",
)

output = pipe("../test_files/XC119042.ogg").squeeze(0)
print(output.shape)
output[:5]

Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
/Users/clepelaars/miniconda3/envs/py312/lib/python3.12/site-packages/transformers/audio_utils.py:297: UserWarning: At least one mel filter has all zero values. The value for `num_mel_filters` (128) may be set too high. Or, the value for `num_frequency_bins` (256) may be set too low.
  warnings.warn(

torch.Size([1214])

tensor([0.0139, 0.0156, 0.0410, 0.0316, 0.0380])

To see AudioPipeline in action on a practical use case, check out this Kaggle Notebook for the BirdCLEF+ 2025 competition.
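
A common next step with pooled embeddings is comparing two clips. The following is a minimal sketch, assuming the pooled output has been converted to a NumPy array (for example with `output.numpy()`). The `cosine_similarity` helper and the stand-in vectors are illustrative and not part of auditus.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two 1-D embedding vectors (hypothetical helper)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return float(np.dot(a, b) / denom)


# Stand-in vectors; in practice these would be pooled embeddings of two audio files.
emb_a = np.array([0.0139, 0.0156, 0.0410, 0.0316, 0.0380])
emb_b = 2.0 * emb_a  # same direction, different magnitude
emb_c = np.array([1.0, -1.0, 0.0, 0.0, 0.0])  # nearly orthogonal to emb_a

sim_ab = cosine_similarity(emb_a, emb_b)
sim_ac = cosine_similarity(emb_a, emb_c)
print(sim_ab, sim_ac)
```

Cosine similarity ignores vector magnitude, so `emb_a` and its scaled copy `emb_b` score as identical in direction, while `emb_c` scores close to zero.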

Individual steps

auditus offers a range of transforms to process audio for downstream tasks.

Loading

AudioLoader loads an audio file at a given sampling rate.

from auditus.transform import AudioLoader

audio = AudioLoader(sr=32000)("../test_files/XC119042.ogg")
audio

auditus.core.AudioArray(a=array([-2.64216160e-05, -2.54259703e-05,  5.56615578e-06, ...,
       -2.03555092e-01, -2.03390077e-01, -2.45199591e-01]), sr=32000)

The AudioArray object offers a convenient interface for inspecting the audio data. For example, you can listen to the audio in a Jupyter Notebook with audio.audio().

audio.a[:5], audio.sr, len(audio)

(array([-2.64216160e-05, -2.54259703e-05,  5.56615578e-06, -5.17481631e-08,
        -1.35020821e-06]),
 32000,
 632790)
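
The sample count and sampling rate together give the clip duration. A small sketch of that arithmetic, assuming `len(audio)` is the number of samples in `audio.a`, as the output above suggests. The `duration_seconds` helper is hypothetical and not part of auditus.

```python
def duration_seconds(num_samples: int, sr: int) -> float:
    """Clip length in seconds: number of samples divided by samples per second."""
    if sr <= 0:
        raise ValueError("sampling rate must be positive")
    return num_samples / sr


# Values taken from the output above: len(audio) and audio.sr
dur = duration_seconds(632790, 32000)
print(f"{dur:.2f} s")  # prints "19.77 s"
```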

Resampling

Many audio Transformer models only work at a specific sampling rate. Resampling converts the audio to the desired sampling rate. Here we go from 32 kHz to 16 kHz.

from auditus.transform import Resampling

resampled = Resampling(target_sr=16000)(audio)
resampled

auditus.core.AudioArray(a=array([-2.64216160e-05,  5.56613802e-06, -1.35020873e-06, ...,
       -2.39605007e-01, -2.03555112e-01, -2.45199591e-01]), sr=16000)
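
Resampling changes the number of samples but keeps the duration. The sketch below estimates the new length and the highest frequency the new rate can represent. It is arithmetic only and does not reproduce what Resampling does internally; the helper names are hypothetical, and the exact output length may differ by a sample depending on how the resampling backend rounds.

```python
import math


def resampled_length(num_samples: int, orig_sr: int, target_sr: int) -> int:
    """Approximate sample count after resampling (assumes rounding up)."""
    return math.ceil(num_samples * target_sr / orig_sr)


def nyquist_hz(sr: int) -> float:
    """Highest frequency representable at sampling rate sr."""
    return sr / 2


new_len = resampled_length(632790, orig_sr=32000, target_sr=16000)
print(new_len)             # half the original sample count
print(new_len / 16000)     # duration in seconds is unchanged
print(nyquist_hz(16000))   # content above this frequency is lost
```

Going from 32 kHz to 16 kHz halves the sample count and lowers the representable bandwidth from 16 kHz to 8 kHz.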

Embedding

The main transform in auditus is the AudioEmbedding transform. It takes an AudioArray and returns a tensor of embeddings. Check out the Hugging Face docs for more information on the available parameters.

from auditus.transform import AudioEmbedding

emb = AudioEmbedding(return_tensors="pt")(resampled)
print(emb.shape)
emb[0][:5]

/Users/clepelaars/miniconda3/envs/py312/lib/python3.12/site-packages/transformers/audio_utils.py:297: UserWarning: At least one mel filter has all zero values. The value for `num_mel_filters` (128) may be set too high. Or, the value for `num_frequency_bins` (256) may be set too low.
  warnings.warn(

torch.Size([1214, 768])

tensor([-0.5876,  0.2830, -0.7292,  0.7644, -1.1770])
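
The shape `[1214, 768]` can be traced back to how the AST model splits a spectrogram into patches. The sketch below reproduces that count, assuming the default `ASTConfig` values from Hugging Face Transformers (128 mel bins, 1024 frames, 16x16 patches, stride 10 in both directions, hidden size 768) and two special tokens prepended to the sequence. The `ast_sequence_length` helper is hypothetical; check the model's config if you use a different checkpoint.

```python
def ast_sequence_length(
    num_mel_bins: int = 128,
    max_length: int = 1024,
    patch_size: int = 16,
    frequency_stride: int = 10,
    time_stride: int = 10,
    num_special_tokens: int = 2,  # assumed: classification + distillation token
) -> int:
    """Number of tokens the AST encoder sees for one spectrogram."""
    freq_patches = (num_mel_bins - patch_size) // frequency_stride + 1
    time_patches = (max_length - patch_size) // time_stride + 1
    return freq_patches * time_patches + num_special_tokens


seq_len = ast_sequence_length()
print(seq_len)  # 12 * 101 patches + 2 special tokens
```

Under these assumptions each of the 1214 rows in `emb` is the 768-dimensional hidden state of one token.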

Pooling

After generating the embeddings, you often want to pool them into a single vector. Pooling supports mean and max pooling.

from auditus.transform import Pooling

pooled = Pooling(pooling="max")(emb)
print(pooled.shape)
pooled[:5]

torch.Size([1214])

tensor([ 4.3941,  4.0540, 12.2640, 11.9167, 13.1519])
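
To make the two pooling modes concrete, here is a minimal NumPy sketch on random data with the same shape as `emb`. Judging from the shapes printed above (`[1214, 768]` in, `[1214]` out), the reduction appears to run over the last axis; that is inferred from the output and not confirmed from the auditus source, so treat the axis choice as an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
emb = rng.standard_normal((1214, 768))  # stand-in for the embedding tensor

# Assumed axis: reduce over the last dimension, matching the shapes shown above.
mean_pooled = emb.mean(axis=-1)
max_pooled = emb.max(axis=-1)

print(mean_pooled.shape, max_pooled.shape)
```

Mean pooling averages the values along the reduced axis, while max pooling keeps only the largest value, so every max-pooled entry is at least as large as the corresponding mean-pooled one.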

Release history

0.0.6 (Mar 26, 2025)
0.0.5 (Mar 26, 2025, this version)
0.0.4 (Mar 26, 2025)
0.0.2 (Mar 25, 2025)
0.0.1 (Mar 25, 2025)

Download files

Source Distribution

auditus-0.0.5.tar.gz (12.0 kB)

Uploaded Mar 26, 2025 Source

Built Distribution

auditus-0.0.5-py3-none-any.whl (10.3 kB)

Uploaded Mar 26, 2025 Python 3

File details

Details for the file auditus-0.0.5.tar.gz.

File metadata

Download URL: auditus-0.0.5.tar.gz
Upload date: Mar 26, 2025
Size: 12.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.0

File hashes

Hashes for auditus-0.0.5.tar.gz
Algorithm	Hash digest
SHA256	`db30afdf41ad989288458828e55d566d637cd2b8aeb87e8841dd49483fe6505f`
MD5	`721ce96bf24b68a93ff54d4e3bf2b4e5`
BLAKE2b-256	`5d38e231ff9a106e3aeec25ec97cfa8097c7cbb39760e17b5a53bc2aeba94bdb`

File details

Details for the file auditus-0.0.5-py3-none-any.whl.

File metadata

Download URL: auditus-0.0.5-py3-none-any.whl
Upload date: Mar 26, 2025
Size: 10.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.0

File hashes

Hashes for auditus-0.0.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`86c7dcf2214ad9aeb06eb902c40eb571fe0def79f93ed39578eeca9d8400a439`
MD5	`0034221fe2491f44012f19f974a3a7dd`
BLAKE2b-256	`3341e92e7bfb7e718e4c9f817cbae13b4b29fde562d63c2500954d2e9d8a3679`
