Simple Audio Embeddings

auditus

auditus gives you simple access to state-of-the-art audio embeddings. Think of it as SentenceTransformers for audio.

$ pip install auditus

Quickstart

The high-level object in auditus is the AudioPipeline, which takes the path to an audio file and returns a pooled embedding.

from auditus.transform import AudioPipeline

pipe = AudioPipeline(
    # Default AST model
    model_name="MIT/ast-finetuned-audioset-10-10-0.4593", 
    # PyTorch output
    return_tensors="pt", 
    # Resampled to 16 kHz
    target_sr=16000, 
    # Mean pooling to obtain single embedding vector
    pooling="mean",
)

output = pipe("../test_files/XC119042.ogg").squeeze(0)
print(output.shape)
output[:5]
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
/Users/clepelaars/miniconda3/envs/py312/lib/python3.12/site-packages/transformers/audio_utils.py:297: UserWarning: At least one mel filter has all zero values. The value for `num_mel_filters` (128) may be set too high. Or, the value for `num_frequency_bins` (256) may be set too low.
  warnings.warn(

torch.Size([1214])

tensor([0.0139, 0.0156, 0.0410, 0.0316, 0.0380])

To see AudioPipeline in action on a practical use case, check out this Kaggle Notebook for the BirdCLEF+ 2025 competition.
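
Once two clips have been reduced to pooled vectors, a common next step is to compare them. The following is a minimal, dependency-free sketch of cosine similarity. It is not part of auditus: the function name cosine_similarity is made up for illustration, and clip_b holds invented values. clip_a reuses the five values printed above.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# First five values of the pooled embedding printed above.
clip_a = [0.0139, 0.0156, 0.0410, 0.0316, 0.0380]
# Hypothetical values standing in for a second clip's embedding.
clip_b = [0.0120, 0.0170, 0.0390, 0.0300, 0.0410]

print(f"similarity: {cosine_similarity(clip_a, clip_b):.4f}")
```

With a PyTorch tensor such as output, output.tolist() yields a plain list that can be passed to this function directly.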

Individual steps

auditus offers a range of transforms to process audio for downstream tasks.

Loading

Simply load audio with a given sampling rate.

from auditus.transform import AudioLoader

audio = AudioLoader(sr=32000)("../test_files/XC119042.ogg")
audio
auditus.core.AudioArray(a=array([-2.64216160e-05, -2.54259703e-05,  5.56615578e-06, ...,
       -2.03555092e-01, -2.03390077e-01, -2.45199591e-01]), sr=32000)

The AudioArray object offers a convenient interface for inspecting the audio data. For example, audio.audio() lets you listen to the clip in a Jupyter Notebook.

audio.a[:5], audio.sr, len(audio)
(array([-2.64216160e-05, -2.54259703e-05,  5.56615578e-06, -5.17481631e-08,
        -1.35020821e-06]),
 32000,
 632790)
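
The sample count and sampling rate together give the clip duration. The sketch below uses a hypothetical stand-in class (ClipSketch, not auditus's AudioArray) that mirrors only the three things shown above: the samples a, the rate sr, and len(). The duration property is an addition for illustration and is not claimed to exist in auditus.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ClipSketch:
    """Hypothetical stand-in for an audio container; not auditus's AudioArray."""

    a: Sequence[float]  # raw samples
    sr: int             # sampling rate in Hz

    def __len__(self) -> int:
        # Number of samples in the clip.
        return len(self.a)

    @property
    def duration(self) -> float:
        # Duration in seconds = number of samples / samples per second.
        return len(self.a) / self.sr


# Silent placeholder with the same length and rate as the clip loaded above.
clip = ClipSketch(a=[0.0] * 632790, sr=32000)
print(len(clip), clip.sr, clip.duration)
```

For the clip above, 632790 samples at 32000 Hz works out to a little under 20 seconds of audio.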

Resampling

Many audio Transformer models only work at a specific sampling rate. With Resampling you can convert the audio to the required rate. Here we go from 32 kHz to 16 kHz.

from auditus.transform import Resampling

resampled = Resampling(target_sr=16000)(audio)
resampled
auditus.core.AudioArray(a=array([-2.64216160e-05,  5.56613802e-06, -1.35020873e-06, ...,
       -2.39605007e-01, -2.03555112e-01, -2.45199591e-01]), sr=16000)
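
To make the effect of halving the sampling rate concrete, the sketch below estimates the new sample count and applies naive decimation, keeping every second sample. This is only an illustration: proper resamplers low-pass filter the signal before discarding samples to avoid aliasing, and nothing here is claimed about how Resampling is implemented. Both helper names are made up. The printed values above do show the resampled signal starting close to samples 0, 2, and 4 of the original.

```python
from typing import List, Sequence


def resampled_length(n_samples: int, orig_sr: int, target_sr: int) -> int:
    """Approximate number of samples after changing the sampling rate."""
    return round(n_samples * target_sr / orig_sr)


def naive_decimate(samples: Sequence[float], factor: int) -> List[float]:
    """Keep every `factor`-th sample. No anti-aliasing filter is applied."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    return list(samples[::factor])


# First five samples of the 32 kHz clip, copied from the Loading output.
first_five = [
    -2.64216160e-05,
    -2.54259703e-05,
    5.56615578e-06,
    -5.17481631e-08,
    -1.35020821e-06,
]

print(resampled_length(632790, orig_sr=32000, target_sr=16000))
print(naive_decimate(first_five, factor=2))
```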

Embedding

The main transform in auditus is the AudioEmbedding transform. It takes an AudioArray and returns a tensor. Check out the Hugging Face docs for more information on the available parameters.

from auditus.transform import AudioEmbedding

emb = AudioEmbedding(return_tensors="pt")(resampled)
print(emb.shape)
emb[0][:5]

torch.Size([1214, 768])

tensor([-0.5876,  0.2830, -0.7292,  0.7644, -1.1770])
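
The first dimension of 1214 can be reproduced with a little arithmetic. The sketch below assumes the default AST settings as documented for Hugging Face's ASTConfig (128 mel bins, at most 1024 time frames, 16 by 16 patches, a stride of 10 on both axes) plus two special tokens. These values are assumptions about the checkpoint rather than something stated here, and the function name is made up.

```python
def ast_sequence_length(
    num_mel_bins: int = 128,
    max_length: int = 1024,
    patch_size: int = 16,
    frequency_stride: int = 10,
    time_stride: int = 10,
    num_special_tokens: int = 2,
) -> int:
    """Number of tokens for a spectrogram cut into overlapping patches."""
    # Patches that fit along the frequency axis.
    freq_patches = (num_mel_bins - patch_size) // frequency_stride + 1
    # Patches that fit along the time axis.
    time_patches = (max_length - patch_size) // time_stride + 1
    # One token per patch, plus the special tokens prepended by the model.
    return freq_patches * time_patches + num_special_tokens


print(ast_sequence_length())  # 12 * 101 + 2 = 1214
```

Under these assumptions the spectrogram is padded or truncated to a fixed number of frames, so the token count would not depend on the length of the clip. The second dimension, 768, would then be the model's hidden size.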

Pooling

After generating the embeddings, you often want to pool them into a single vector. Pooling supports mean and max pooling.

from auditus.transform import Pooling

pooled = Pooling(pooling="max")(emb)
print(pooled.shape)
pooled[:5]
torch.Size([1214])

tensor([ 4.3941,  4.0540, 12.2640, 11.9167, 13.1519])
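
Both pooling modes are plain reductions along one axis of a 2D array. The sketch below is pure Python and is not auditus's implementation; the function name pool and its axis parameter are illustrative. The shapes printed above ([1214, 768] in, [1214] out) are consistent with a reduction over the last axis, so the example makes the axis explicit instead of assuming one.

```python
from statistics import fmean
from typing import List, Sequence


def pool(matrix: Sequence[Sequence[float]], pooling: str = "mean", axis: int = 0) -> List[float]:
    """Reduce a 2D list along `axis` using mean or max."""
    if pooling not in ("mean", "max"):
        raise ValueError("pooling must be 'mean' or 'max'")
    if axis not in (0, 1):
        raise ValueError("axis must be 0 or 1")
    reduce = fmean if pooling == "mean" else max
    # axis=1 reduces within each row; axis=0 reduces down each column.
    groups = matrix if axis == 1 else list(zip(*matrix))
    return [reduce(group) for group in groups]


# Toy input: 2 rows by 3 columns.
toy = [
    [1.0, 4.0, 2.0],
    [3.0, 0.0, 6.0],
]

print(pool(toy, "mean", axis=0))  # one value per column
print(pool(toy, "max", axis=1))   # one value per row
```

Reducing over axis 0 returns as many values as there are columns, while reducing over axis 1 returns as many values as there are rows, which is worth checking against the shape you expect downstream.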
