
Simple Audio Embeddings

auditus

auditus gives you simple access to state-of-the-art audio embeddings, like SentenceTransformers but for audio.

$ pip install auditus

Quickstart

The high-level object in auditus is the AudioPipeline, which takes a path to an audio file and returns a pooled embedding.

from auditus.transform import AudioPipeline

pipe = AudioPipeline(
    # Default AST model
    model_name="MIT/ast-finetuned-audioset-10-10-0.4593", 
    # PyTorch output
    return_tensors="pt", 
    # Resample to 16 kHz
    target_sr=16000, 
    # Mean pooling to obtain single embedding vector
    pooling="mean",
)

output = pipe("../test_files/XC119042.ogg").squeeze(0)
print(output.shape)
output[:5]
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
/Users/clepelaars/miniconda3/envs/py312/lib/python3.12/site-packages/transformers/audio_utils.py:297: UserWarning: At least one mel filter has all zero values. The value for `num_mel_filters` (128) may be set too high. Or, the value for `num_frequency_bins` (256) may be set too low.
  warnings.warn(

torch.Size([768])

tensor([0.8653, 1.1659, 0.5956, 0.8498, 0.5322])

To see AudioPipeline in action on a practical use case, check out this Kaggle Notebook for the BirdCLEF+ 2025 competition.
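
A pooled embedding like the one above is typically compared against other embeddings, for instance with cosine similarity. The following is a minimal sketch in plain Python, not part of auditus: the helper `cosine_similarity` and the toy vectors are hypothetical stand-ins for real 768-dimensional outputs. With an actual pipeline result you could pass `output.tolist()`.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 4-dimensional vectors standing in for 768-dimensional pooled embeddings.
clip_a = [0.8, 1.1, 0.5, 0.8]
clip_b = [1.6, 2.2, 1.0, 1.6]      # same direction as clip_a, twice the length
clip_c = [-0.8, -1.1, -0.5, -0.8]  # opposite direction to clip_a

print(cosine_similarity(clip_a, clip_b))  # close to 1.0
print(cosine_similarity(clip_a, clip_c))  # close to -1.0
```

Cosine similarity ignores vector length, so two clips whose embeddings point in the same direction score close to 1 regardless of magnitude.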

Individual steps

auditus offers a range of transforms to process audio for downstream tasks.

Loading

Simply load audio with a given sampling rate.

from auditus.transform import AudioLoader

audio = AudioLoader(sr=32000)("../test_files/XC119042.ogg")
audio
auditus.core.AudioArray(a=array([-2.64216160e-05, -2.54259703e-05,  5.56615578e-06, ...,
       -2.03555092e-01, -2.03390077e-01, -2.45199591e-01]), sr=32000)

The AudioArray object offers a convenient interface to inspect the audio data. For example, you can listen to the audio in a Jupyter Notebook with audio.audio().

audio.a[:5], audio.sr, len(audio)
(array([-2.64216160e-05, -2.54259703e-05,  5.56615578e-06, -5.17481631e-08,
        -1.35020821e-06]),
 32000,
 632790)
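
The values printed above are enough to work out the clip duration. This small sketch assumes that `len(audio)` is the number of samples, which the output suggests. The helper `duration_seconds` is hypothetical and not part of auditus.

```python
def duration_seconds(num_samples, sr):
    """Clip length in seconds given a sample count and a sampling rate in Hz."""
    if sr <= 0:
        raise ValueError("sampling rate must be positive")
    return num_samples / sr


# Values taken from the output above: len(audio) and audio.sr.
print(duration_seconds(632790, 32000))  # 19.7746875
```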

Resampling

Many audio Transformer models only work at a specific sampling rate. Resampling converts the audio to the desired rate. Here we go from 32 kHz to 16 kHz.

from auditus.transform import Resampling

resampled = Resampling(target_sr=16000)(audio)
resampled
auditus.core.AudioArray(a=array([-2.64216160e-05,  5.56613802e-06, -1.35020873e-06, ...,
       -2.39605007e-01, -2.03555112e-01, -2.45199591e-01]), sr=16000)
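
Resampling changes the number of samples but not the duration, so halving the rate should roughly halve the sample count. In the output above, the first resampled values are close to every second sample of the original, which fits that picture. The sketch below estimates the resampled length. The helper `expected_num_samples` is hypothetical, and the rounding convention (ceiling) is an assumption, so the resampler auditus actually uses may differ by one sample.

```python
import math


def expected_num_samples(num_samples, orig_sr, target_sr):
    """Estimate the sample count after resampling (rounding up is an assumption)."""
    if orig_sr <= 0 or target_sr <= 0:
        raise ValueError("sampling rates must be positive")
    return math.ceil(num_samples * target_sr / orig_sr)


# 632790 samples at 32 kHz, resampled to 16 kHz.
print(expected_num_samples(632790, 32000, 16000))  # 316395
```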

Embedding

The main transform in auditus is the AudioEmbedding transform. It takes an AudioArray and returns a tensor. Check out the HuggingFace docs for more information on the available parameters.

from auditus.transform import AudioEmbedding

emb = AudioEmbedding(return_tensors="pt")(resampled)
print(emb.shape)
emb[0][:5]
/Users/clepelaars/miniconda3/envs/py312/lib/python3.12/site-packages/transformers/audio_utils.py:297: UserWarning: At least one mel filter has all zero values. The value for `num_mel_filters` (128) may be set too high. Or, the value for `num_frequency_bins` (256) may be set too low.
  warnings.warn(

torch.Size([1214, 768])

tensor([-0.5876,  0.2830, -0.7292,  0.7644, -1.1770])
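
The first dimension of 1214 can be reproduced with a short calculation. This is a sketch under stated assumptions: it uses what I understand to be the default Audio Spectrogram Transformer configuration in HuggingFace Transformers (128 mel bins, 1024 time frames, 16x16 patches, stride 10 in both directions, two special tokens) and assumes the returned embedding is the model's last hidden state. The function `ast_sequence_length` is hypothetical and not part of auditus.

```python
def ast_sequence_length(
    num_mel_bins=128,      # assumed default number of mel bins
    max_length=1024,       # assumed default number of time frames
    patch_size=16,         # assumed square patch size
    frequency_stride=10,   # assumed stride along the frequency axis
    time_stride=10,        # assumed stride along the time axis
    num_special_tokens=2,  # assumed: one CLS token and one distillation token
):
    """Number of tokens produced for one spectrogram under the assumed config."""
    freq_patches = (num_mel_bins - patch_size) // frequency_stride + 1
    time_patches = (max_length - patch_size) // time_stride + 1
    return freq_patches * time_patches + num_special_tokens


print(ast_sequence_length())  # 1214
```

If these assumptions hold, the token count depends on the fixed spectrogram size rather than on the clip length, so longer and shorter clips would yield the same first dimension.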

Pooling

After generating the embeddings, you often want to pool them into a single vector. Pooling supports mean and max pooling.

from auditus.transform import Pooling

pooled = Pooling(pooling="max")(emb)
print(pooled.shape)
pooled[:5]
torch.Size([768])

tensor([2.8619, 2.7183, 4.1288, 2.6302, 2.2177])
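
To make the two pooling modes concrete, here is a plain-Python sketch of pooling over the first (sequence) axis, which is what the shape change from [1214, 768] to [768] suggests. It illustrates the idea only and is not the auditus implementation; the function `pool` and the toy data are hypothetical. On a PyTorch tensor, the equivalent operations would be `emb.mean(dim=0)` and `emb.max(dim=0).values`.

```python
from statistics import fmean


def pool(frames, pooling="mean"):
    """Reduce a list of equal-length vectors to one vector, feature by feature."""
    if not frames:
        raise ValueError("need at least one frame")
    # Transpose so each tuple holds one feature across all frames.
    columns = list(zip(*frames))
    if pooling == "mean":
        return [fmean(col) for col in columns]
    if pooling == "max":
        return [max(col) for col in columns]
    raise ValueError(f"unsupported pooling: {pooling!r}")


# Three toy frames with three features each, standing in for [1214, 768].
frames = [
    [1.0, -2.0, 0.5],
    [3.0, 0.0, 0.5],
    [2.0, 2.0, 2.0],
]

print(pool(frames, "mean"))  # [2.0, 0.0, 1.0]
print(pool(frames, "max"))   # [3.0, 2.0, 2.0]
```

Mean pooling summarizes the whole clip, while max pooling keeps the strongest activation of each feature, which can matter when the sound of interest is brief.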
