Simple Audio Embeddings

Project description

auditus

auditus gives you simple access to state-of-the-art audio embeddings, much like SentenceTransformers does for text.

$ pip install auditus

Quickstart

The high-level object in auditus is the AudioPipeline, which takes the path to an audio file and returns a pooled embedding.

from auditus.transform import AudioPipeline

pipe = AudioPipeline(
    # Default AST model
    model_name="MIT/ast-finetuned-audioset-10-10-0.4593", 
    # PyTorch output
    return_tensors="pt", 
    # Resample to 16 kHz
    target_sr=16000,
    # Number of mel-frequency bins; equals the output length for this model.
    num_mel_bins=64,
    # A max_length of 1024 covers at most ~25.6 seconds with the default hop length.
    # Longer files are truncated.
    max_length=1024,
    # Mean pooling to obtain single embedding vector
    pooling="mean",
)

output = pipe("../test_files/XC119042.ogg").squeeze(0)
print(output.shape)
output[:5]
torch.Size([64])

tensor([-0.0943, -0.1549, -0.2868, -0.3495, -0.4023])

To see AudioPipeline in action on a practical use case, check out this Kaggle Notebook for the BirdCLEF+ 2025 competition.
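
A pooled embedding like the one above is usually consumed by comparing it with other embeddings. The following is a minimal, dependency-free sketch of cosine similarity. It is not part of auditus: `cosine_similarity` is a hypothetical helper, `emb_a` reuses the five values printed above, and `emb_b` is made-up data standing in for a second clip.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# First five values of the pooled embedding shown above.
emb_a = [-0.0943, -0.1549, -0.2868, -0.3495, -0.4023]
# Made-up values standing in for the embedding of a second audio file.
emb_b = [-0.1010, -0.1400, -0.3000, -0.3300, -0.4100]

print(f"self similarity:  {cosine_similarity(emb_a, emb_a):.3f}")
print(f"cross similarity: {cosine_similarity(emb_a, emb_b):.3f}")
```

With real pipeline output you would pass full vectors, for example `output.tolist()`, instead of the truncated lists used here.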

Individual steps

auditus offers a range of transforms to process audio for downstream tasks.

Loading

Simply load audio with a given sampling rate.

from auditus.transform import AudioLoader

audio = AudioLoader(sr=32000)("../test_files/XC119042.ogg")
audio
auditus.core.AudioArray(a=array([-2.64216160e-05, -2.54259703e-05,  5.56615578e-06, ...,
       -2.03555092e-01, -2.03390077e-01, -2.45199591e-01]), sr=32000)

The AudioArray object offers a convenient interface for inspecting the audio data. For example, you can listen to the audio in a Jupyter Notebook with audio.audio().

audio.a[:5], audio.sr, len(audio)
(array([-2.64216160e-05, -2.54259703e-05,  5.56615578e-06, -5.17481631e-08,
        -1.35020821e-06]),
 32000,
 632790)

Resampling

Many audio Transformer models only work at a specific sampling rate. With Resampling you can resample the audio to the desired rate. Here we go from 32 kHz to 16 kHz.

from auditus.transform import Resampling

resampled = Resampling(target_sr=16000)(audio)
resampled
auditus.core.AudioArray(a=array([-2.64216160e-05,  5.56613802e-06, -1.35020873e-06, ...,
       -2.39605007e-01, -2.03555112e-01, -2.45199591e-01]), sr=16000)
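
Resampling changes the number of samples but not the duration of the clip. The arithmetic can be sketched with the numbers shown above (632790 samples at 32000 Hz). Both helper functions are hypothetical and not part of auditus, and real resamplers may differ from the ideal length by a sample because of rounding.

```python
def duration_seconds(num_samples: int, sr: int) -> float:
    """Clip duration in seconds for a given sample count and sampling rate."""
    return num_samples / sr


def resampled_length(num_samples: int, orig_sr: int, target_sr: int) -> int:
    """Ideal sample count after resampling from orig_sr to target_sr."""
    return round(num_samples * target_sr / orig_sr)


num_samples, orig_sr, target_sr = 632790, 32000, 16000

new_length = resampled_length(num_samples, orig_sr, target_sr)
print(new_length)  # 316395
print(f"{duration_seconds(num_samples, orig_sr):.2f} s")   # 19.77 s
print(f"{duration_seconds(new_length, target_sr):.2f} s")  # 19.77 s
```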

Embedding

The main transform in auditus is the AudioEmbedding transform. It takes an AudioArray and returns a tensor. Check out the HuggingFace docs for more information on the available parameters.

from auditus.transform import AudioEmbedding

emb = AudioEmbedding(return_tensors="pt", num_mel_bins=64, sampling_rate=16000)(resampled)
print(emb.shape)
emb[0][0][:5]
torch.Size([1, 1024, 64])

tensor([-0.8148, -0.9460, -0.9955, -0.9856, -1.0303])
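
The output always has 1024 time steps regardless of clip length, which implies that shorter inputs are padded and longer ones truncated. A minimal sketch of that idea on plain Python lists follows. The helper name `pad_or_truncate` and the zero padding value are assumptions for illustration, not the library's implementation.

```python
def pad_or_truncate(frames, max_length, pad_value=0.0):
    """Return exactly max_length frames by truncating or padding at the end.

    frames is a list of equal-length feature rows, one row per time step.
    """
    if not frames:
        raise ValueError("need at least one frame to infer the feature size")
    num_bins = len(frames[0])
    # Truncate: keep at most max_length frames (copies, so the input is untouched).
    fixed = [list(frame) for frame in frames[:max_length]]
    # Pad: append constant rows until max_length is reached (no-op if already full).
    fixed.extend([pad_value] * num_bins for _ in range(max_length - len(fixed)))
    return fixed


short_clip = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]   # 3 frames, 2 bins
long_clip = [[float(i), float(-i)] for i in range(6)]  # 6 frames, 2 bins

print(pad_or_truncate(short_clip, max_length=4))
print(pad_or_truncate(long_clip, max_length=4))
```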

Pooling

After generating the embeddings, you often want to pool them into a single vector. Pooling supports mean and max pooling.

from auditus.transform import Pooling

pooled = Pooling(pooling="max")(emb)
print(pooled.shape)
pooled[0][:5]
torch.Size([1, 64])

tensor([ 0.3470,  0.2991,  0.1366, -0.0023, -0.1394])
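
The shapes above (from [1, 1024, 64] to [1, 64]) suggest that pooling reduces over the time axis. The sketch below shows mean and max pooling over that axis on nested Python lists. The `pool` function is a hypothetical stand-in written for illustration and is not the Pooling transform itself.

```python
from statistics import fmean


def pool(batch, pooling="mean"):
    """Reduce [batch][time][feature] nested lists to [batch][feature]."""
    reducers = {"mean": fmean, "max": max}
    if pooling not in reducers:
        raise ValueError(f"unsupported pooling: {pooling!r}")
    reducer = reducers[pooling]
    # zip(*frames) regroups the values of each feature across all time steps.
    return [[reducer(column) for column in zip(*frames)] for frames in batch]


# One item in the batch, three time steps, two features.
batch = [[[1.0, -2.0], [3.0, 0.0], [2.0, 5.0]]]

print(pool(batch, "mean"))  # [[2.0, 1.0]]
print(pool(batch, "max"))   # [[3.0, 5.0]]
```

Mean pooling summarizes the whole clip evenly, while max pooling keeps the strongest activation per feature, which can emphasize short events.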


Download files

Source Distribution

auditus-0.0.4.tar.gz (11.3 kB)

Uploaded Source

Built Distribution

auditus-0.0.4-py3-none-any.whl (10.0 kB)

Uploaded Python 3

File details

Details for the file auditus-0.0.4.tar.gz.

File metadata

  • Download URL: auditus-0.0.4.tar.gz
  • Size: 11.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.0

File hashes

Hashes for auditus-0.0.4.tar.gz
Algorithm Hash digest
SHA256 03aa0faffd43c678250ff8d1bf89253c6bdfc1796c9b667c60e80d72376b42b6
MD5 ee0bed2e17c20b4668ef7f81ba1a7db8
BLAKE2b-256 ee84495bc73a522666fed5a8e13d4c72e0badaee786133a6763076f42c7ab1a8

File details

Details for the file auditus-0.0.4-py3-none-any.whl.

File metadata

  • Download URL: auditus-0.0.4-py3-none-any.whl
  • Size: 10.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.0

File hashes

Hashes for auditus-0.0.4-py3-none-any.whl
Algorithm Hash digest
SHA256 e56ab353c85a5ce8671f8d0cde7fc294c969d7e8c8630cad0ee2c9472ee0332c
MD5 6a37553b194fa1c3e798ce72ebe919e2
BLAKE2b-256 7849a375c55c6e3ae8b2136cbf7df0674d211784607b4359ae118146739fb8ab
