Music2Latent

Encode and decode audio samples to/from compressed representations! Useful for efficient generative modelling applications and for other downstream tasks.

Read the ISMIR 2024 paper here. Listen to audio samples here.

Under the hood, Music2Latent uses a Consistency Autoencoder model to efficiently encode and decode audio samples.

44.1 kHz audio is encoded into a sequence of latents at a rate of ~10 Hz, and each latent has 64 channels. 48 kHz audio can also be encoded, which results in a latent rate of ~12 Hz. A generative model can then be trained on these embeddings, or they can be used for other downstream tasks.
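To get a feel for what these rates imply, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope estimate: the helper `latent_stats` is hypothetical (it is not part of music2latent), and it takes the approximate ~10 Hz and ~12 Hz figures above at face value, so the actual sequence lengths returned by the model may differ slightly.

```python
def latent_stats(duration_s, sample_rate=44100, latent_rate_hz=10.0, latent_channels=64):
    """Estimate latent sequence length and compression ratio for a clip.

    Hypothetical helper for illustration only. `latent_rate_hz` is the
    approximate latent rate quoted above (~10 Hz at 44.1 kHz, ~12 Hz at 48 kHz).
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    n_samples = round(duration_s * sample_rate)    # waveform samples per channel
    n_frames = round(duration_s * latent_rate_hz)  # approximate number of latent frames
    n_values = n_frames * latent_channels          # total latent values
    ratio = n_samples / n_values                   # waveform samples per latent value
    return n_frames, n_values, ratio


# 10 seconds of 44.1 kHz audio at ~10 Hz
print(latent_stats(10.0))                                         # (100, 6400, 68.90625)
# 10 seconds of 48 kHz audio at ~12 Hz
print(latent_stats(10.0, sample_rate=48000, latent_rate_hz=12.0))  # (120, 7680, 62.5)
```

Under these assumptions, a 10 second clip becomes roughly 100 latent frames of 64 channels each, which is on the order of 60 to 70 times fewer values than the raw waveform.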

Music2Latent was trained on music and on speech. Refer to the paper for more details.

Installation

pip install music2latent

The model weights will be downloaded automatically the first time the code is run.

How to use

To encode and decode audio samples to/from latent embeddings:

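import librosa
from music2latent import EncoderDecoder

# Load an example clip as a waveform at 44.1 kHz
audio_path = librosa.example('trumpet')
wv, sr = librosa.load(audio_path, sr=44100)

# Instantiate the model (weights are downloaded on first use)
encdec = EncoderDecoder()

# Encode the waveform into latents
latent = encdec.encode(wv)
# latent has shape (batch_size/audio_channels, dim (64), sequence_length)

# Decode the latents back into a waveform
wv_rec = encdec.decode(latent)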

To extract encoder features to use in downstream tasks:

features = encdec.encode(wv, extract_features=True)

These features are extracted before the encoder bottleneck, and thus have more channels (contain more information) than the latents used for reconstruction. It will not be possible to directly decode these features back to audio.
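Many downstream tasks (tagging, classification, retrieval) expect one fixed-length vector per clip, while the extracted features vary in length with the audio. One common option, sketched below, is to pool each channel over time with its mean and standard deviation. This is not something music2latent prescribes: the function `pool_over_time` is hypothetical, and it assumes the features for a single clip have already been converted to a nested list of shape (channels, frames), for example with `.tolist()` on one batch element.

```python
from statistics import fmean, pstdev


def pool_over_time(features):
    """Collapse (channels, frames) features into one fixed-length vector.

    Hypothetical helper for illustration only. `features` is a sequence of
    channels, each a non-empty sequence of per-frame floats. The result
    holds the per-channel means followed by the per-channel standard
    deviations, so its length is 2 * channels regardless of frame count.
    """
    means = [fmean(channel) for channel in features]
    stds = [pstdev(channel) for channel in features]
    return means + stds


# Toy input: 2 channels, 3 frames each
toy = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]
embedding = pool_over_time(toy)
print(len(embedding))  # 4
```

The pooled vector discards temporal ordering, so it suits clip-level tasks better than frame-level ones such as transcription.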

music2latent supports more advanced usage, including GPU memory management controls. Please refer to tutorial.ipynb.

License

This library is released under the CC BY-NC 4.0 license. Please refer to the LICENSE file for more details.

This work was conducted by Marco Pasini during his PhD at Queen Mary University of London, in partnership with Sony Computer Science Laboratories Paris. This work was supervised by Stefan Lattner and George Fazekas.
