Music2Latent
Encode and decode audio samples to compressed representations! Useful for efficient generative modelling applications and for other downstream tasks.
Read the ISMIR 2024 paper here.
Under the hood, Music2Latent uses a Consistency Autoencoder model to efficiently encode and decode audio samples. 44.1 kHz audio is encoded into a sequence of latents at a rate of ~10 Hz, and each latent has 64 channels. You can then train a generative model on these embeddings, or use them for other downstream tasks.
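To get a feel for these numbers, here is a rough back-of-the-envelope sketch. It only uses the nominal figures quoted above (44.1 kHz input, ~10 Hz latent rate, 64 channels); the helper names are hypothetical and not part of the library, and the exact frame count the encoder returns may differ because of padding and the approximate rate.

```python
import math

def approx_latent_frames(num_samples, sample_rate=44100, latent_rate_hz=10.0):
    """Estimate the number of latent frames for a clip at a nominal latent rate."""
    duration_s = num_samples / sample_rate
    return math.ceil(duration_s * latent_rate_hz)

def approx_compression_ratio(sample_rate=44100, latent_rate_hz=10.0, channels=64):
    """Audio samples per second divided by latent values per second (mono audio)."""
    return sample_rate / (latent_rate_hz * channels)

# A 10-second mono clip at 44.1 kHz
ten_seconds = 10 * 44100
print(approx_latent_frames(ten_seconds))      # about 100 frames at ~10 Hz
print(round(approx_compression_ratio(), 1))   # roughly 69x fewer values per second
```

Under these assumptions, a generative model trained on the latents sees sequences that are thousands of times shorter than the raw waveform.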
Music2Latent was trained on music and on speech. Refer to the paper for more details.
Installation
pip install music2latent
The model weights will be downloaded automatically the first time you run the code.
How to use
To encode and decode audio samples to/from latent embeddings:
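import librosa
from music2latent import EncoderDecoder

# Load an example clip, resampled to 44.1 kHz
audio_path = librosa.example('trumpet')
wv, sr = librosa.load(audio_path, sr=44100)

encdec = EncoderDecoder()
latent = encdec.encode(wv)      # compress the waveform into latents
wv_rec = encdec.decode(latent)  # reconstruct audio from the latents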
If you need to extract encoder features to use in downstream tasks, and you don't need to reconstruct the audio:
features = encdec.encode(wv, extract_features=True)
These features are extracted before the encoder bottleneck, and thus have more channels (contain more information) than the latents used for reconstruction.
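For clip-level downstream tasks such as classification, a common approach is to pool the time axis of such features into a fixed-size vector. The sketch below is an illustration only: it assumes the features have been converted to a NumPy array of shape (batch, channels, time), which is not confirmed here, and `pool_features` is a hypothetical helper rather than part of the library.

```python
import numpy as np

def pool_features(features):
    """Summarise each channel over time with its mean and standard deviation.

    Assumes `features` has shape (batch, channels, time); this layout is an
    assumption, so check the actual output shape before relying on it.
    """
    feats = np.asarray(features, dtype=np.float32)
    if feats.ndim != 3:
        raise ValueError(f"expected a 3-D array, got shape {feats.shape}")
    mean = feats.mean(axis=-1)
    std = feats.std(axis=-1)
    # Concatenate along the channel axis: (batch, 2 * channels)
    return np.concatenate([mean, std], axis=-1)

# Stand-in data with 2 clips, 8 channels, 50 time steps
dummy = np.random.default_rng(0).standard_normal((2, 8, 50))
clip_embeddings = pool_features(dummy)
print(clip_embeddings.shape)  # (2, 16)
```

The resulting vectors have the same size regardless of clip length, so they can be fed to a simple classifier or regressor.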
music2latent supports more advanced usage, including GPU memory management controls. Please refer to tutorial.ipynb.
License
This library is released under the CC BY-NC 4.0 license. Please refer to the LICENSE file for more details.
This work was conducted by Marco Pasini during his PhD at Queen Mary University of London, in partnership with Sony Computer Science Laboratories Paris. This work was supervised by Stefan Lattner and George Fazekas.