
MuQ: A deep learning model for music and text


MuQ & MuQ-MuLan


This is the official repository for the paper "MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization".

In this repo, the following models are released:

  • MuQ: A large music foundation model pre-trained via Self-Supervised Learning (SSL), achieving state-of-the-art (SOTA) results on various music information retrieval (MIR) tasks.
  • MuQ-MuLan: A music-text joint embedding model trained via contrastive learning, supporting both English and Chinese texts.

Overview

We develop MuQ for music SSL. MuQ uses our proposed Mel-RVQ (Mel Residual Vector Quantization) to produce its quantized training targets and achieves SOTA performance on many music understanding (MIR) tasks.
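As a rough illustration of the residual vector quantization idea that Mel-RVQ builds on, the sketch below quantizes vectors in stages, with each stage encoding what the previous stages left over. It is a generic RVQ with random codebooks, not the paper's Mel-RVQ: the function name `residual_vector_quantize`, the codebook sizes, and the random stand-in "frames" are all assumptions made for illustration.

```python
import numpy as np

def residual_vector_quantize(x, codebooks):
    """Quantize vectors x (N, D) with a list of codebooks, each of shape (K, D).

    Returns (indices, reconstruction); indices has shape (N, num_stages).
    """
    residual = np.asarray(x, dtype=np.float64).copy()
    recon = np.zeros_like(residual)
    indices = []
    for codebook in codebooks:
        # Squared Euclidean distance from every residual to every code.
        dists = ((residual[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
        idx = dists.argmin(axis=1)      # nearest code per vector
        chosen = codebook[idx]
        recon += chosen                 # accumulate the approximation
        residual = residual - chosen    # the next stage quantizes what is left
        indices.append(idx)
    return np.stack(indices, axis=1), recon

rng = np.random.default_rng(0)
frames = rng.normal(size=(5, 8))                           # hypothetical stand-in for Mel frames
codebooks = [rng.normal(size=(16, 8)) for _ in range(3)]   # 3 stages, 16 codes each (illustrative)
tokens, approx = residual_vector_quantize(frames, codebooks)
print(tokens.shape)  # one token per frame per stage
```

In an SSL setup of this kind, the per-stage token indices would serve as discrete prediction targets; how MuQ trains and uses its quantizer is described in the paper, not here.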

We also build MuQ-MuLan, a CLIP-like model trained with contrastive learning that maps music and text into a joint embedding space.
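To make "CLIP-like contrastive learning" concrete, here is a minimal NumPy sketch of a symmetric contrastive loss over a batch of paired embeddings. It is not MuQ-MuLan's training code: the function name `clip_style_loss` and the temperature of 0.07 are assumptions chosen for illustration.

```python
import numpy as np

def clip_style_loss(audio_emb, text_emb, temperature=0.07):
    """Symmetric cross-entropy over cosine similarities of paired rows.

    Row i of audio_emb is assumed to be paired with row i of text_emb.
    """
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = a @ t.T / temperature      # (batch, batch) similarity matrix

    def diagonal_cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)    # subtract row max for numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))     # the correct match sits on the diagonal

    # Average the audio-to-text and text-to-audio directions.
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits.T))

paired = np.eye(4)
loss_aligned = clip_style_loss(paired, paired)
loss_shuffled = clip_style_loss(paired, np.roll(paired, 1, axis=0))
print(loss_aligned < loss_shuffled)
```

The loss is small when each audio embedding is closest to its own text and large when the pairs are shuffled, which is what pulls matching music and text together in the shared space.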

For more details, please refer to our paper.

[Figures: Evaluation on MARBLE Benchmark; Evaluation on Zero-shot Music Tagging]

Usage

First, install the official muq library with pip. Python 3.8 or later is required:

pip3 install muq

To extract music audio features using MuQ, you can refer to the following code:

import torch, librosa
from muq import MuQ

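device = 'cuda' if torch.cuda.is_available() else 'cpu'  # fall back to CPU when no GPU is available
wav, sr = librosa.load("path/to/music_audio.wav", sr=24000)  # mono, resampled to 24 kHz
wavs = torch.tensor(wav).unsqueeze(0).to(device)  # add a batch dimension: (1, num_samples)

# This will automatically fetch the checkpoint from Hugging Face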
muq = MuQ.from_pretrained("OpenMuQ/MuQ-large-msd-iter")
muq = muq.to(device).eval()

with torch.no_grad():
    output = muq(wavs, output_hidden_states=True)

print('Total number of layers: ', len(output.hidden_states))
print('Feature shape: ', output.last_hidden_state.shape)
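Downstream tasks often need one vector per clip rather than one per frame. A minimal sketch of mean pooling over time follows, assuming each hidden state has the layout (batch, time, dim) and has been converted to NumPy, for example with `output.hidden_states[layer].cpu().numpy()`. The helper name `pool_over_time` and the stand-in array are hypothetical; check the printed feature shape above before relying on this layout.

```python
import numpy as np

def pool_over_time(hidden_state):
    """Average a (batch, time, dim) array over its time axis, giving (batch, dim)."""
    hidden_state = np.asarray(hidden_state)
    if hidden_state.ndim != 3:
        raise ValueError("expected an array of shape (batch, time, dim)")
    return hidden_state.mean(axis=1)

# Stand-in for one layer's features; replace with real model output.
fake_layer = np.arange(24, dtype=np.float32).reshape(1, 4, 6)
clip_embedding = pool_over_time(fake_layer)
print(clip_embedding.shape)
```

Which layer works best is task-dependent, so treating the layer index as a tunable choice is a reasonable starting point.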

To extract music and text embeddings with MuQ-MuLan and compute their similarity:

import torch, librosa
from muq import MuQMuLan

device = 'cuda' if torch.cuda.is_available() else 'cpu'  # fall back to CPU when no GPU is available

# This will automatically fetch the checkpoints from Hugging Face
mulan = MuQMuLan.from_pretrained("OpenMuQ/MuQ-MuLan-large")
mulan = mulan.to(device).eval()

# Extract music embeddings
wav, sr = librosa.load("path/to/music_audio.wav", sr=24000)  # mono, resampled to 24 kHz
wavs = torch.tensor(wav).unsqueeze(0).to(device)  # add a batch dimension: (1, num_samples)
with torch.no_grad():
    audio_embeds = mulan(wavs=wavs)

# Extract text embeddings (texts can be in English or Chinese)
texts = ["classical genres, hopeful mood, piano."]
with torch.no_grad():
    text_embeds = mulan(texts=texts)

# Calculate dot product similarity
sim = mulan.calc_similarity(audio_embeds, text_embeds)
print(sim)
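Passing several candidate descriptions in `texts` turns the similarity scores into a simple zero-shot tagger: the highest-scoring text is the best match for the clip. The sketch below shows that ranking step on plain NumPy arrays. It does not reproduce the internals of `mulan.calc_similarity`, which are not shown here; the helper name `rank_texts` and the toy embeddings are hypothetical.

```python
import numpy as np

def rank_texts(audio_embeds, text_embeds, texts):
    """Rank candidate texts for each audio clip by dot-product similarity.

    audio_embeds: (num_audio, dim), text_embeds: (num_text, dim).
    Returns, per clip, a list of (text, score) pairs, best match first.
    """
    sim = np.asarray(audio_embeds) @ np.asarray(text_embeds).T  # (num_audio, num_text)
    order = np.argsort(-sim, axis=1)                            # descending score
    return [[(texts[j], float(sim[i, j])) for j in order[i]]
            for i in range(sim.shape[0])]

# Toy 2-D embeddings standing in for real model outputs.
toy_audio = np.array([[1.0, 0.0]])
toy_text = np.array([[0.0, 1.0], [1.0, 0.0]])
ranking = rank_texts(toy_audio, toy_text, ["rock", "piano"])
print(ranking[0][0][0])
```

With real embeddings, convert the tensors first (for example with `.cpu().numpy()`) or apply the same ranking to the matrix returned by `mulan.calc_similarity`.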

Performance

[Tables: MARBLE Benchmark results; MuLan results]

Model Checkpoints

| Model Name | Parameters | Data | HuggingFace🤗 |
|---|---|---|---|
| MuQ | ~300M | MSD dataset | OpenMuQ/MuQ-large-msd-iter |
| MuQ-MuLan | ~700M | music-text pairs | OpenMuQ/MuQ-MuLan-large |

Note: The open-sourced MuQ was trained on the Million Song Dataset (MSD). Because the dataset size differs from that used in the paper, the open-sourced model may not reach the performance reported there.

License

The code in this repository is released under the MIT license as found in the LICENSE file.

The model weights (MuQ-large-msd-iter, MuQ-MuLan-large) in this repository are released under the CC-BY-NC 4.0 license, as detailed in the LICENSE_weights file.

Citation

Upcoming

Acknowledgement

We borrowed code from the following repositories:

Also, we are especially grateful to the awesome MARBLE-Benchmark.
