Using Weakly Aligned Score–Audio Pairs to Train Deep Chroma Models for Cross-Modal Music Retrieval
This repository contains accompanying code for the following paper. If you use code from this repository, please consider citing the paper.
Frank Zalkow and Meinard Müller: Using Weakly Aligned Score–Audio Pairs to Train Deep Chroma Models for Cross-Modal Music Retrieval. In Proceedings of the International Society for Music Information Retrieval Conference, Montréal, Canada, 2020.
There is an accompanying website for the paper.
You can install the code in this repository with pip:
pip install ctc_chroma
There are two ways to use the models in this repository. The first is a Jupyter notebook, which applies a model and visualizes its output. The second is a script for batch-processing the audio files in a folder. The script can be executed like this:
python apply_model.py -m MODEL_ID -i INPUT -o OUTPUT
INPUT is a directory with audio files,
OUTPUT is a directory for the output files, and
MODEL_ID specifies the model variant. The repository contains ten model variants, resulting from different training and validation splits. The identifiers for the variants used in the paper are
train512valid3. Furthermore, we provide models trained with more training data (without a left-out test set from our dataset). The identifiers for these additional models are
train5123valid4. Testing the latter models on the dataset used in the paper may not be fair; however, we recommend them for audio files outside our dataset.
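The deep models in this repository produce chroma features, i.e., twelve-dimensional pitch-class representations of audio. As a point of reference for what such a representation looks like, here is a minimal, self-contained sketch of a traditional (non-learned) chromagram computed by folding STFT magnitude bins into pitch classes. This is only an illustrative toy baseline, not the repository's model or API; all function and parameter names here are hypothetical.

```python
import numpy as np

def toy_chromagram(audio, sr=22050, n_fft=2048, hop=512):
    """Toy chromagram: fold STFT magnitude bins into 12 pitch classes.
    Illustrative baseline only, not the deep chroma model of this repo."""
    n_frames = 1 + (len(audio) - n_fft) // hop
    window = np.hanning(n_fft)
    chroma = np.zeros((12, n_frames))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    valid = freqs > 20.0  # ignore sub-audio bins (incl. DC)
    # Map each frequency bin to a pitch class (A4 = 440 Hz = MIDI 69)
    midi = 69 + 12 * np.log2(freqs[valid] / 440.0)
    pitch_class = np.mod(np.round(midi).astype(int), 12)
    for t in range(n_frames):
        frame = audio[t * hop : t * hop + n_fft] * window
        mag = np.abs(np.fft.rfft(frame))[valid]
        for pc in range(12):
            chroma[pc, t] = mag[pitch_class == pc].sum()
    # Normalize each frame to unit maximum (guard against silence)
    chroma /= np.maximum(chroma.max(axis=0, keepdims=True), 1e-9)
    return chroma

# A pure 440 Hz tone should concentrate energy in pitch class A (index 9)
sr = 22050
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
c = toy_chromagram(tone, sr=sr)
print(c.shape)
```

The deep chroma models of the paper replace this fixed binning with a learned mapping trained from weakly aligned score–audio pairs.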
To make it easy to try out the code in this repository directly, we include two excerpts from public-domain recordings, which we downloaded from Musopen. The excerpts correspond to the musical sections used for the figures in the paper (Figures 3 and 4). However, different performances (not in the public domain) were used to generate the figures themselves. The table below gives details for the excerpts.
|Filename|Composer|Work|Performer|Section|
|---|---|---|---|---|
|Beethoven_Op067-01_DavidHighSchool.wav|Beethoven|Symphony no. 5, op. 67|Davis High School Symphony Orchestra|First movement, first theme|
|Beethoven_Op002-2-01_Pitman.wav|Beethoven|Piano Sonata no. 2, op. 2 no. 2|Paul Pitman|First movement, second theme|
Frank Zalkow and Meinard Müller are supported by the German Research Foundation (DFG-MU 2686/11-1, MU 2686/12-1). We thank Daniel Stoller for fruitful discussions on the CTC loss, and Michael Krause for proof-reading the manuscript. We also thank Stefan Balke and Vlora Arifi-Müller as well as all students involved in the annotation work, especially Lena Krauß and Quirin Seilbeck. The International Audio Laboratories Erlangen are a joint institution of the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and Fraunhofer Institute for Integrated Circuits IIS. The authors gratefully acknowledge the compute resources and support provided by the Erlangen Regional Computing Center (RRZE).