All-In-One Music Structure Analyzer

All-In-One Music Structure Analysis Model

NOTE: This is a work in progress

Table of Contents

Installation
Usage
Available Models
Speed
Advanced Usage for Research
Concerning MP3 Files
Citation

Installation

1. Install PyTorch

Visit the PyTorch website and install the appropriate version for your system.

2. Install NATTEN (manual step on Linux and Windows only; not needed on macOS)

Linux

Visit the NATTEN website and download the appropriate version for your system.

macOS

You do not need to install NATTEN manually; it is installed automatically when you install allin1.

Windows

Build NATTEN from source:

pip install ninja # Recommended, not required
git clone https://github.com/SHI-Labs/NATTEN
cd NATTEN
make

3. Install the package

pip install git+https://github.com/CPJKU/madmom  # install the latest madmom directly from GitHub
pip install allin1  # install this package

4. (Optional) Install FFmpeg for MP3 support

For Ubuntu:

sudo apt install ffmpeg

For macOS:

brew install ffmpeg

Usage

CLI

allin1 your_audio_file1.wav your_audio_file2.mp3

The results will be saved in `./structures`:

./structures
├── your_audio_file1.json
└── your_audio_file2.json

Each JSON analysis result has the following structure:

{
  "beats": [ 0.33, 0.75, 1.14, ... ],
  "downbeats": [ 0.33, 1.94, 3.53, ... ],
  "beat_positions": [ 1, 2, 3, 4, 1, 2, 3, 4, 1, ... ],
  "segments": [
    {
      "start": 0.0,
      "end": 0.33,
      "label": "start"
    },
    {
      "start": 0.33,
      "end": 13.13,
      "label": "intro"
    },
    {
      "start": 13.13,
      "end": 37.53,
      "label": "chorus"
    },
    {
      "start": 37.53,
      "end": 51.53,
      "label": "verse"
    },
    ...
  ]
}
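As a minimal sketch of how such a result might be consumed, the snippet below parses JSON shaped like the example above and lists each segment with its duration. It uses only the Python standard library. The sample string is a truncated copy of the example values (the elided entries are simply left out), and the helper name `summarize` is hypothetical, not part of this package.

```python
import json

# Sample data shaped like the JSON result shown above.
# Only the values visible in the example are included; elided entries are omitted.
sample = """
{
  "beats": [0.33, 0.75, 1.14],
  "downbeats": [0.33, 1.94, 3.53],
  "beat_positions": [1, 2, 3],
  "segments": [
    {"start": 0.0, "end": 0.33, "label": "start"},
    {"start": 0.33, "end": 13.13, "label": "intro"},
    {"start": 13.13, "end": 37.53, "label": "chorus"},
    {"start": 37.53, "end": 51.53, "label": "verse"}
  ]
}
"""


def summarize(result):
    """Return (label, duration in seconds) for each segment, in order.

    Hypothetical helper for illustration; not part of allin1.
    """
    return [
        (seg["label"], round(seg["end"] - seg["start"], 2))
        for seg in result["segments"]
    ]


result = json.loads(sample)
summary = summarize(result)
for label, duration in summary:
    print(f"{label:>8}: {duration:6.2f} s")
```

To read a real result instead of the inline sample, the same helper should work on the output of `json.load` applied to a file under `./structures`, assuming the file follows the structure shown above.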

Python

import allinone

# You can analyze a single file:
result = allinone.analyze('your_audio_file.wav')

# Or multiple files:
results = allinone.analyze(['your_audio_file1.wav', 'your_audio_file2.mp3'])

A result is a dataclass instance containing:

AnalysisResult(
  beats=[0.33, 0.75, 1.14, ...],
  beat_positions=[1, 2, 3, 4, 1, 2, 3, 4, 1, ...],
  downbeats=[0.33, 1.94, 3.53, ...], 
  segments=[
    Segment(start=0.0, end=0.33, label='start'), 
    Segment(start=0.33, end=13.13, label='intro'), 
    Segment(start=13.13, end=37.53, label='chorus'), 
    Segment(start=37.53, end=51.53, label='verse'), 
    Segment(start=51.53, end=64.34, label='verse'), 
    Segment(start=64.34, end=89.93, label='chorus'), 
    Segment(start=89.93, end=105.93, label='bridge'), 
    Segment(start=105.93, end=134.74, label='chorus'), 
    Segment(start=134.74, end=153.95, label='chorus'), 
    Segment(start=153.95, end=154.67, label='end'),
  ]
)
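To illustrate working with the segment list, here is a small sketch that totals the time spent in each section label. The `Segment` dataclass below is a local stand-in that only mirrors the three fields shown above; it is an assumption for the sake of a self-contained example and is not imported from the package. The helper `time_per_label` is likewise hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Segment:
    """Local stand-in mirroring the fields shown above (not the package's class)."""
    start: float
    end: float
    label: str


def time_per_label(segments):
    """Sum segment durations (in seconds) per label. Hypothetical helper."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg.label] += seg.end - seg.start
    return {label: round(total, 2) for label, total in totals.items()}


# Segment values copied from the example result above.
segments = [
    Segment(start=0.0, end=0.33, label='start'),
    Segment(start=0.33, end=13.13, label='intro'),
    Segment(start=13.13, end=37.53, label='chorus'),
    Segment(start=37.53, end=51.53, label='verse'),
    Segment(start=51.53, end=64.34, label='verse'),
    Segment(start=64.34, end=89.93, label='chorus'),
    Segment(start=89.93, end=105.93, label='bridge'),
    Segment(start=105.93, end=134.74, label='chorus'),
    Segment(start=134.74, end=153.95, label='chorus'),
    Segment(start=153.95, end=154.67, label='end'),
]

totals = time_per_label(segments)
for label, seconds in totals.items():
    print(f"{label:>8}: {seconds:7.2f} s")
```

Because the function only reads `start`, `end`, and `label`, it should apply equally to the `segments` attribute of a real result, provided those attributes are exposed as shown above.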

Available Models

harmonix-all: An ensemble model that averages the predictions of 8 models, one trained on each fold.
harmonix-foldN: A model trained on fold N, where N ranges from 0 to 7. For example, harmonix-fold0 is trained on fold 0.

By default, the harmonix-all model is used. To use a different model, use the --model option:

allin1 --model harmonix-fold0 your_audio_file.wav
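If you want to compare the individual fold models on the same file, the commands can be generated programmatically. The sketch below only builds and prints the eight command lines using the `--model` option shown above; it does not run them. The function name `build_commands` is hypothetical.

```python
import shlex


def build_commands(audio_path, folds=range(8)):
    """Build one allin1 command per fold model (harmonix-fold0 .. harmonix-fold7).

    Hypothetical helper: it only assembles argument lists and runs nothing.
    """
    return [
        ["allin1", "--model", f"harmonix-fold{n}", audio_path]
        for n in folds
    ]


commands = build_commands("your_audio_file.wav")
for cmd in commands:
    # shlex.join quotes arguments safely for display in a shell.
    print(shlex.join(cmd))
```

To actually execute a command, each list could be passed to `subprocess.run(cmd, check=True)`. Whether successive runs overwrite the same file in `./structures` has not been checked here, so it may be worth moving results aside between runs.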

Speed

Using the harmonix-all ensemble model, which includes 8 models trained on each fold, 10 songs (totalling 33 minutes) were processed in 73 seconds. The hardware utilized was an RTX 4090 GPU and an Intel i9-10940X CPU (14 cores, 28 threads, 3.30 GHz).
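For a rough sense of scale, the figures above can be turned into a real-time factor with a few lines of arithmetic. This is only a back-of-the-envelope sketch based on the numbers quoted in this section; throughput on other hardware will differ.

```python
# Figures quoted above: 10 songs, 33 minutes of audio, processed in 73 seconds.
num_songs = 10
audio_seconds = 33 * 60
processing_seconds = 73

# How many seconds of audio are analyzed per second of wall-clock time.
real_time_factor = audio_seconds / processing_seconds
# Average wall-clock time per song and per minute of audio.
seconds_per_song = processing_seconds / num_songs
seconds_per_audio_minute = processing_seconds / (audio_seconds / 60)

print(f"Real-time factor:       {real_time_factor:.1f}x")
print(f"Average time per song:  {seconds_per_song:.1f} s")
print(f"Time per audio minute:  {seconds_per_audio_minute:.2f} s")
```

In other words, on the hardware described, the ensemble runs roughly 27 times faster than real time.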

Advanced Usage for Research

TODO

If you run the analysis with the --embed option and the harmonix-all ensemble model, the embeddings will be stacked and saved (shape: [stems=4, time_steps, embedding_size=24, models=8]).

Note: --embed is not available for the ensemble model harmonix-all.

Concerning MP3 Files

Depending on the decoder, MP3 files may have slightly different offsets. I recommend first converting your audio files to WAV format using FFmpeg (as shown below) and then running the analysis on the WAV files.

ffmpeg -i your_audio_file.mp3 your_audio_file.wav
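For a folder of MP3 files, the same conversion can be scripted. The following is a minimal sketch that wraps the exact `ffmpeg -i input.mp3 output.wav` command shown above; the function names `ffmpeg_command` and `convert_all` are hypothetical, and the sketch assumes `ffmpeg` is available on your PATH.

```python
import shutil
import subprocess
from pathlib import Path


def ffmpeg_command(mp3_path):
    """Build the ffmpeg command that writes a WAV file next to the given MP3."""
    mp3_path = Path(mp3_path)
    wav_path = mp3_path.with_suffix(".wav")
    return ["ffmpeg", "-i", str(mp3_path), str(wav_path)]


def convert_all(folder):
    """Convert every .mp3 file in `folder` to .wav (hypothetical helper)."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    for mp3_path in sorted(Path(folder).glob("*.mp3")):
        # check=True raises CalledProcessError if ffmpeg exits with an error.
        subprocess.run(ffmpeg_command(mp3_path), check=True)


# Building the command has no side effects, so it is safe to inspect first.
example = ffmpeg_command("your_audio_file.mp3")
print(" ".join(example))
```

Calling `convert_all("path/to/folder")` would then run the conversions one by one. Note that ffmpeg asks for confirmation when an output file already exists, so a non-interactive run may need existing WAV files removed first.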

In more technical detail, this package relies on demucs for reading audio files. As far as I know, demucs first converts MP3 files to WAV format using FFmpeg and then reads the WAV files. However, if you use a different MP3 decoder, the offsets may differ. I have observed variations of about 20 to 40 ms, which is not acceptable for tasks requiring precise timing such as beat tracking: by convention, beat tracking performance is evaluated with a tolerance window of 70 ms. Therefore, I recommend standardizing on WAV, which doesn't require complex decoding, as the input format across all your data processing pipelines.
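To make the concern concrete, the sketch below shows how a decoder offset can push otherwise acceptable beat estimates outside the tolerance window. It is an illustration only: it assumes the 70 ms window is applied as plus or minus 70 ms around each reference beat, the 50 ms model error is a made-up value, the 30 ms offset is picked from the 20 to 40 ms range mentioned above, and `count_hits` is a hypothetical helper rather than a standard evaluation routine.

```python
def count_hits(reference, estimated, tolerance=0.07):
    """Count reference beats that have at least one estimate within +/- tolerance seconds.

    Simplified illustration; real evaluation metrics also handle duplicate matches.
    """
    return sum(
        any(abs(ref - est) <= tolerance for est in estimated)
        for ref in reference
    )


# Reference beat times in seconds (first values from the example result).
reference = [0.33, 0.75, 1.14]

model_error = 0.05     # hypothetical 50 ms estimation error
decoder_offset = 0.03  # 30 ms, within the 20 to 40 ms range mentioned above

without_offset = [t + model_error for t in reference]
with_offset = [t + model_error + decoder_offset for t in reference]

hits_without = count_hits(reference, without_offset)
hits_with = count_hits(reference, with_offset)

print(f"Hits without decoder offset: {hits_without}/{len(reference)}")
print(f"Hits with decoder offset:    {hits_with}/{len(reference)}")
```

With these assumed numbers, estimates that sit 50 ms from the reference all count as hits, but adding a 30 ms decoder offset moves them to 80 ms and all of them miss, even though the model output itself did not change.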

Citation

@inproceedings{taejun2023allinone,
  title={All-In-One Metrical And Functional Structure Analysis With Neighborhood Attentions on Demixed Audio},
  author={Kim, Taejun and Nam, Juhan},
  booktitle={IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)},
  year={2023}
}

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

allin1-0.0.10.tar.gz (19.9 kB)

Uploaded Aug 10, 2023 Source

Built Distribution

allin1-0.0.10-py3-none-any.whl (23.8 kB)

Uploaded Aug 10, 2023 Python 3

Hashes for allin1-0.0.10.tar.gz

Algorithm	Hash digest
SHA256	`800392e854677c9d4dade14bc36b65274f874e62da83b83efed16a24222f5561`
MD5	`f14ee4ff7523059a41ce5e48f58c259d`
BLAKE2b-256	`4d697b5525f10859b67aaa27c23fc25e04e9923b68c8f68921afd21df29cda1f`

Hashes for allin1-0.0.10-py3-none-any.whl

Algorithm	Hash digest
SHA256	`7a7b3b460abc12548ef988d52ac2cb2793e0893e9e11022efe58931674c76c06`
MD5	`00879a0388fd256b3af8ca8aaf7f1fe6`
BLAKE2b-256	`c390ffe50785f7ef6e0d25f03d73e5d6ec9d18274295b72c0a66347f6b50634e`