All-In-One Music Structure Analyzer

This package provides models for music structure analysis, predicting:

Tempo (BPM)
Beats
Downbeats
Section boundaries
Section labels (e.g., intro, verse, chorus, bridge, outro)

Table of Contents

Installation
Usage
Available Models
Speed
Advanced Usage for Research
Concerning MP3 Files
Citation

Installation

1. Install PyTorch

Visit the PyTorch website and install the version appropriate for your system.

2. Install NATTEN (Required for Linux and Windows; macOS will auto-install)

Linux: Download from the NATTEN website.
macOS: Auto-installs with allin1.
Windows: Build from source:

pip install ninja # Recommended, not required
git clone https://github.com/SHI-Labs/NATTEN
cd NATTEN
make

3. Install the package

pip install git+https://github.com/CPJKU/madmom  # install the latest madmom directly from GitHub
pip install allin1  # install this package

4. (Optional) Install FFmpeg for MP3 support

For Ubuntu:

sudo apt install ffmpeg

For macOS:

brew install ffmpeg
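
Once the steps above are done, a quick check can confirm that the pieces are in place. The following is a minimal sketch, not part of allin1: missing_modules and ffmpeg_available are hypothetical helper names, and the import names (torch, natten, madmom, allin1) are assumed to match the packages installed above.

```python
import importlib.util
import shutil

# Import names assumed for the dependencies installed in the steps above.
REQUIRED_MODULES = ["torch", "natten", "madmom", "allin1"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def ffmpeg_available():
    """Return True if an ffmpeg executable is on the PATH (needed for MP3 only)."""
    return shutil.which("ffmpeg") is not None


print("Missing Python modules:", missing_modules() or "none")
print("FFmpeg on PATH:", ffmpeg_available())
```

The check only looks the modules up and does not import them, so it says nothing about whether the installed PyTorch and NATTEN builds are compatible with each other.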

Usage

CLI

Run:

allin1 your_audio_file1.wav your_audio_file2.mp3

Results are saved in ./structures:

./structures
├── your_audio_file1.json
└── your_audio_file2.json

Each JSON analysis result has the following structure:

{
  "path": "/path/to/your_audio_file.wav",
  "bpm": 100,
  "beats": [ 0.33, 0.75, 1.14, ... ],
  "downbeats": [ 0.33, 1.94, 3.53, ... ],
  "beat_positions": [ 1, 2, 3, 4, 1, 2, 3, 4, 1, ... ],
  "segments": [
    {
      "start": 0.0,
      "end": 0.33,
      "label": "start"
    },
    {
      "start": 0.33,
      "end": 13.13,
      "label": "intro"
    },
    {
      "start": 13.13,
      "end": 37.53,
      "label": "chorus"
    },
    {
      "start": 37.53,
      "end": 51.53,
      "label": "verse"
    },
    ...
  ]
}
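
A saved result is plain JSON, so it can be read back with the standard library. The sketch below is illustrative only: load_structure and beats_by_bar are hypothetical helper names, not part of allin1, and the field names are taken from the example above.

```python
import json
from pathlib import Path


def load_structure(json_path):
    """Read one analysis result written by the CLI into a plain dict."""
    return json.loads(Path(json_path).read_text(encoding="utf-8"))


def beats_by_bar(beats, beat_positions):
    """Group beat times into bars, starting a new bar at every position 1."""
    bars = []
    for time, position in zip(beats, beat_positions):
        # A track may begin mid-bar, so the first beat always opens a bar.
        if position == 1 or not bars:
            bars.append([])
        bars[-1].append(time)
    return bars
```

For example, structure = load_structure('./structures/your_audio_file1.json') followed by beats_by_bar(structure['beats'], structure['beat_positions']) returns one list of beat times per bar.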

Available options:

$ allin1 --help

usage: allin1 [-h] [-a] [-e] [-o OUT_DIR] [-m MODEL] [-d DEVICE] [-k] [--demix-dir DEMIX_DIR] [--spec-dir SPEC_DIR] paths [paths ...]

positional arguments:
  paths                 Path to tracks

options:
  -h, --help            show this help message and exit
  -a, --activ           Save frame-level raw activations from sigmoid and softmax (default: False)
  -e, --embed           Save frame-level embeddings (default: False)
  -o OUT_DIR, --out-dir OUT_DIR
                        Path to a directory to store analysis results (default: ./structures)
  -m MODEL, --model MODEL
                        Name of the pretrained model to use (default: harmonix-all)
  -d DEVICE, --device DEVICE
                        Device to use (default: cuda if available else cpu)
  -k, --keep-byproducts
                        Keep demixed audio files and spectrograms (default: False)
  --demix-dir DEMIX_DIR
                        Path to a directory to store demixed tracks (default: ./demixed)
  --spec-dir SPEC_DIR   Path to a directory to store spectrograms (default: ./spectrograms)

Python

import allin1

# You can analyze a single file:
result = allin1.analyze('your_audio_file.wav')

# Or multiple files:
results = allin1.analyze(['your_audio_file1.wav', 'your_audio_file2.mp3'])

A result is a dataclass instance containing:

AnalysisResult(
  path='/path/to/your_audio_file.wav', 
  bpm=100,
  beats=[0.33, 0.75, 1.14, ...],
  beat_positions=[1, 2, 3, 4, 1, 2, 3, 4, 1, ...],
  downbeats=[0.33, 1.94, 3.53, ...], 
  segments=[
    Segment(start=0.0, end=0.33, label='start'), 
    Segment(start=0.33, end=13.13, label='intro'), 
    Segment(start=13.13, end=37.53, label='chorus'), 
    Segment(start=37.53, end=51.53, label='verse'), 
    Segment(start=51.53, end=64.34, label='verse'), 
    Segment(start=64.34, end=89.93, label='chorus'), 
    Segment(start=89.93, end=105.93, label='bridge'), 
    Segment(start=105.93, end=134.74, label='chorus'), 
    Segment(start=134.74, end=153.95, label='chorus'), 
    Segment(start=153.95, end=154.67, label='end'),
  ],
)

Unlike the CLI, the Python API does not save results to disk by default. To save them, pass out_dir:

result = allin1.analyze(
  'your_audio_file.wav',
  out_dir='./structures',  # None by default
)
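
Because a result exposes its segments as objects with start, end, and label fields, simple summaries need only a few lines. The sketch below totals the time spent in each section label. seconds_per_label is a hypothetical helper, and the Segment class here is a stand-in that mirrors the fields shown above so the example runs without the package installed.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Segment:
    """Stand-in mirroring the fields of the Segment entries shown above."""
    start: float
    end: float
    label: str


def seconds_per_label(segments):
    """Sum the durations of all segments that share a label."""
    totals = defaultdict(float)
    for segment in segments:
        totals[segment.label] += segment.end - segment.start
    return dict(totals)


example = [
    Segment(start=13.13, end=37.53, label='chorus'),
    Segment(start=37.53, end=51.53, label='verse'),
    Segment(start=51.53, end=64.34, label='verse'),
]
print(seconds_per_label(example))
```

With a real result, the call would be seconds_per_label(result.segments).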

The Python API allin1.analyze() offers the same options as the CLI:

def analyze(
  paths: PathLike | List[PathLike],
  out_dir: PathLike = None,
  model: str = 'harmonix-all',
  device: str = 'cuda' if torch.cuda.is_available() else 'cpu',
  include_activations: bool = False,
  include_embeddings: bool = False,
  demix_dir: PathLike = './demixed',
  spec_dir: PathLike = './spectrograms',
  keep_byproducts: bool = False,
): ...

Available Models

The models are trained on the Harmonix Set with 8-fold cross-validation. For more details, please refer to the paper.

harmonix-all: (Default) An ensemble model averaging the predictions of 8 models trained on each fold.
harmonix-foldN: A model trained on fold N (0 to 7). For example, harmonix-fold0 is trained on fold 0.

By default, the harmonix-all model is used. To use a different model, use the --model option:

allin1 --model harmonix-fold0 your_audio_file.wav
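
From Python, the same choice is made with the model argument listed in the analyze() signature above. The following is a sketch under that assumption: fold_model_names and analyze_with_each_fold are hypothetical helpers, and allin1 is imported lazily so that the name helper works even where the package is not installed.

```python
def fold_model_names(num_folds=8):
    """Names of the single-fold models, harmonix-fold0 through harmonix-fold7."""
    return [f"harmonix-fold{n}" for n in range(num_folds)]


def analyze_with_each_fold(path):
    """Run every single-fold model on one file and collect results by model name."""
    import allin1  # lazy import: only needed when an analysis is actually run

    return {name: allin1.analyze(path, model=name) for name in fold_model_names()}
```

Comparing the per-fold results for one track is one way to see how much the individual models disagree before they are averaged in harmonix-all.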

Speed

With an RTX 4090 GPU and Intel i9-10940X CPU (14 cores, 28 threads, 3.30 GHz), the harmonix-all model processed 10 songs (33 minutes) in 73 seconds.
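
As a rough reading of these figures, the arithmetic below converts them into a real-time factor and an average time per song. It uses only the numbers quoted above; real_time_factor is a hypothetical helper name, and results on other hardware will differ.

```python
def real_time_factor(audio_seconds, processing_seconds):
    """How many seconds of audio are processed per second of wall-clock time."""
    return audio_seconds / processing_seconds


audio_seconds = 33 * 60   # 10 songs, 33 minutes in total
processing_seconds = 73   # reported processing time
num_songs = 10

factor = real_time_factor(audio_seconds, processing_seconds)
print(f"{factor:.1f}x real time")                          # 27.1x real time
print(f"{processing_seconds / num_songs:.1f} s per song")  # 7.3 s per song
```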

Advanced Usage for Research

This package provides researchers with advanced options to extract frame-level raw activations and embeddings without post-processing. These have a resolution of 100 FPS, equivalent to 0.01 seconds per frame.

CLI

Activations

The --activ option also saves frame-level raw activations from sigmoid and softmax:

$ allin1 --activ your_audio_file1.wav

You can find the activations in the .npz file:

./structures
├── your_audio_file1.json
└── your_audio_file1.activ.npz

To load the activations in Python:

>>> import numpy as np
>>> activ = np.load('./structures/your_audio_file1.activ.npz')
>>> activ.files
['beat', 'downbeat', 'segment', 'label']
>>> beat_activations = activ['beat']
>>> downbeat_activations = activ['downbeat']
>>> segment_boundary_activations = activ['segment']
>>> segment_label_activations = activ['label']

Details of the activations are as follows:

beat: Raw activations from the sigmoid layer for beat tracking (shape: [time_steps])
downbeat: Raw activations from the sigmoid layer for downbeat tracking (shape: [time_steps])
segment: Raw activations from the sigmoid layer for segment boundary detection (shape: [time_steps])
label: Raw activations from the softmax layer for segment labeling (shape: [label_class=10, time_steps])
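
Since the activations are sampled at 100 FPS, a frame index maps to a time by dividing by 100. The sketch below shows that conversion together with a naive peak picker for the one-dimensional activations. This is not the post-processing the package itself applies: pick_peaks, frames_to_seconds, and the 0.5 threshold are assumptions chosen only for illustration.

```python
import numpy as np

FPS = 100  # frame rate stated above: 0.01 seconds per frame


def frames_to_seconds(frame_indices, fps=FPS):
    """Convert frame indices to times in seconds."""
    return np.asarray(frame_indices) / fps


def pick_peaks(activation, threshold=0.5):
    """Return indices of local maxima whose value exceeds the threshold."""
    activation = np.asarray(activation, dtype=float)
    if activation.size < 3:
        return np.array([], dtype=int)
    middle = activation[1:-1]
    is_peak = (
        (middle > activation[:-2])     # higher than the previous frame
        & (middle >= activation[2:])   # not lower than the next frame
        & (middle > threshold)
    )
    return np.flatnonzero(is_peak) + 1  # +1 restores the original indexing
```

For a loaded file, frames_to_seconds(pick_peaks(activ['beat'])) gives candidate beat times in seconds. These will not necessarily match the beats in the JSON result.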

You can access the label names as follows:

>>> allin1.HARMONIX_LABELS
['start',
 'end',
 'intro',
 'outro',
 'break',
 'bridge',
 'inst',
 'solo',
 'verse',
 'chorus']
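
With the label order above, the most likely label for each frame is the argmax over the first axis of the label activations. A minimal sketch follows. frame_labels is a hypothetical helper, and the label list is copied from the output above so that the example runs without importing allin1.

```python
import numpy as np

# Copied from the allin1.HARMONIX_LABELS output shown above.
HARMONIX_LABELS = ['start', 'end', 'intro', 'outro', 'break',
                   'bridge', 'inst', 'solo', 'verse', 'chorus']


def frame_labels(label_activations):
    """Map activations of shape [10, time_steps] to one label name per frame."""
    class_indices = np.argmax(np.asarray(label_activations), axis=0)
    return [HARMONIX_LABELS[i] for i in class_indices]
```

For a loaded file this would be frame_labels(activ['label']). The result is a raw per-frame decision and is not aligned to the segment boundaries in the JSON result.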

Embeddings

This package also provides an option to extract raw embeddings from the model.

$ allin1 --embed your_audio_file1.wav

You can find the embeddings in the .npy file:

./structures
├── your_audio_file1.json
└── your_audio_file1.embed.npy

To load the embeddings in Python:

>>> import numpy as np
>>> embed = np.load('./structures/your_audio_file1.embed.npy')

Each model produces one embedding per source-separated stem per time step, so the saved array has the shape [stems=4, time_steps, embedding_size=24]:

stems: The number of source-separated stems (4, in the order bass, drums, other, vocals).
time_steps: The number of time steps (frames). Each time step is 0.01 seconds (100 FPS).
embedding_size: The embedding size of 24.

Using the --embed option with the harmonix-all ensemble model will stack the embeddings, saving them with the shape [stems=4, time_steps, embedding_size=24, models=8].
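
The two shapes can be handled with ordinary NumPy indexing. In the sketch below, stem_embedding and average_over_models are hypothetical helpers. Averaging across the eight ensemble members is just one possible way to reduce the last axis and is an assumption here, not something the package prescribes.

```python
import numpy as np

# Stem order stated above.
STEMS = ["bass", "drums", "other", "vocals"]


def stem_embedding(embed, stem):
    """Select one stem: [time_steps, 24], or [time_steps, 24, models] for the ensemble."""
    return np.asarray(embed)[STEMS.index(stem)]


def average_over_models(embed):
    """Reduce a stacked ensemble array [4, T, 24, models] to [4, T, 24] by averaging."""
    embed = np.asarray(embed)
    return embed.mean(axis=-1) if embed.ndim == 4 else embed
```

For example, stem_embedding(average_over_models(embed), 'vocals') returns a [time_steps, 24] array for the vocal stem, whichever model produced the file.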

Python

The Python API allin1.analyze() offers the same options as the CLI:

>>> allin1.analyze(
      paths='your_audio_file.wav',
      include_activations=True,
      include_embeddings=True,
    )

AnalysisResult(
  path='/path/to/your_audio_file.wav', 
  bpm=100, 
  beats=[...],
  downbeats=[...],
  segments=[...],
  activations={
    'beat': array(...), 
    'downbeat': array(...), 
    'segment': array(...), 
    'label': array(...)
  }, 
  embeddings=array(...),
)

Concerning MP3 Files

Due to variations in decoders, MP3 files can have slight offset differences. I recommend first converting your audio files to WAV format using FFmpeg (as shown below) and using the WAV files in all your data processing pipelines.

ffmpeg -i your_audio_file.mp3 your_audio_file.wav

In this package, audio files are read using Demucs. To my understanding, Demucs converts MP3 files to WAV using FFmpeg before reading them. However, a different MP3 decoder can yield different offsets. I've observed variations of about 20 to 40 ms, which is problematic for tasks requiring precise timing such as beat tracking, where the conventional tolerance is just 70 ms. Hence, I advise standardizing inputs to the WAV format for all data processing, so that decoding is straightforward.
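
For a whole folder of MP3 files, the conversion can be scripted. The following sketch runs the same ffmpeg command as above for each file. ffmpeg_command and convert_folder are hypothetical helper names, and the script assumes ffmpeg is on the PATH.

```python
import subprocess
from pathlib import Path


def ffmpeg_command(mp3_path):
    """Build the command shown above: ffmpeg -i <name>.mp3 <name>.wav"""
    mp3_path = Path(mp3_path)
    return ["ffmpeg", "-i", str(mp3_path), str(mp3_path.with_suffix(".wav"))]


def convert_folder(folder):
    """Convert every .mp3 in a folder to .wav, skipping files already converted."""
    for mp3_path in sorted(Path(folder).glob("*.mp3")):
        if mp3_path.with_suffix(".wav").exists():
            continue
        # check=True raises CalledProcessError if ffmpeg exits with an error.
        subprocess.run(ffmpeg_command(mp3_path), check=True)
```

Converting once and keeping the WAV files means every later step reads the same decoded samples, which is the point of the recommendation above.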

Citation

If you use this package for your research, please cite the following paper:

@inproceedings{taejun2023allinone,
  title={All-In-One Metrical And Functional Structure Analysis With Neighborhood Attentions on Demixed Audio},
  author={Kim, Taejun and Nam, Juhan},
  booktitle={IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)},
  year={2023}
}
