Skip to main content

Metrics for Musical Structure

Project description


license: apache-2.0 language:

  • en tags:
  • music
  • audio
  • structure

Structural Derivation: A Model & Metric for Musical Structure Analysis

This repository contains a model for learning structural embeddings of music by contrasting segments from the same song against segments from different songs. It can be used to generate two novel metrics for any given audio file: the Structural Derivation Consistency (SDC) score and the Structural Diversity score.

These metrics are particularly useful for quantifying the high-level thematic structural consistency and structural diversity of a musical piece. Together, these metrics provide a lens on both the cohesion and variation of a track’s structure, distinguishing human-composed music (typically higher consistency and balanced diversity) from AI-generated music (often lower consistency or uneven diversity due to thematic drifts).


🎵 Key Metrics

The model analyzes a song by first splitting it into segments (e.g., 10-20 seconds each) and generating a high-level embedding for each segment. These embeddings are then used to compute the following scores.

1. Structural Derivation Consistency (SDC) Score

The SDC score quantifies a song's structural and thematic integrity. It serves as a proxy for musical narrative consistency, measuring how well a song "remembers" and develops the musical ideas presented in its opening segment. In other words, it reflects how structurally coherent and thematically consistent the song is in embedding space.

  • How it works: The score is the average cosine similarity between the embedding of the first segment (S1) and every subsequent segment (S2, S3, ..., SN).
  • Interpretation:
    • A high SDC score suggests the piece is thematically consistent. Like a well-written story, it builds upon its initial premise. This is characteristic of human-composed music, which often uses repetition, variation, and development of a core motif or theme.
    • A low SDC score indicates potential "thematic drift." The song's later sections diverge significantly from its beginning, lacking a clear connection to the initial musical statement. This can be a trait of AI-generated music that struggles with long-form compositional structure.

2. Structural Diversity Score

The Structural Diversity score measures the variety of distinct musical ideas within a single piece of music. It complements the SDC score by providing insight into the song's complexity and repetitiveness. In other words, it reflects the variety and richness of internal structures by capturing how different the segments are from one another.

  • How it works: The score is calculated using the Vendi Score on the similarity matrix derived from all segment embeddings.
  • Interpretation:
    • A high Diversity score indicates that the song contains a wide range of different-sounding segments. The piece is musically rich and varied.
    • A low Diversity score suggests the song is repetitive or monotonous, with very similar-sounding segments throughout.

A compelling musical piece might exhibit both high SDC (it's coherent) and high Diversity (it's interesting and not overly repetitive).


💻 Installation & Usage

1. Installation

Install the package and its dependencies directly from the GitHub repository:

pip install structural-derivation

2. Example Usage

The following script loads the model from Hugging Face, processes a list of audio files, and prints the resulting scores. The process_audio_files function handles splitting the audio, batching, and computing both metrics.

import torch
from structure_derivation.inference.batch_inference import process_audio_files
from structure_derivation.model.model import StructureDerivationModel

# 1. Set up the device and load the model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# The model will be automatically downloaded from Hugging Face
model = StructureDerivationModel.from_pretrained(
    "keshavbhandari/structure-derivation"
)
model.to(device)
model.eval()

# 2. Define the list of audio files you want to analyze
my_audio_files = [
    "path/to/your/song1.mp3",
    "path/to/your/song2.wav",
    "path/to/another/track.flac",
]

# 3. Analyze the files to get the scores
# This function returns a dictionary with file paths as keys
results = process_audio_files(audio_paths=my_audio_files, model=model, batch_size=128, segment_seconds=10, target_sr=32000)

# 4. Print the results
for path, res in results.items():
    print(f"\nResults for {path}:")
    print("Cosine similarities with S1:", res["similarities"])
    print("Average Structure Derivation:", res["avg_structure_derivation"])
    print("Vendi score:", res["vendi_score"])

3. Model Architecture & Training

Architecture The core of this model is the Hierarchical Token-Semantic Audio Transformer (HTS-AT) encoder. This architecture, introduced by Chen et al., is a highly efficient and powerful audio transformer.

Key features of HTS-AT include:

  • Hierarchical Structure: It uses Patch-Merge layers to gradually reduce the number of audio tokens as they pass through the network. This significantly reduces GPU memory consumption and training time compared to standard transformers.
  • Windowed Attention: Instead of computing self-attention across the entire audio spectrogram, it uses a more efficient windowed attention mechanism from the Swin Transformer. This captures local relationships effectively while remaining computationally tractable.
  • High Performance: HTS-AT achieves state-of-the-art results on major audio classification benchmarks like AudioSet and ESC-50.

This architecture is ideal for our purpose as it can efficiently process long audio clips and learn rich, meaningful representations of musical structure.

Training The model was trained using a contrastive learning objective on a large-scale corpus of nearly 400,000 music files (each longer than one minute) for 20 epochs. The training data was sourced from a combination of the following publicly available datasets:

  • GiantSteps Key Dataset

  • FSL10K

  • Emotify

  • Emopia

  • ACM MIRUM

  • JamendoMaxCaps

  • MTG-Jamendo

  • SongEval

  • Song Describer

  • ISMIR04

  • Hainsworth

  • Saraga

  • DEAM

  • MAESTRO v3.0.0

4. Re-training on Custom Data

If you wish to re-train the model on your own dataset, you can use the provided training script. Run the following command from the root of the repository, adjusting the GPU devices as needed:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5 torchrun --nproc_per_node=6 --master_port=12345 structure_derivation/train/train.py

This will start the training process using the default hyperparameters specified in the script. You can modify these parameters as needed to suit your dataset and training requirements.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

structure_derivation-0.1.0.post1.tar.gz (29.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

structure_derivation-0.1.0.post1-py3-none-any.whl (31.7 kB view details)

Uploaded Python 3

File details

Details for the file structure_derivation-0.1.0.post1.tar.gz.

File metadata

File hashes

Hashes for structure_derivation-0.1.0.post1.tar.gz
Algorithm Hash digest
SHA256 45266b1a12c24a99bcf5b01af5550efac44f02351a9127a33e66aa4d2bb7f31d
MD5 926592ed32cdf9cf2f211c2e19b1c69f
BLAKE2b-256 6c11c86c8e98d95f944b7ed91b191ff76918c16bba4f51c6fbb53e1ee9d43ded

See more details on using hashes here.

File details

Details for the file structure_derivation-0.1.0.post1-py3-none-any.whl.

File metadata

File hashes

Hashes for structure_derivation-0.1.0.post1-py3-none-any.whl
Algorithm Hash digest
SHA256 96ee884cc1bd6f7064934c986fe8f3375a8758837976abfc368466c94d8b7742
MD5 f228ef0e3af93813e9d8815d76af2749
BLAKE2b-256 6194ef82065fd63b50d49ad4b2c58b5672e7e30a111a529ecf267e3872a6b565

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page