Music structural segmentation for the Zigify pipeline (MSAF olda + scluster)
zigify-msaf
Music structural segmentation for the Zigify pipeline. A thin CLI wrapper around MSAF that pins to the olda boundary detector and scluster labeler — the combination that scored best in evaluation against hand-annotated ground truth.
Install
uvx zigify-msaf <audio> # ephemeral, recommended
uv pip install zigify-msaf # into a project
uvx resolves and caches a dedicated environment on first run; subsequent calls reuse that cache and start in ~100 ms.
Use
zigify-msaf path/to/track.mp3
zigify-msaf track.mp3 --out track.segments.json
zigify-msaf track.mp3 --feature mfcc --verbose
stdout is newline-delimited JSON: progress events first, then a single final result line. stderr carries human-readable logs from msaf/librosa (silenced by default; pass --verbose to surface them).
Output schema
Each line on stdout is one JSON object. The shape of the final result line:
{
  "type": "result",
  "source": "path/to/track.mp3",
  "duration": 357.98,
  "tempo": 117.45,
  "beatCount": 700,
  "accentCount": 88,
  "feature": "pcp",
  "boundaryAlgo": "olda",
  "labelAlgo": "scluster",
  "nSegments": 12,
  "nClusters": 5,
  "loudness": -18.4,
  "peakLoudness": -3.1,
  "segments": [
    {
      "start": 0.0,
      "end": 18.1,
      "duration": 18.1,
      "cluster": "S4",
      "beats": [0.51, 1.02, 1.53, 2.04],
      "beatCount": 36,
      "accents": [0.51, 4.6, 9.2, 13.8],
      "accentCount": 4,
      "topAccent": 9.2,
      "onsetCount": 22,
      "onsetRate": 1.215,
      "loudness": -23.7,
      "peakLoudness": -8.9,
      "dynamicRange": 14.8,
      "energy": 0.21,
      "brightness": 1840.5
    }
  ],
  "elapsed": 12.3
}
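For TypeScript consumers, the result line can be described with types along the following lines. This is a sketch inferred from the example object only, not an official schema: the field names come from the example, while the `Segment` and `SegmentResult` type names and the `isSegmentResult` guard are hypothetical helpers, and treating every field as required is an assumption.

```typescript
// Types inferred from the example result line; not an official schema.
interface Segment {
  start: number // seconds
  end: number // seconds
  duration: number // seconds
  cluster: string // cluster label, e.g. "S4"
  beats: number[]
  beatCount: number
  accents: number[]
  accentCount: number
  topAccent: number
  onsetCount: number
  onsetRate: number
  loudness: number
  peakLoudness: number
  dynamicRange: number
  energy: number
  brightness: number
}

interface SegmentResult {
  type: 'result'
  source: string
  duration: number
  tempo: number
  beatCount: number
  accentCount: number
  feature: string
  boundaryAlgo: string
  labelAlgo: string
  nSegments: number
  nClusters: number
  loudness: number
  peakLoudness: number
  segments: Segment[]
  elapsed: number
}

// Hypothetical runtime guard: checks only the discriminator and that
// `segments` is an array; it does not validate individual fields.
function isSegmentResult(evt: unknown): evt is SegmentResult {
  if (typeof evt !== 'object' || evt === null) return false
  const o = evt as { type?: unknown; segments?: unknown }
  return o.type === 'result' && Array.isArray(o.segments)
}
```

A guard like this lets a caller narrow a parsed stdout line before reading `segments`, without trusting the shape of progress or error lines.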
Per-segment fields beyond start/end/cluster describe musical character useful for downstream light-show or visualization generation:
| Field | Meaning |
|---|---|
| beats, beatCount | Beat timestamps (s, absolute) inside the segment, from librosa.beat.beat_track. |
| accents, accentCount, topAccent | Strong onsets (top quartile of the onset-strength envelope): the "hits" to flash on. topAccent is the loudest one in the segment. |
| onsetCount, onsetRate | All detected onsets and their density (events / sec), which distinguishes calm sections from busy ones. |
| loudness, peakLoudness, dynamicRange | Mean / peak RMS in dBFS, and their difference. |
| energy | 0..1 loudness normalized to the loudest segment in the track (peakLoudness − 30 dB ↦ 0, peakLoudness ↦ 1). Suitable for direct mapping to brightness/intensity. |
| brightness | Mean spectral centroid in Hz; higher means brighter, with more high-frequency content. |
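The derived fields in the table follow from the raw ones by simple arithmetic. The sketch below illustrates those relationships and is not the tool's source: `onsetRate`, `dynamicRange`, and `energyFromDb` are hypothetical helper names, and the `referenceDb` argument stands for whatever track-level reference level the tool normalizes against, which the description above does not pin down exactly.

```typescript
// Onset density in events per second over one segment.
function onsetRate(onsetCount: number, durationSec: number): number {
  return durationSec > 0 ? onsetCount / durationSec : 0
}

// Difference between peak and mean RMS level, both in dBFS.
function dynamicRange(loudnessDb: number, peakLoudnessDb: number): number {
  return peakLoudnessDb - loudnessDb
}

// Linear map from a dB level to 0..1:
// (referenceDb - rangeDb) maps to 0, referenceDb maps to 1, clamped outside.
function energyFromDb(loudnessDb: number, referenceDb: number, rangeDb = 30): number {
  const t = (loudnessDb - (referenceDb - rangeDb)) / rangeDb
  return Math.min(1, Math.max(0, t))
}
```

For the sample segment, `onsetRate(22, 18.1)` is about 1.215 and `dynamicRange(-23.7, -8.9)` is about 14.8, which agrees with the values shown in the example output.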
Earlier lines look like:
{"type":"stage","name":"loading","message":"reading track.mp3"}
{"type":"stage","name":"tempo","message":"estimating bpm and beats"}
{"type":"stage","name":"accents","message":"detecting onsets and accents"}
{"type":"stage","name":"energy","message":"computing loudness and brightness"}
{"type":"stage","name":"features","message":"extracting pcp"}
{"type":"stage","name":"boundaries","message":"olda"}
{"type":"stage","name":"labels","message":"scluster"}
On failure the tool emits a single {"type": "error", ...} line and exits non-zero.
Calling from Node / TypeScript
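import { spawn } from 'node:child_process'
import { createInterface } from 'node:readline'

// Minimal result shape; the full field list is under "Output schema".
type SegmentResult = { type: 'result'; segments: unknown[]; [key: string]: unknown }

const audioPath = process.argv[2] // audio file to analyze

// stdout carries the newline-delimited JSON events; stderr carries the logs.
const proc = spawn('uvx', ['zigify-msaf', audioPath], { stdio: ['ignore', 'pipe', 'pipe'] })
const rl = createInterface({ input: proc.stdout })

let result: SegmentResult | undefined
for await (const line of rl) {
  const evt = JSON.parse(line)
  if (evt.type === 'stage') console.log(`[${evt.name}] ${evt.message ?? ''}`)
  if (evt.type === 'result') result = evt
  if (evt.type === 'error') throw new Error(evt.message)
}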
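Because a failure is reported both as an error line and as a non-zero exit code, a caller may want to check both. One possible wrapper is sketched here. `handleLine` and `runZigifyMsaf` are hypothetical names, the `message` field on error events is assumed from the snippet above rather than documented, and inheriting stderr instead of piping it is a design choice of this sketch.

```typescript
import { spawn } from 'node:child_process'
import { createInterface } from 'node:readline'

type StageEvent = { type: 'stage'; name: string; message?: string }
type ResultEvent = { type: 'result'; [key: string]: unknown }

// Parse one stdout line. Returns the result event if this is the final
// result line, undefined for blank or progress lines, and throws on errors.
function handleLine(line: string, onStage?: (e: StageEvent) => void): ResultEvent | undefined {
  if (line.trim() === '') return undefined
  const evt = JSON.parse(line)
  if (evt.type === 'error') throw new Error(evt.message ?? 'zigify-msaf reported an error')
  if (evt.type === 'stage') onStage?.(evt as StageEvent)
  return evt.type === 'result' ? (evt as ResultEvent) : undefined
}

async function runZigifyMsaf(
  audioPath: string,
  onStage?: (e: StageEvent) => void,
): Promise<ResultEvent> {
  // stderr is inherited so that logs cannot fill an unread pipe.
  const proc = spawn('uvx', ['zigify-msaf', audioPath], { stdio: ['ignore', 'pipe', 'inherit'] })
  const exited = new Promise<number | null>((resolve, reject) => {
    proc.once('error', reject) // e.g. uvx is not on PATH
    proc.once('close', code => resolve(code))
  })
  exited.catch(() => {}) // mark as handled; the await below still rethrows

  let result: ResultEvent | undefined
  for await (const line of createInterface({ input: proc.stdout! })) {
    result = handleLine(line, onStage) ?? result
  }

  const code = await exited
  if (code !== 0) throw new Error(`zigify-msaf exited with code ${code}`)
  if (!result) throw new Error('zigify-msaf produced no result line')
  return result
}
```

Keeping the line parsing in a separate pure function makes it testable without spawning the CLI; a caller would then use `const result = await runZigifyMsaf('track.mp3', e => console.log(e.name))`.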
Algorithm choice
Evaluated against a 14-boundary ground truth on Michael Jackson — Thriller (tolerance ±5 s):
| Boundary algo | Hits | Miss | Spurious |
|---|---|---|---|
| olda | 9 | 4 | 2 |
| foote | 9 | 4 | 6 |
| sf (default) | 6 | 7 | 5 |
| cnmf | 5 | 8 | 2 |
| scluster | 4 | 9 | 10 |
| vmo | 12 | 1 | 457 |
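olda has the highest precision (9 hits against only 2 spurious boundaries) and ties foote for recall. Only vmo finds more true boundaries (12), at the cost of 457 spurious ones. The scluster labeler groups the segments into ~5 clusters that align with verse/chorus/outro structure on test tracks.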
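The precision and recall behind that comparison can be recomputed from the table. The sketch below uses the standard definitions, precision = hits / (hits + spurious) and recall = hits / (hits + misses), plus their harmonic mean F1. The `rows` array copies the table; `Row` and `score` are hypothetical names, and ranking by F1 is a choice of this sketch rather than something the evaluation states.

```typescript
type Row = { algo: string; hits: number; miss: number; spurious: number }

// Counts copied from the evaluation table.
const rows: Row[] = [
  { algo: 'olda', hits: 9, miss: 4, spurious: 2 },
  { algo: 'foote', hits: 9, miss: 4, spurious: 6 },
  { algo: 'sf', hits: 6, miss: 7, spurious: 5 },
  { algo: 'cnmf', hits: 5, miss: 8, spurious: 2 },
  { algo: 'scluster', hits: 4, miss: 9, spurious: 10 },
  { algo: 'vmo', hits: 12, miss: 1, spurious: 457 },
]

function score(r: Row) {
  const precision = r.hits / (r.hits + r.spurious) // share of detections that are real
  const recall = r.hits / (r.hits + r.miss) // share of real boundaries that were found
  const f1 = (2 * precision * recall) / (precision + recall)
  return { algo: r.algo, precision, recall, f1 }
}

// Rank by F1, best first, and print one line per detector.
const scores = rows.map(score).sort((a, b) => b.f1 - a.f1)
for (const s of scores) {
  console.log(`${s.algo}: P=${s.precision.toFixed(2)} R=${s.recall.toFixed(2)} F1=${s.f1.toFixed(2)}`)
}
```

On these counts olda ranks first by F1 (0.75), ahead of foote (about 0.64), while vmo has the highest recall but an F1 of about 0.05 because of its spurious detections.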
License
MIT — see LICENSE.
File details
Details for the file zigify_msaf-0.2.0.tar.gz.
File metadata
- Download URL: zigify_msaf-0.2.0.tar.gz
- Upload date:
- Size: 7.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.9 {"installer":{"name":"uv","version":"0.10.9","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4afed628a33def5f20f78e8fd065b08b1f2e19656b5c0cbe0fcdbf15370e1c39 |
| MD5 | 5aca4b68c0b65802b3139ca6a5e01b74 |
| BLAKE2b-256 | fad26601670a2d53b49074a62de6a42dbe2fec73fa71efa6337754d704417f38 |
File details
Details for the file zigify_msaf-0.2.0-py3-none-any.whl.
File metadata
- Download URL: zigify_msaf-0.2.0-py3-none-any.whl
- Upload date:
- Size: 8.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.9 {"installer":{"name":"uv","version":"0.10.9","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f9eaade17668162fa64fad6167baf6942d03a75599321c1a399b31668230163d |
| MD5 | 265d0978d1b996b9dded89428afb5e5a |
| BLAKE2b-256 | 4b964dd2c13e3d649d3d65e3d7023a69ccdd8bc98c61bf161758f43493980fe9 |