
Music structural segmentation for the Zigify pipeline (MSAF olda + scluster)

Project description

zigify-msaf

Music structural segmentation for the Zigify pipeline. A thin CLI wrapper around MSAF (the Music Structure Analysis Framework) that pins the olda boundary detector and the scluster labeler, the combination that scored best when evaluated against hand-annotated ground truth.

Install

uvx zigify-msaf <audio>          # ephemeral, recommended
uv pip install zigify-msaf       # into a project

uvx resolves and caches a dedicated environment on first run; subsequent calls cold-start in ~100 ms.

Use

zigify-msaf path/to/track.mp3
zigify-msaf track.mp3 --out track.segments.json
zigify-msaf track.mp3 --feature mfcc --verbose
zigify-msaf track.mp3 --bpm 118        # skip detection, lock all segments to a known tempo
zigify-msaf track.mp3 --viz             # write track.png + track.html viewer next to the mp3
zigify-msaf track.mp3 --min-segment 5  # absorb segments shorter than 5s (default: 3.0; 0 disables)

stdout is newline-delimited JSON: progress events first, then a single final result line. stderr carries human-readable logs from msaf/librosa (silenced by default; pass --verbose to surface them).

Output schema

Each line on stdout is one JSON object. The shape of the final result line:

{
  "type": "result",
  "source": "path/to/track.mp3",
  "duration": 357.98,
  "tempo": 117.45,
  "tempoPrior": 117.45,
  "bpmOverride": null,
  "beatCount": 700,
  "accentCount": 88,
  "feature": "pcp",
  "boundaryAlgo": "olda",
  "labelAlgo": "scluster",
  "nSegments": 12,
  "nClusters": 5,
  "loudness": -18.4,
  "peakLoudness": -3.1,
  "segments": [
    {
      "start": 0.0,
      "end": 18.1,
      "duration": 18.1,
      "cluster": "S4",
      "bpm": 117.62,
      "beats": [0.51, 1.02, 1.53, 2.04],
      "beatCount": 36,
      "accents": [0.51, 4.6, 9.2, 13.8],
      "accentCount": 4,
      "topAccent": 9.2,
      "onsetCount": 22,
      "onsetRate": 1.215,
      "loudness": -23.7,
      "peakLoudness": -8.9,
      "dynamicRange": 14.8,
      "energy": 0.21,
      "brightness": 1840.5
    }
  ],
  "elapsed": 12.3
}

Per-segment fields beyond start/end/duration/cluster describe the segment's musical character, for use in downstream light-show or visualization generation:

| Field | Meaning |
|---|---|
| bpm | Tempo measured within this segment, derived from the median of clean beat-to-beat gaps. Lets each section follow its own pulse, which matters for tracks that aren't perfectly grid-aligned or that change tempo at structure boundaries. |
| beats, beatCount | Beat timestamps inside the segment (seconds, absolute track time) and their count. Beats are tracked per segment with a tempogram-derived local prior, gated by a global sanity prior; each beat snaps to the nearest onset within ±¼ beat period. |
| accents, accentCount, topAccent | Strong onsets (those in the top quartile of the onset-strength envelope): the "hits" to flash on. topAccent is the loudest one in the segment. |
| onsetCount, onsetRate | All detected onsets in the segment and their density (events/s); distinguishes calm sections from busy ones. |
| loudness, peakLoudness, dynamicRange | Mean and peak RMS level in dBFS, and their difference in dB (peakLoudness − loudness). |
| energy | Loudness normalized to 0..1 against the loudest segment in the track (peakLoudness − 30 dB ↦ 0, peakLoudness ↦ 1). Suitable for direct mapping to brightness/intensity. |
| brightness | Mean spectral centroid in Hz; higher means brighter, with more high-frequency content. |
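Three of the rules in the table are simple enough to reproduce from raw timestamps: tempo from the median beat gap, snapping a beat to the nearest onset within ±¼ period, and the 30 dB energy mapping. The helpers below are a hypothetical sketch of those rules, not zigify-msaf's code. The "clean" gap filtering is not described, so it is omitted, and the reference level for energy is a parameter because the table does not say exactly which peak value is used.

```typescript
// Hypothetical helpers illustrating the table's rules; not zigify-msaf's implementation.

/** Median of a non-empty numeric array. */
function median(xs: number[]): number {
  const s = [...xs].sort((a, b) => a - b)
  const mid = Math.floor(s.length / 2)
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2
}

/** Segment tempo in BPM: 60 divided by the median inter-beat gap (no outlier cleaning). */
export function segmentBpm(beats: number[]): number | null {
  if (beats.length < 2) return null
  const gaps = beats.slice(1).map((t, i) => t - beats[i])
  return 60 / median(gaps)
}

/** Move a beat to the nearest onset within ±¼ beat period; keep it unchanged if none is close enough. */
export function snapToOnset(beat: number, onsets: number[], bpm: number): number {
  const window = 60 / bpm / 4 // a quarter of the beat period, in seconds
  let best = beat
  let bestDist = window
  for (const o of onsets) {
    const d = Math.abs(o - beat)
    if (d <= bestDist) {
      best = o
      bestDist = d
    }
  }
  return best
}

/** Map mean loudness (dBFS) to 0..1: refDb − rangeDb ↦ 0, refDb ↦ 1, clamped outside that span. */
export function energy(loudnessDb: number, refDb: number, rangeDb = 30): number {
  const x = (loudnessDb - (refDb - rangeDb)) / rangeDb
  return Math.min(1, Math.max(0, x))
}
```

For example, beats spaced 0.51 s apart give 60 / 0.51 ≈ 117.6 BPM, matching the bpm field in the sample segment above.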

Progress lines emitted before the result look like:

{"type":"stage","name":"loading","message":"reading track.mp3"}
{"type":"stage","name":"onsets","message":"computing onset strength envelope"}
{"type":"stage","name":"accents","message":"detecting onsets and accents"}
{"type":"stage","name":"energy","message":"computing loudness and brightness"}
{"type":"stage","name":"features","message":"extracting pcp"}
{"type":"stage","name":"boundaries","message":"olda"}
{"type":"stage","name":"labels","message":"scluster"}
{"type":"stage","name":"tempo","message":"estimating global bpm prior"}
{"type":"stage","name":"beats","message":"tracking beats per segment"}

On failure the tool emits a single {"type": "error", ...} line and exits non-zero.
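For a typed consumer, the stdout lines form a union discriminated by type. The sketch below is hypothetical: the interface and function names are not part of the package, the field lists are copied from the examples above, and error lines are assumed to carry only an optional message (the one field the Node example reads).

```typescript
// Hypothetical typings for zigify-msaf stdout lines, inferred from the documented examples.
export interface MsafStage { type: 'stage'; name: string; message?: string }
export interface MsafError { type: 'error'; message?: string } // other error fields are undocumented

export interface Segment {
  start: number; end: number; duration: number; cluster: string; bpm: number
  beats: number[]; beatCount: number; accents: number[]; accentCount: number; topAccent: number
  onsetCount: number; onsetRate: number; loudness: number; peakLoudness: number
  dynamicRange: number; energy: number; brightness: number
}

export interface MsafResult {
  type: 'result'; source: string; duration: number; tempo: number; tempoPrior: number
  bpmOverride: number | null; beatCount: number; accentCount: number; feature: string
  boundaryAlgo: string; labelAlgo: string; nSegments: number; nClusters: number
  loudness: number; peakLoudness: number; segments: Segment[]; elapsed: number
}

export type MsafEvent = MsafStage | MsafResult | MsafError

/** Parse one stdout line. Returns null for blank lines; throws on invalid JSON or an unknown type. */
export function parseEvent(line: string): MsafEvent | null {
  if (line.trim() === '') return null
  const evt = JSON.parse(line) as { type?: unknown }
  if (evt.type === 'stage' || evt.type === 'result' || evt.type === 'error') return evt as MsafEvent
  throw new Error(`unexpected event type: ${String(evt.type)}`)
}
```

The cast only checks the type tag, not every field; a stricter consumer could validate the result line with a schema library before trusting it.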

Segment cleanup

MSAF (olda boundaries) often emits a sub-second leading segment and occasionally a tiny trailing or mid-track sliver. The CLI absorbs any segment shorter than --min-segment (default 3.0s) into a neighbor: leading slivers merge into the next segment, trailing into the previous, and mid-track ones into the same-cluster neighbor (or the longer of the two if neither matches). Pass --min-segment 0 to keep every raw boundary.
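The merge rule can be sketched as a loop that repeatedly absorbs the first too-short segment into a neighbor. This is a hypothetical reimplementation, not the CLI's code. Two details are assumptions: when both mid-track neighbors share the sliver's cluster, the longer one wins, and the absorbing neighbor keeps its own cluster label.

```typescript
// Hypothetical sketch of the --min-segment cleanup rule; not zigify-msaf's implementation.
export interface Seg { start: number; end: number; cluster: string }

const segLength = (s: Seg): number => s.end - s.start

export function absorbShortSegments(segs: Seg[], minSegment = 3.0): Seg[] {
  const out = segs.map(s => ({ ...s })) // copy so the caller's array is untouched
  if (minSegment <= 0) return out // 0 keeps every raw boundary
  for (;;) {
    const i = out.findIndex(s => segLength(s) < minSegment)
    if (i === -1 || out.length === 1) return out
    let target: number
    if (i === 0) target = 1 // leading sliver → next segment
    else if (i === out.length - 1) target = i - 1 // trailing sliver → previous segment
    else {
      const cur = out[i].cluster
      const prevMatch = out[i - 1].cluster === cur
      const nextMatch = out[i + 1].cluster === cur
      if (prevMatch !== nextMatch) target = prevMatch ? i - 1 : i + 1 // same-cluster neighbor
      else target = segLength(out[i - 1]) >= segLength(out[i + 1]) ? i - 1 : i + 1 // longer one
    }
    // Stretch the pair [lo, lo + 1] into one segment labeled with the absorbing neighbor's cluster.
    const lo = Math.min(i, target)
    out[lo] = { start: out[lo].start, end: out[lo + 1].end, cluster: out[target].cluster }
    out.splice(lo + 1, 1)
  }
}
```

Handling one sliver per pass means a run of adjacent slivers collapses step by step, and each merge only ever lengthens a segment, so the loop terminates.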

Beat-sheet visualization

Pass --viz to render a horizontal beat sheet alongside the JSON output: waveform with RMS overlay, segment bands colored by cluster (with per-segment BPM), beat ticks (dark) and accent ticks (red). Useful for sanity-checking that detected beats actually line up with the audio.

Two files are written next to the audio file, sharing its basename:

  • track.png — the beat sheet itself.
  • track.html — a self-contained HTML viewer with a native <audio> element, the PNG below it, and a vertical playhead cursor that tracks currentTime. Click anywhere on the strip to seek. Open it directly in a browser.
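The viewer's playhead and click-to-seek both reduce to a linear mapping between audio time and horizontal position on the image. A sketch, assuming the time axis is linear and occupies a known fraction of the image width (matplotlib figures usually have left and right margins); the Axis shape and both function names are hypothetical, not taken from the generated HTML.

```typescript
// Hypothetical time ↔ pixel mapping for a beat-sheet strip; margins are assumed, not read from the PNG.
export interface Axis {
  duration: number // track length in seconds
  left: number // plot area's left edge as a fraction of image width (0..1)
  right: number // plot area's right edge as a fraction of image width (0..1)
}

/** Horizontal playhead position in px for the audio element's currentTime. */
export function timeToX(t: number, width: number, ax: Axis): number {
  const frac = Math.min(1, Math.max(0, t / ax.duration))
  return (ax.left + frac * (ax.right - ax.left)) * width
}

/** Seek target in seconds for a click at x px; clicks in the margins clamp to the start or end. */
export function xToTime(x: number, width: number, ax: Axis): number {
  const frac = (x / width - ax.left) / (ax.right - ax.left)
  return Math.min(1, Math.max(0, frac)) * ax.duration
}
```

In a page, timeToX would set the cursor's left offset from the audio element's timeupdate event, and xToTime would run in a click handler using the event's offsetX and the image's rendered width.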

Requires the viz extra:

uv pip install "zigify-msaf[viz]"
uvx --with matplotlib zigify-msaf track.mp3 --viz

Calling from Node / TypeScript

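import { spawn } from 'node:child_process'
import { once } from 'node:events'
import { createInterface } from 'node:readline'

// Minimal shape of the final result line; see the output schema for the full set of fields.
type SegmentResult = { type: 'result'; segments: { start: number; end: number; cluster: string }[] }

export async function segment(audioPath: string): Promise<SegmentResult> {
  // Inherit stderr so --verbose logs can't fill an unread pipe and stall the child.
  const proc = spawn('uvx', ['zigify-msaf', audioPath], { stdio: ['ignore', 'pipe', 'inherit'] })
  const exited = once(proc, 'close') // subscribe before reading so the event can't be missed
  exited.catch(() => {}) // a spawn failure is rethrown by `await exited` below
  let result: SegmentResult | undefined
  for await (const line of createInterface({ input: proc.stdout })) {
    if (!line.trim()) continue
    const evt = JSON.parse(line)
    if (evt.type === 'stage') console.log(`[${evt.name}] ${evt.message ?? ''}`)
    if (evt.type === 'result') result = evt
    if (evt.type === 'error') { proc.kill(); throw new Error(evt.message) }
  }
  const [code] = await exited
  if (code !== 0 || !result) throw new Error(`zigify-msaf exited with code ${code}`)
  return result
}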

Algorithm choice

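Boundary detectors were evaluated against a hand-annotated 14-boundary ground truth for Michael Jackson's "Thriller" (tolerance ±5 s). A hit is a ground-truth boundary matched by a detected one within the tolerance, a miss is an unmatched ground-truth boundary, and spurious counts detected boundaries with no match: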

| Boundary algorithm | Hits | Misses | Spurious |
|---|---|---|---|
| olda | 9 | 4 | 2 |
| foote | 9 | 4 | 6 |
| sf (default) | 6 | 7 | 5 |
| cnmf | 5 | 8 | 2 |
| scluster | 4 | 9 | 10 |
| vmo | 12 | 1 | 457 |

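olda has the best precision (9 of its 11 detected boundaries are hits) and ties foote for recall among the usable detectors; vmo finds more true boundaries (12) but only alongside 457 spurious ones. For labeling, scluster groups the segments into ~5 clusters that align with verse/chorus/outro structure on test tracks; its weak row above reflects scluster used as a boundary detector, which this tool does not do.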
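The README does not say exactly how the hits above were counted. One plausible way to score boundaries with a ±5 s window is sketched below; it uses a greedy one-to-one pass (each ground-truth boundary takes the closest unused detection inside the window), whereas mir_eval's detection metric uses an optimal bipartite matching, so counts can differ where boundaries are crowded. The function and interface names are hypothetical.

```typescript
// Hypothetical boundary scorer: greedy one-to-one matching within ±tolerance seconds.
export interface BoundaryScore {
  hits: number; misses: number; spurious: number; precision: number; recall: number
}

export function scoreBoundaries(reference: number[], estimated: number[], tolerance = 5): BoundaryScore {
  const ref = [...reference].sort((a, b) => a - b)
  const est = [...estimated].sort((a, b) => a - b)
  const used = new Array<boolean>(est.length).fill(false) // each detection can match once
  let hits = 0
  for (const r of ref) {
    // Pick the closest unused estimate inside the window, if any.
    let best = -1
    for (let j = 0; j < est.length; j++) {
      const d = Math.abs(est[j] - r)
      if (!used[j] && d <= tolerance && (best === -1 || d < Math.abs(est[best] - r))) best = j
    }
    if (best !== -1) {
      used[best] = true
      hits++
    }
  }
  return {
    hits,
    misses: ref.length - hits, // ground-truth boundaries nobody matched
    spurious: est.length - hits, // detections with no ground-truth partner
    precision: est.length ? hits / est.length : 0,
    recall: ref.length ? hits / ref.length : 0,
  }
}
```

Under this scoring, olda's row gives precision 9 / (9 + 2) ≈ 0.82, while vmo's 457 spurious detections push its precision below 0.03 despite the highest hit count.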

License

MIT — see LICENSE.


Download files


Source Distribution

zigify_msaf-0.3.0.tar.gz (15.7 kB)

Built Distribution


zigify_msaf-0.3.0-py3-none-any.whl (17.0 kB)

File details

Details for the file zigify_msaf-0.3.0.tar.gz.

File metadata

  • Download URL: zigify_msaf-0.3.0.tar.gz
  • Upload date:
  • Size: 15.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.9 (macOS)

File hashes

Hashes for zigify_msaf-0.3.0.tar.gz
Algorithm Hash digest
SHA256 b6257eb872050013e5a9affbfc8ac1a3018f22958eff07d369bedd39123cef24
MD5 c425a9778765e16a5ef7a7e6189078d3
BLAKE2b-256 e7650cb8773fc6040529328c6e2517f4e75523031db13827c16fc39ab47d578c


File details

Details for the file zigify_msaf-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: zigify_msaf-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 17.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.9 (macOS)

File hashes

Hashes for zigify_msaf-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 90fa9dfdda57152bec3d85e5bbe4ff22c0738bac6a7030513d4ef07615719817
MD5 d7caf08da7d8e3239772882cbf33a211
BLAKE2b-256 e6e159ef5bc869a437fcd187501e29b1294ac92570926e6b18ed256b8d4a9efb

