Skip to main content

Objective evaluation metrics for Symbolic Music Generation

Project description

smg-metrics

Symbolic Music Generation Metrics — 52 objective evaluation metrics, zero config.

Python 3.10+ License: MIT PyPI version

8 categories, 52 metrics, 21 papers/projects (1990–2026), fully typed & tested.

Category Count Latest source Year
A. Single-file Quality 13 FGG / MusPy / XMusic 2025
B. Note-level Pairwise 5 Ou et al. 2025
C. Bar-level Pairwise 2 MuseMorphose 2023
D. Chord-level Pairwise 2 FGG / Wang et al. ISMIR 2025/2020
E. Distribution-level 5 SongMASS 2020
F. Advanced 14 Text2midi 2025
G. Structural 4 MuseTok 2026
H. Rhythmic/Temporal 7 Standard MIR Features / D3PIA Various

Quick Start

pip install smg-metrics
from smg_metrics import single_file, single_file_rhythmic, pair_eval, compute_ook

# Single-file quality (12 MusPy metrics)
quality = single_file("generated.mid")
print(quality.pce, quality.ebr, quality.sc)

# Out-of-Key fraction (FGG 2025)
ook = compute_ook("generated.mid")
print(f"OOK: {ook:.4f}")

# Rhythmic metrics (4 D3PIA-style)
rhythm = single_file_rhythmic("generated.mid")
print(rhythm.mean_ioi, rhythm.rhythmic_density)

# Pairwise comparison (11 metrics including deep chord similarity)
pair = pair_eval("generated.mid", "reference.mid")
print(pair.note_f1, pair.sim_chr, pair.ca, pair.cs)
# CLI
smg-eval -m generated.mid
smg-eval -p gen.mid -r ref.mid
smg-eval -m gen.mid -p gen.mid -r ref.mid -d -a -S -R
smg-eval -m gen.mid --only pce ebr sc --json

Installation

pip install smg-metrics

# Optional: Install torch for deep chord similarity (CS metric)
pip install smg-metrics[torch]
# Or: pip install torch>=2.0.0

# Or install from source:
git clone https://github.com/OlyMarco/smg_metric.git
cd smg_metric && pip install -e .
Package Version Purpose
muspy >= 0.5.0 13 single-file quality metrics
miditoolkit >= 1.0 MIDI parsing
pretty-midi >= 0.2.10 Beat tracking & bar-level parsing
mir-eval >= 0.7 Note-overlap metric
torch >= 2.0.0 Optional: Deep chord similarity (CS metric)
numpy >= 1.24 Numerical computation
scipy >= 1.10 Scientific computing

Python API

from smg_metrics import (
    single_file,                # 13 MusPy quality metrics
    single_file_structural,     # 2 structural single-file metrics
    single_file_rhythmic,       # 5 rhythmic metrics (4 D3PIA + GS)
    pair_eval,                  # 11 core pairwise metrics (incl. CS)
    pair_eval_structural,       # 2 structural pairwise metrics
    distribution_eval,          # 5 distribution-level metrics
    advanced_eval,              # 14 advanced metrics
    compute_ook,                # Out-of-Key fraction (FGG 2025)
    compute_cs,                 # Chord Similarity (Wang et al. 2020)
)
Container Fields Count
SingleFileResult pce, ebr, sc, pisr, polyphony, polyphony_rate, pitch_range, n_pitches_used, n_pitch_classes_used, emr, pe, dpc, ook 13
StructuralSingleResult che, ngram_div 2
RhythmicResult mean_ioi, rhythmic_intensity, rhythmic_density, voice_number, gs 5
PairResult note_f1, notei_f1, mel_f1, i_iou, ver, sim_chr, sim_grv, ca, cs, onset_xor, note_overlap 11
StructuralPairResult melody_match, tonal_dist 2
DistributionResult pd, dd, sc_sim, pce_sim, gsc 5
AdvancedResult kl_duration, kl_ioi, kl_pitch, oa_duration, oa_ioi, oa_pitch_range, oa_density, ci_precision, ci_recall, ci_f1, cts, cr_pred, cr_ref, recon_acc 14

Every result container is a frozen dataclass with .to_dict().

Chord Recognition

from smg_metrics import recognize_chords, compute_ca, compute_cs

# Advanced DP-based chord recognition (17 qualities + inversions)
chords = recognize_chords("music.mid")
for interval in chords:
    print(f"{interval.start:.2f}s - {interval.end:.2f}s: {interval.label}")
# Output: 0.00s - 2.50s: C:maj
#         2.50s - 5.00s: G:7/5  (second inversion)

# Chord Accuracy (rule-based DP method, FGG 2025)
ca = compute_ca("generated.mid", "reference.mid")
print(f"Chord Accuracy: {ca:.2%}")

# Chord Similarity (deep embedding, Wang et al. 2020)
cs = compute_cs("generated.mid", "reference.mid")
print(f"Chord Similarity: {cs:.4f}")

Note: CS metric requires downloading model weights (29 MB). See Model Weights section below.

Out-of-Key Notes

from smg_metrics import compute_ook

# Compute percentage of 16th-note steps with out-of-key notes
ook = compute_ook("generated.mid")
print(f"Out-of-Key: {ook:.4f}")

# Get detailed breakdown
ook, details = compute_ook("generated.mid", return_details=True)
print(f"Key: {details['key']}")
print(f"OOK steps: {details['ook_steps']}/{details['total_steps']}")
print(f"OOK notes: {details['ook_notes']}")

Reference: FGG (Zhu et al., ICML 2025) uses OOK to measure dissonance. Well-controlled generation should have OOK ≈ 0–0.02.

Model Weights

The Chord Similarity (CS) metric requires pretrained model weights:

Quick Download

# Download lightweight model (29 MB, recommended)
cd smg_metrics/model_weights
https://github.com/OlyMarco/smg_metric/blob/main/smg_metrics/model_weights/polydis-v1-chd_encoder_only.pt

Model Details

  • Architecture: Bidirectional GRU chord encoder (36 → 1024 → 256)
  • Training: EC2-VAE (Wang et al., ISMIR 2020)
  • Size: 29 MB (pruned from 104 MB full model, 72.3% reduction)
  • License: Inherits from original PolyDisVAE

Citation:

@inproceedings{wang2020learning,
  title={Learning interpretable representation for controllable polyphonic music generation},
  author={Wang, Ziyu and Wang, Dingsu and Zhang, Yixiao and Xia, Gus},
  booktitle={Proceedings of the 21st International Society for Music Information Retrieval Conference},
  year={2020}
}

See smg_metrics/model_weights/README.md for more details.

Chord Recognition

from smg_metrics import recognize_chords, compute_ca

# Beat-level DP chord recognition (music-x-lab algorithm)
chords = recognize_chords("song.mid")
for iv in chords:
    print(f"{iv.start:.2f}-{iv.end:.2f}: {iv.label}")

# Chord Accuracy with DP method (default)
ca = compute_ca("pred.mid", "ref.mid", method="dp")

# Or use Viterbi method (GETMusic)
ca = compute_ca("pred.mid", "ref.mid", method="viterbi")

Individual metrics

from smg_metrics import (
    chord_histogram_entropy, ngram_diversity,
    melody_matchness, tonal_distance,
    compute_ca, midi_to_chords, midi_to_chords_dp,
    mean_ioi, rhythmic_intensity, rhythmic_density,
    voice_number, onset_xor_distance, note_overlap,
    grooving_pattern_similarity,
)

che = chord_histogram_entropy("file.mid")
ca = compute_ca("pred.mid", "ref.mid")
gs = grooving_pattern_similarity("pred.mid", "ref.mid")

Test Suite

# Quick single-file test
python test.py --single-only data/gt/seg_40_48.mid

# Full test on directories (auto multi-core + progress bar when >= 2 files)
python test.py data/gen/ data/gt/

# Pairwise only
python test.py --pair-only pred.mid ref.mid

# Select specific metrics
python test.py --only pce ebr note_f1 ca sim_chr kl_pitch pred.mid ref.mid

# Save results to JSON
python test.py data/gen/ data/gt/ --json
Flag Description
--single-only Run single-file metrics only
--pair-only Run pairwise metrics only
--only METRIC ... Run only selected metrics
--json Save results to test_results.json

Notes:

  • When evaluating 2 or more files, test.py uses multi-core evaluation with a tqdm progress bar.
  • Output file: test_results.json in the project root.

CLI Usage

# Single-file quality (14 metrics: 13 MusPy + OOK)
smg-eval -m generated.mid

# Single-file + structural + rhythmic (20 metrics)
smg-eval -m generated.mid -S -R

# Pairwise core (11 metrics: includes CS with deep embedding)
smg-eval -p gen.mid -r ref.mid

# Full 52-metric run
smg-eval -m gen.mid -p gen.mid -r ref.mid -d -a -S -R

# Select specific metrics
smg-eval -m gen.mid --only pce ebr sc ook
smg-eval -p gen.mid -r ref.mid --only ca cs note_f1

# List all available metrics
smg-eval --list-metrics

# JSON output
smg-eval -m gen.mid --json

# Batch directory
smg-eval --pred_dir ./pred/ --ref_dir ./ref/

# Timing
smg-eval -m gen.mid --time
Flag Description Default
-m, --music PATH Single-file mode --
-p, --pred PATH Predicted MIDI for pair mode --
-r, --ref PATH Reference MIDI for pair mode --
--pred_dir DIR Batch predicted directory --
--ref_dir DIR Batch reference directory --
--root INT Root pitch for PISR 0
--mode {major,minor} Scale mode for PISR major
-d, --dist Distribution-level metrics false
-a, --advanced Advanced metrics false
-S, --structural Structural metrics false
-R, --rhythmic Rhythmic metrics false
--only METRIC ... Select specific metrics --
--list-metrics List all metric names --
--json JSON output false
--time Print elapsed time false

Metrics Reference

A. Single-file Quality (13)

Sources: MusPy / ISMIR 2020, XMusic / IEEE 2025, FGG ICML 2025.

Metric Symbol Range Reference
Pitch Class Entropy PCE [0, log₂12] Wu & Yang, ISMIR 2020; XMusic 2025
Empty Beat Rate EBR [0, 1] Dong et al., ISMIR 2018; XMusic 2025
Scale Consistency SC [0, 1] Mogren, NeurIPS-W 2016
Pitch-in-Scale Rate PISR [0, 1] Dong et al., AAAI 2018
Polyphony Poly [1, ∞) Dong et al., AAAI 2018
Polyphony Rate PR [0, 1] Dong et al., AAAI 2018
Pitch Range Range [0, 127] MusPy 2020
Unique Pitches N_p [0, 128] MusPy 2020
Unique Pitch Classes N_pc [0, 12] MusPy 2020
Empty Measure Rate EMR [0, 1] Dong et al., AAAI 2018
Pitch Entropy PE [0, 7] MusPy 2020
Drum Pattern Consistency DPC [0, 1] Dong et al., AAAI 2018
Out-of-Key Fraction OOK [0, 1] FGG 2025, Krumhansl-Kessler key detection

B. Note-level Pairwise (5)

Source: Ou et al., NeurIPS 2025.

Metric Symbol Range
Note F1 F1 [0, 1]
Notei F1 F1i [0, 1]
Melody F1 F1mel [0, 1]
Instrument IoU I-IoU [0, 1]
Voice Error Rate VER [0, ∞)

B2. Pairwise Rhythmic (2)

Sources: Standard MIR rhythmic comparison metrics. XOR implementation from D3PIA ICASSP 2026, NOvlp from mir_eval ISMIR 2014.

Metric Symbol Range
Onset XOR Distance XOR [0, 1]
Note Overlap NOvlp [0, 1]

C. Bar-level Pairwise (2)

Source: MuseMorphose, IEEE/ACM TASLP 2023.

Metric Symbol Range
Chroma Similarity simChr [0, 1]
Groove Similarity simGrv [0, 1]

D. Chord-level Pairwise (2)

Sources: FGG ICML 2025, Wang et al. ISMIR 2020, music-x-lab/midi-chord-recognition.

Metric Symbol Range Description
Chord Accuracy CA [0, 1] Beat-level DP chord recognition + exact match
Chord Similarity CS [0, 1] Deep chord embedding similarity (requires PyTorch)

Chord Recognition Pipeline (adapted from music-x-lab):

  1. Extract beat/downbeat positions from MIDI tempo map
  2. Quantise notes to beat grid → per-beat 12-dim treble chroma + bass chroma
  3. Channel-weighted aggregation (thickness + bass reweighting)
  4. Score each chord template per beat (with bass bonus)
  5. Dynamic-programming decode with span-length reward and transition penalty
  6. Output interval-level chord labels

Two methods available: 'dp' (default, beat-level) and 'viterbi' (bar-level HMM).

E. Distribution-level (5)

Sources: SongMASS ACM-MM 2020, MusPy ISMIR 2020, Wu & Yang ISMIR 2020.

Metric Symbol Range
Pitch Distribution PD [0, 1]
Duration Distribution DD [0, 1]
Scale Consistency Sim SC_sim [0, 1]
Pitch Class Entropy Sim PCE_sim [0, 1]
Groove Pattern Similarity Consistency GSC [0, 1]

F. Advanced (14)

Sources: GETMusic IJCAI 2025, Text2midi AAAI 2025, MuseTok ICASSP 2026.

Metric Symbol Range
KL Divergence (Duration) KL_dur [0, ∞)
KL Divergence (IOI) KL_ioi [0, ∞)
KL Divergence (Pitch) KL_pitch [0, ∞)
Overlapping Area ×4 OA [0, 1]
Instrument Coverage ×3 CI [0, 1]
Correct Time Signature CTS {0, 1}
Compression Ratio ×2 CR [0, ∞)
Reconstruction Accuracy ReconAcc [0, 1]

G. Structural (4)

Sources: Papadopoulos & Peeters ISMIR 2012, Yang & Lerch NCA 2018, Mongeau & Sankoff CH 1990, Harte et al. ACM MM 2006.

Metric Symbol Type Range
Chord Histogram Entropy CHE single [0, log₂C]
N-gram Diversity Ngram single [0, 1]
Melody Matchness MM pair [0, 1]
Tonal Distance TD pair [0, ∞)

H. Rhythmic/Temporal Single-file (5)

Sources: Standard MIR rhythmic features. Implementation conventions from D3PIA/MIDISym (ICASSP 2026), Wu & Yang ISMIR 2020.

Metric Symbol Range
Mean IOI IOI [0, ∞)
Rhythmic Intensity RI [0, ∞)
Rhythmic Density RD [0, 1]
Voice Number VN [0, ∞)
Grooving Pattern Similarity GS [0, 1]

v5.3 Changelog

Major Refactoring

  • GS Metric Correction: Grooving Pattern Similarity (GS) reimplemented following original paper definition

    • Now uses 64-dimensional binary onset vectors per bar (as per Wu & Yang ISMIR 2020)
    • Computes normalized Hamming similarity between all bar pairs: GS = 1 - (1/64) · Σ XOR
    • Removed dependency on muspy.groove_consistency() (incorrect implementation)
    • Range: [0, 1] — measures rhythmic pattern consistency within a piece
    • RhythmicResult now contains 5 metrics: IOI, RI, RD, VN, GS
  • GSC Update: Distribution-level metric now uses corrected GS implementation

    • Updated _gsc() in distribution.py to use new grooving_pattern_similarity()
    • Definition unchanged: GSC = 1 - |GS_pred - GS_ref|
    • Clearer distinction from single-file GS metric

Performance Optimization

  • CLI Startup Speed: Implemented lazy imports via __getattr__ in __init__.py
    • Startup time reduced from ~6.3s to ~0.14s (~45× faster)
    • --help now responds instantly
    • API remains fully compatible: from smg_metrics import single_file still works

Code Changes

  • Reimplemented grooving_pattern_similarity() in rhythmic.py with paper-accurate algorithm
  • Updated _gsc() in distribution.py to use corrected GS implementation
  • Removed muspy dependency from GS calculation
  • Added comprehensive docstrings with paper references and formulas

Documentation Updates

  • Updated metric counts: Single-file (14→13), Rhythmic single (4→5), Rhythmic pair (3→2)
  • Corrected Wu & Yang ISMIR 2020 paper citations and links
  • Added comprehensive source references for rhythmic metrics
  • Clarified metric categories, ranges, and purposes

API Changes

  • Modified: grooving_pattern_similarity(midi_path) — now uses 64-dim vectors, normalized Hamming similarity
  • Unchanged: gsc in DistributionResult (name unchanged from v5.2)
  • Removed: gs field from SingleFileResult
  • Added: gs field to RhythmicResult

Changed

  • Version: 5.2.0 → 5.3.0
  • Total metrics: 52 (reorganized for clarity and consistency)

v5.2 Changelog

Optimizations

  • CS Model Caching: Chord Similarity model now cached for batch evaluation
    • Model loaded once and reused across multiple compute_cs() calls
    • ~1.6× faster per call after initial load, ~39% time savings on large batches
    • Example: 153 file pairs reduced from 34.6s → 21.2s
    • Thread-safe caching with automatic device management
    • New API: clear_cs_model_cache() to manually free GPU/CPU memory after batch processing
  • Memory Management: Users can explicitly release CS model memory when needed

API Changes

  • New function: clear_cs_model_cache() — Clear cached CS models to free memory
  • Improved: compute_cs() now automatically caches model for subsequent calls

Changed

  • Version: 5.1.0 → 5.2.0
  • Performance: Batch CS evaluation significantly faster (model loaded once vs. per-call)

Example

from smg_metrics import compute_cs, clear_cs_model_cache

# Batch evaluation - model loaded once, reused for all pairs
for pred, ref in file_pairs:
    cs = compute_cs(pred, ref)  # Fast after first call
    
# Free memory after batch
clear_cs_model_cache()  # Returns: 1 (number of models cleared)

v5.1 Changelog

New features

  • Chord Similarity (CS): Deep chord embedding metric using pruned PolyDisVAE encoder (Wang et al. ISMIR 2020)
    • Optional dependency: requires torch>=2.0.0 (install with pip install smg-metrics[torch])
    • Lightweight model (29 MB): polydis-v1-chd_encoder_only.pt
    • Supports 17 chord qualities + inversions via dynamic programming recognition
  • Out-of-Key (OOK): Percentage of notes outside detected key using Krumhansl-Kessler algorithm
    • Standalone single-file metric, no external annotations needed
    • Integrated into CLI with --only ook support

Changed

  • PyTorch: Moved to optional dependencies (torch extra)
  • Metric count: 51 → 53 (added CS + OOK)
  • Version: 5.0.0 → 5.1.0

License

MIT — see LICENCE.

Citation

If you use smg-metrics in your research, please cite:

@software{smg_metrics,
  title  = {smg-metrics: Objective Evaluation Metrics for Symbolic Music Generation},
  author = {Temmie Pratt},
  year   = {2026},
  url    = {https://github.com/OlyMarco/smg_metric},
}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

smg_metrics-5.3.0.tar.gz (28.2 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

smg_metrics-5.3.0-py3-none-any.whl (28.2 MB view details)

Uploaded Python 3

File details

Details for the file smg_metrics-5.3.0.tar.gz.

File metadata

  • Download URL: smg_metrics-5.3.0.tar.gz
  • Upload date:
  • Size: 28.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for smg_metrics-5.3.0.tar.gz
Algorithm Hash digest
SHA256 b673f318211920afd4e0e74fccea99b0c2f6d5d2afc071049726e2b94b8a2707
MD5 cd8f30b8447f8a135fb54fbdce5b67db
BLAKE2b-256 1e13f4648e4bff5d5014f6f526ab0a6f433f085399d24fdfa76bb7b9943dfb52

See more details on using hashes here.

File details

Details for the file smg_metrics-5.3.0-py3-none-any.whl.

File metadata

  • Download URL: smg_metrics-5.3.0-py3-none-any.whl
  • Upload date:
  • Size: 28.2 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for smg_metrics-5.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 1aa4d9e583b5d6d7f9fc586f2e0d082852f559d1a824a47ff968041c871d89fc
MD5 584f88d68497875d5d0469181c85c07b
BLAKE2b-256 1a6258b986bc022a618a6a7491193355b2d16d877b1f3a82a2244fd2cd44fa0a

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page