Objective evaluation metrics for Symbolic Music Generation
smg-metrics
Symbolic Music Generation Metrics — 52 objective evaluation metrics, zero config.
8 categories, 52 metrics, 21 papers/projects (1990–2026), fully typed & tested.
| Category | Count | Latest source | Year |
|---|---|---|---|
| A. Single-file Quality | 13 | FGG / MusPy / XMusic | 2025 |
| B. Note-level Pairwise | 5 | Ou et al. | 2025 |
| C. Bar-level Pairwise | 2 | MuseMorphose | 2023 |
| D. Chord-level Pairwise | 2 | FGG / Wang et al. ISMIR | 2025/2020 |
| E. Distribution-level | 5 | SongMASS | 2020 |
| F. Advanced | 14 | Text2midi | 2025 |
| G. Structural | 4 | MuseTok | 2026 |
| H. Rhythmic/Temporal | 7 | Standard MIR Features / D3PIA | Various |
Quick Start
pip install smg-metrics
from smg_metrics import single_file, single_file_rhythmic, pair_eval, compute_ook
# Single-file quality (12 MusPy metrics)
quality = single_file("generated.mid")
print(quality.pce, quality.ebr, quality.sc)
# Out-of-Key fraction (FGG 2025)
ook = compute_ook("generated.mid")
print(f"OOK: {ook:.4f}")
# Rhythmic metrics (4 D3PIA-style + GS)
rhythm = single_file_rhythmic("generated.mid")
print(rhythm.mean_ioi, rhythm.rhythmic_density)
# Pairwise comparison (11 metrics including deep chord similarity)
pair = pair_eval("generated.mid", "reference.mid")
print(pair.note_f1, pair.sim_chr, pair.ca, pair.cs)
# CLI
smg-eval -m generated.mid
smg-eval -p gen.mid -r ref.mid
smg-eval -m gen.mid -p gen.mid -r ref.mid -d -a -S -R
smg-eval -m gen.mid --only pce ebr sc --json
Installation
pip install smg-metrics
# Optional: Install torch for deep chord similarity (CS metric)
pip install "smg-metrics[torch]"
# Or: pip install "torch>=2.0.0"
# Or install from source:
git clone https://github.com/OlyMarco/smg_metric.git
cd smg_metric && pip install -e .
| Package | Version | Purpose |
|---|---|---|
| muspy | >= 0.5.0 | 13 single-file quality metrics |
| miditoolkit | >= 1.0 | MIDI parsing |
| pretty-midi | >= 0.2.10 | Beat tracking & bar-level parsing |
| mir-eval | >= 0.7 | Note-overlap metric |
| torch | >= 2.0.0 | Optional: Deep chord similarity (CS metric) |
| numpy | >= 1.24 | Numerical computation |
| scipy | >= 1.10 | Scientific computing |
Python API
from smg_metrics import (
    single_file,             # 13 single-file quality metrics (12 MusPy + OOK)
    single_file_structural,  # 2 structural single-file metrics
    single_file_rhythmic,    # 5 rhythmic metrics (4 D3PIA + GS)
    pair_eval,               # 11 core pairwise metrics (incl. CS)
    pair_eval_structural,    # 2 structural pairwise metrics
    distribution_eval,       # 5 distribution-level metrics
    advanced_eval,           # 14 advanced metrics
    compute_ook,             # Out-of-Key fraction (FGG 2025)
    compute_cs,              # Chord Similarity (Wang et al. 2020)
)
| Container | Fields | Count |
|---|---|---|
| SingleFileResult | pce, ebr, sc, pisr, polyphony, polyphony_rate, pitch_range, n_pitches_used, n_pitch_classes_used, emr, pe, dpc, ook | 13 |
| StructuralSingleResult | che, ngram_div | 2 |
| RhythmicResult | mean_ioi, rhythmic_intensity, rhythmic_density, voice_number, gs | 5 |
| PairResult | note_f1, notei_f1, mel_f1, i_iou, ver, sim_chr, sim_grv, ca, cs, onset_xor, note_overlap | 11 |
| StructuralPairResult | melody_match, tonal_dist | 2 |
| DistributionResult | pd, dd, sc_sim, pce_sim, gsc | 5 |
| AdvancedResult | kl_duration, kl_ioi, kl_pitch, oa_duration, oa_ioi, oa_pitch_range, oa_density, ci_precision, ci_recall, ci_f1, cts, cr_pred, cr_ref, recon_acc | 14 |
Every result container is a frozen dataclass with .to_dict().
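As an illustration of what a frozen dataclass with a `.to_dict()` method looks like, here is a minimal sketch. `DemoResult` and its three fields are a hypothetical stand-in chosen for the example, not the library's actual class definition.

```python
from dataclasses import FrozenInstanceError, asdict, dataclass


@dataclass(frozen=True)
class DemoResult:
    """Hypothetical stand-in for a result container (not the library's class)."""

    pce: float
    ebr: float
    sc: float

    def to_dict(self) -> dict:
        # asdict() copies the fields into a plain dict, handy for JSON export.
        return asdict(self)


result = DemoResult(pce=2.85, ebr=0.04, sc=0.97)
as_dict = result.to_dict()

# Frozen dataclasses reject attribute assignment after construction.
try:
    result.pce = 0.0
except FrozenInstanceError:
    mutated = False
else:
    mutated = True
```

Freezing the container means a result cannot be altered by accident between evaluation and reporting, and it makes instances hashable.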
Chord Recognition
from smg_metrics import recognize_chords, compute_ca, compute_cs
# Advanced DP-based chord recognition (17 qualities + inversions)
chords = recognize_chords("music.mid")
for interval in chords:
    print(f"{interval.start:.2f}s - {interval.end:.2f}s: {interval.label}")
# Output: 0.00s - 2.50s: C:maj
#         2.50s - 5.00s: G:7/5 (second inversion)
# Chord Accuracy (rule-based DP method, FGG 2025)
ca = compute_ca("generated.mid", "reference.mid")
print(f"Chord Accuracy: {ca:.2%}")
# Chord Similarity (deep embedding, Wang et al. 2020)
cs = compute_cs("generated.mid", "reference.mid")
print(f"Chord Similarity: {cs:.4f}")
Note: CS metric requires downloading model weights (29 MB). See Model Weights section below.
Out-of-Key Notes
from smg_metrics import compute_ook
# Compute the fraction of 16th-note steps containing out-of-key notes
ook = compute_ook("generated.mid")
print(f"Out-of-Key: {ook:.4f}")
# Get detailed breakdown
ook, details = compute_ook("generated.mid", return_details=True)
print(f"Key: {details['key']}")
print(f"OOK steps: {details['ook_steps']}/{details['total_steps']}")
print(f"OOK notes: {details['ook_notes']}")
Reference: FGG (Zhu et al., ICML 2025) uses OOK to measure dissonance. Well-controlled generation should have OOK ≈ 0–0.02.
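To make the step-based definition concrete, the sketch below counts 16th-note steps that contain at least one out-of-key pitch. It is an illustration only: the key is passed in rather than detected, the denominator is assumed to be the number of steps with at least one sounding note, and `ook_fraction` with its note format is hypothetical rather than the library's implementation.

```python
MAJOR_DEGREES = {0, 2, 4, 5, 7, 9, 11}
MINOR_DEGREES = {0, 2, 3, 5, 7, 8, 10}  # natural minor (assumption)


def ook_fraction(notes, tonic, mode="major", steps_per_beat=4):
    """Fraction of sounding 16th-note steps that contain an out-of-key pitch.

    notes: iterable of (midi_pitch, start_beat, end_beat) tuples.
    tonic: pitch class of the key (0 = C, 1 = C#, ...).
    """
    degrees = MAJOR_DEGREES if mode == "major" else MINOR_DEGREES
    in_key = {(tonic + d) % 12 for d in degrees}
    sounding, out_of_key = set(), set()
    for pitch, start, end in notes:
        first = int(round(start * steps_per_beat))
        last = max(first + 1, int(round(end * steps_per_beat)))
        for step in range(first, last):
            sounding.add(step)
            if pitch % 12 not in in_key:
                out_of_key.add(step)
    return len(out_of_key) / len(sounding) if sounding else 0.0


# C (1 beat), C# (1 beat), E (2 beats) evaluated in C major:
demo_notes = [(60, 0, 1), (61, 1, 2), (64, 2, 4)]
demo_ook = ook_fraction(demo_notes, tonic=0, mode="major")
```

Here the C# occupies 4 of the 16 sounding steps, so `demo_ook` is 0.25, far above the 0–0.02 range quoted above.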
Model Weights
The Chord Similarity (CS) metric requires pretrained model weights:
Quick Download
# Download lightweight model (29 MB, recommended)
cd smg_metrics/model_weights
https://github.com/OlyMarco/smg_metric/blob/main/smg_metrics/model_weights/polydis-v1-chd_encoder_only.pt
Model Details
- Architecture: Bidirectional GRU chord encoder (36 → 1024 → 256)
- Training: EC2-VAE (Wang et al., ISMIR 2020)
- Size: 29 MB (pruned from 104 MB full model, 72.3% reduction)
- License: Inherits from original PolyDisVAE
Citation:
@inproceedings{wang2020learning,
title={Learning interpretable representation for controllable polyphonic music generation},
author={Wang, Ziyu and Wang, Dingsu and Zhang, Yixiao and Xia, Gus},
booktitle={Proceedings of the 21st International Society for Music Information Retrieval Conference},
year={2020}
}
See smg_metrics/model_weights/README.md for more details.
Chord Recognition
from smg_metrics import recognize_chords, compute_ca
# Beat-level DP chord recognition (music-x-lab algorithm)
chords = recognize_chords("song.mid")
for iv in chords:
    print(f"{iv.start:.2f}-{iv.end:.2f}: {iv.label}")
# Chord Accuracy with DP method (default)
ca = compute_ca("pred.mid", "ref.mid", method="dp")
# Or use Viterbi method (GETMusic)
ca = compute_ca("pred.mid", "ref.mid", method="viterbi")
Individual metrics
from smg_metrics import (
chord_histogram_entropy, ngram_diversity,
melody_matchness, tonal_distance,
compute_ca, midi_to_chords, midi_to_chords_dp,
mean_ioi, rhythmic_intensity, rhythmic_density,
voice_number, onset_xor_distance, note_overlap,
grooving_pattern_similarity,
)
che = chord_histogram_entropy("file.mid")
ca = compute_ca("pred.mid", "ref.mid")
gs = grooving_pattern_similarity("file.mid")
Test Suite
# Quick single-file test
python test.py --single-only data/gt/seg_40_48.mid
# Full test on directories (auto multi-core + progress bar when >= 2 files)
python test.py data/gen/ data/gt/
# Pairwise only
python test.py --pair-only pred.mid ref.mid
# Select specific metrics
python test.py --only pce ebr note_f1 ca sim_chr kl_pitch pred.mid ref.mid
# Save results to JSON
python test.py data/gen/ data/gt/ --json
| Flag | Description |
|---|---|
| --single-only | Run single-file metrics only |
| --pair-only | Run pairwise metrics only |
| --only METRIC ... | Run only selected metrics |
| --json | Save results to test_results.json |
Notes:
- When evaluating 2 or more files, test.py uses multi-core evaluation with a tqdm progress bar.
- Output file: test_results.json in the project root.
CLI Usage
# Single-file quality (13 metrics: 12 MusPy + OOK)
smg-eval -m generated.mid
# Single-file + structural + rhythmic (20 metrics)
smg-eval -m generated.mid -S -R
# Pairwise core (11 metrics: includes CS with deep embedding)
smg-eval -p gen.mid -r ref.mid
# Full 52-metric run
smg-eval -m gen.mid -p gen.mid -r ref.mid -d -a -S -R
# Select specific metrics
smg-eval -m gen.mid --only pce ebr sc ook
smg-eval -p gen.mid -r ref.mid --only ca cs note_f1
# List all available metrics
smg-eval --list-metrics
# JSON output
smg-eval -m gen.mid --json
# Batch directory
smg-eval --pred_dir ./pred/ --ref_dir ./ref/
# Timing
smg-eval -m gen.mid --time
| Flag | Description | Default |
|---|---|---|
| -m, --music PATH | Single-file mode | -- |
| -p, --pred PATH | Predicted MIDI for pair mode | -- |
| -r, --ref PATH | Reference MIDI for pair mode | -- |
| --pred_dir DIR | Batch predicted directory | -- |
| --ref_dir DIR | Batch reference directory | -- |
| --root INT | Root pitch for PISR | 0 |
| --mode {major,minor} | Scale mode for PISR | major |
| -d, --dist | Distribution-level metrics | false |
| -a, --advanced | Advanced metrics | false |
| -S, --structural | Structural metrics | false |
| -R, --rhythmic | Rhythmic metrics | false |
| --only METRIC ... | Select specific metrics | -- |
| --list-metrics | List all metric names | -- |
| --json | JSON output | false |
| --time | Print elapsed time | false |
Metrics Reference
A. Single-file Quality (13)
Sources: MusPy / ISMIR 2020, XMusic / IEEE 2025, FGG ICML 2025.
| Metric | Symbol | Range | Reference |
|---|---|---|---|
| Pitch Class Entropy | PCE | [0, log₂12] | Wu & Yang, ISMIR 2020; XMusic 2025 |
| Empty Beat Rate | EBR | [0, 1] | Dong et al., ISMIR 2018; XMusic 2025 |
| Scale Consistency | SC | [0, 1] | Mogren, NeurIPS-W 2016 |
| Pitch-in-Scale Rate | PISR | [0, 1] | Dong et al., AAAI 2018 |
| Polyphony | Poly | [1, ∞) | Dong et al., AAAI 2018 |
| Polyphony Rate | PR | [0, 1] | Dong et al., AAAI 2018 |
| Pitch Range | Range | [0, 127] | MusPy 2020 |
| Unique Pitches | N_p | [0, 128] | MusPy 2020 |
| Unique Pitch Classes | N_pc | [0, 12] | MusPy 2020 |
| Empty Measure Rate | EMR | [0, 1] | Dong et al., AAAI 2018 |
| Pitch Entropy | PE | [0, 7] | MusPy 2020 |
| Drum Pattern Consistency | DPC | [0, 1] | Dong et al., AAAI 2018 |
| Out-of-Key Fraction | OOK | [0, 1] | FGG 2025, Krumhansl-Kessler key detection |
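The [0, log₂12] range of PCE follows from its definition as the Shannon entropy (base 2) of the 12-bin pitch-class histogram. A minimal sketch, assuming a plain list of MIDI pitches as input; `pitch_class_entropy` here is illustrative and returns 0.0 for empty input, which may differ from the library's behaviour:

```python
import math
from collections import Counter


def pitch_class_entropy(pitches):
    """Shannon entropy (base 2) of the pitch-class histogram of MIDI pitches."""
    counts = Counter(p % 12 for p in pitches)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # assumption: empty input scores 0 rather than NaN
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


single_class = pitch_class_entropy([60, 72, 48])    # only pitch class C
chromatic = pitch_class_entropy(list(range(60, 72)))  # all 12 classes once
```

A piece using a single pitch class scores 0, and a uniform chromatic distribution reaches the upper bound log₂12 ≈ 3.585. The same computation over 128 pitches gives the [0, 7] range listed for Pitch Entropy.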
B. Note-level Pairwise (5)
Source: Ou et al., NeurIPS 2025.
| Metric | Symbol | Range |
|---|---|---|
| Note F1 | F1 | [0, 1] |
| Notei F1 | F1i | [0, 1] |
| Melody F1 | F1mel | [0, 1] |
| Instrument IoU | I-IoU | [0, 1] |
| Voice Error Rate | VER | [0, ∞) |
B2. Pairwise Rhythmic (2)
Sources: Standard MIR rhythmic comparison metrics. XOR implementation from D3PIA ICASSP 2026, NOvlp from mir_eval ISMIR 2014.
| Metric | Symbol | Range |
|---|---|---|
| Onset XOR Distance | XOR | [0, 1] |
| Note Overlap | NOvlp | [0, 1] |
C. Bar-level Pairwise (2)
Source: MuseMorphose, IEEE/ACM TASLP 2023.
| Metric | Symbol | Range |
|---|---|---|
| Chroma Similarity | simChr | [0, 1] |
| Groove Similarity | simGrv | [0, 1] |
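For intuition, chroma similarity can be sketched as the cosine similarity between bar-wise 12-dimensional chroma vectors, averaged over aligned bars. The input format (one list of MIDI pitches per bar), the unweighted note counts, and the function names below are assumptions made for the example; the library's bar segmentation and weighting may differ.

```python
import math


def bar_chroma(pitches):
    """12-dim chroma vector counting notes per pitch class in one bar."""
    vec = [0.0] * 12
    for p in pitches:
        vec[p % 12] += 1.0
    return vec


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Assumption: an empty bar has no defined direction, so it scores 0.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def chroma_similarity(bars_a, bars_b):
    """Mean bar-wise cosine similarity over the aligned bars of two pieces."""
    pairs = list(zip(bars_a, bars_b))
    if not pairs:
        return 0.0
    return sum(cosine(bar_chroma(a), bar_chroma(b)) for a, b in pairs) / len(pairs)


same_harmony = chroma_similarity([[60, 64, 67]], [[72, 76, 79]])  # octave shift
disjoint = chroma_similarity([[60]], [[61]])
```

Because chroma folds octaves together, an octave-transposed bar still scores about 1, while bars with no pitch class in common score 0.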
D. Chord-level Pairwise (2)
Sources: FGG ICML 2025, Wang et al. ISMIR 2020, music-x-lab/midi-chord-recognition.
| Metric | Symbol | Range | Description |
|---|---|---|---|
| Chord Accuracy | CA | [0, 1] | Beat-level DP chord recognition + exact match |
| Chord Similarity | CS | [0, 1] | Deep chord embedding similarity (requires PyTorch) |
Chord Recognition Pipeline (adapted from music-x-lab):
- Extract beat/downbeat positions from MIDI tempo map
- Quantise notes to beat grid → per-beat 12-dim treble chroma + bass chroma
- Channel-weighted aggregation (thickness + bass reweighting)
- Score each chord template per beat (with bass bonus)
- Dynamic-programming decode with span-length reward and transition penalty
- Output interval-level chord labels
Two methods available: 'dp' (default, beat-level) and 'viterbi' (bar-level HMM).
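The decoding step of the pipeline can be illustrated with a deliberately simplified dynamic program. The sketch below assumes per-beat template scores are already computed and applies only a constant transition penalty; the span-length reward and bass bonus described above are omitted, and `decode_chords` is a hypothetical name, not the library's function.

```python
def decode_chords(scores, labels, switch_penalty=1.0):
    """Pick one chord per beat, maximising score minus a penalty per change.

    scores[t][k] is the template score of chord labels[k] at beat t.
    Returns merged (start_beat, end_beat, label) intervals.
    """
    n_beats, n = len(scores), len(labels)
    if n_beats == 0:
        return []
    best = list(scores[0])
    back = []
    for t in range(1, n_beats):
        prev_arg = max(range(n), key=best.__getitem__)
        prev_max = best[prev_arg]
        new_best, pointers = [], []
        for k in range(n):
            stay, switch = best[k], prev_max - switch_penalty
            if stay >= switch:
                new_best.append(stay + scores[t][k])
                pointers.append(k)
            else:
                new_best.append(switch + scores[t][k])
                pointers.append(prev_arg)
        back.append(pointers)
        best = new_best
    # Backtrack from the best final state.
    k = max(range(n), key=best.__getitem__)
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    path.reverse()
    # Merge runs of identical labels into intervals.
    intervals, start = [], 0
    for t in range(1, n_beats + 1):
        if t == n_beats or path[t] != path[start]:
            intervals.append((start, t, labels[path[start]]))
            start = t
    return intervals


demo_scores = [[1.0, 0.0], [1.0, 0.0], [0.4, 0.6], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
demo_intervals = decode_chords(demo_scores, ["C:maj", "G:maj"], switch_penalty=1.0)
```

In the demo, beat 2 slightly favours G:maj on its own, but the transition penalty keeps it inside the surrounding C:maj span, so the output is two intervals rather than four.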
E. Distribution-level (5)
Sources: SongMASS ACM-MM 2020, MusPy ISMIR 2020, Wu & Yang ISMIR 2020.
| Metric | Symbol | Range |
|---|---|---|
| Pitch Distribution | PD | [0, 1] |
| Duration Distribution | DD | [0, 1] |
| Scale Consistency Sim | SC_sim | [0, 1] |
| Pitch Class Entropy Sim | PCE_sim | [0, 1] |
| Groove Pattern Similarity Consistency | GSC | [0, 1] |
F. Advanced (14)
Sources: GETMusic IJCAI 2025, Text2midi AAAI 2025, MuseTok ICASSP 2026.
| Metric | Symbol | Range |
|---|---|---|
| KL Divergence (Duration) | KL_dur | [0, ∞) |
| KL Divergence (IOI) | KL_ioi | [0, ∞) |
| KL Divergence (Pitch) | KL_pitch | [0, ∞) |
| Overlapping Area ×4 | OA | [0, 1] |
| Instrument Coverage ×3 | CI | [0, 1] |
| Correct Time Signature | CTS | {0, 1} |
| Compression Ratio ×2 | CR | [0, ∞) |
| Reconstruction Accuracy | ReconAcc | [0, 1] |
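As a rough illustration of the KL-based metrics, the sketch below compares two histograms of a discrete feature such as pitch. The direction KL(pred || ref), the additive smoothing constant, and the natural logarithm are all assumptions for the example; the library's binning and smoothing may differ.

```python
import math
from collections import Counter


def kl_divergence(pred_values, ref_values, eps=1e-6):
    """KL(pred || ref) between smoothed histograms of a discrete feature."""
    support = sorted(set(pred_values) | set(ref_values))
    if not support:
        return 0.0

    def smoothed(values):
        counts = Counter(values)
        total = len(values) + eps * len(support)
        # Additive smoothing keeps every bin non-zero so the log is defined.
        return [(counts[v] + eps) / total for v in support]

    p, q = smoothed(pred_values), smoothed(ref_values)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


kl_same = kl_divergence([60, 62, 64], [60, 62, 64])
kl_diff = kl_divergence([60, 60, 60], [60, 62, 64])
```

Identical distributions give 0, and the value grows without bound as the predicted distribution concentrates on bins that are rare in the reference, which is why the range is [0, ∞).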
G. Structural (4)
Sources: Papadopoulos & Peeters ISMIR 2012, Yang & Lerch NCA 2018, Mongeau & Sankoff CH 1990, Harte et al. ACM MM 2006.
| Metric | Symbol | Type | Range |
|---|---|---|---|
| Chord Histogram Entropy | CHE | single | [0, log₂C] |
| N-gram Diversity | Ngram | single | [0, 1] |
| Melody Matchness | MM | pair | [0, 1] |
| Tonal Distance | TD | pair | [0, ∞) |
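N-gram diversity is commonly computed as the ratio of distinct n-grams to total n-grams in a token sequence, which gives the [0, 1] range above. The sketch below assumes that definition and a pitch sequence as tokens; the library's choice of tokens and of n is not specified here, so treat both as illustrative.

```python
def ngram_diversity(tokens, n=3):
    """Ratio of distinct n-grams to total n-grams in a token sequence."""
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not grams:
        return 0.0  # assumption: sequences shorter than n score 0
    return len(set(grams)) / len(grams)


scale_run = ngram_diversity([60, 62, 64, 65, 67], n=2)  # every bigram differs
trill = ngram_diversity([60, 62] * 4, n=2)              # two bigrams repeated
```

A non-repeating scale run scores 1.0, while the eight-note trill has only 2 distinct bigrams out of 7 and scores about 0.29.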
H. Rhythmic/Temporal Single-file (5)
Sources: Standard MIR rhythmic features. Implementation conventions from D3PIA/MIDISym (ICASSP 2026), Wu & Yang ISMIR 2020.
| Metric | Symbol | Range |
|---|---|---|
| Mean IOI | IOI | [0, ∞) |
| Rhythmic Intensity | RI | [0, ∞) |
| Rhythmic Density | RD | [0, 1] |
| Voice Number | VN | [0, ∞) |
| Grooving Pattern Similarity | GS | [0, 1] |
v5.3 Changelog
Major Refactoring
- GS Metric Correction: Grooving Pattern Similarity (GS) reimplemented following original paper definition
  - Now uses 64-dimensional binary onset vectors per bar (as per Wu & Yang ISMIR 2020)
  - Computes normalized Hamming similarity between all bar pairs: GS = 1 - (1/64) · Σ XOR
  - Removed dependency on muspy.groove_consistency() (incorrect implementation)
  - Range: [0, 1], measures rhythmic pattern consistency within a piece
  - RhythmicResult now contains 5 metrics: IOI, RI, RD, VN, GS
- GSC Update: Distribution-level metric now uses corrected GS implementation
  - Updated _gsc() in distribution.py to use new grooving_pattern_similarity()
  - Definition unchanged: GSC = 1 - |GS_pred - GS_ref|
  - Clearer distinction from single-file GS metric
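The GS and GSC formulas above can be sketched in a few lines. This is an illustration under stated assumptions, not the library's code: onsets are given as fractions of a bar, a piece with fewer than two bars scores 1.0, and the function names are hypothetical.

```python
from itertools import combinations

Q = 64  # onset grid positions per bar, as in the formula above


def onset_vector(onsets_in_bar, q=Q):
    """Binary vector marking which of q grid positions carry an onset.

    onsets_in_bar: onset positions as fractions of the bar, in [0, 1).
    """
    vec = [0] * q
    for pos in onsets_in_bar:
        vec[min(q - 1, int(pos * q))] = 1
    return vec


def bar_similarity(a, b):
    """1 - (1/Q) * sum of XOR, i.e. normalised Hamming similarity."""
    return 1.0 - sum(x ^ y for x, y in zip(a, b)) / len(a)


def grooving_similarity(bars):
    """Mean similarity over all pairs of bars in one piece."""
    vectors = [onset_vector(bar) for bar in bars]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 1.0  # assumption: a single bar is trivially self-consistent
    return sum(bar_similarity(a, b) for a, b in pairs) / len(pairs)


def gsc(gs_pred, gs_ref):
    """Distribution-level consistency: 1 - |GS_pred - GS_ref|."""
    return 1.0 - abs(gs_pred - gs_ref)


steady = grooving_similarity([[0.0, 0.25, 0.5, 0.75]] * 3)
mixed = grooving_similarity([[0.0, 0.5], [0.0, 0.25, 0.5, 0.75]])
```

Three identical four-on-the-floor bars give 1.0; the half-note bar against the quarter-note bar differs at 2 of 64 positions, giving 1 - 2/64 = 0.96875.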
Performance Optimization
- CLI Startup Speed: Implemented lazy imports via __getattr__ in __init__.py
  - Startup time reduced from ~6.3s to ~0.14s (~45× faster)
  - --help now responds instantly
  - API remains fully compatible: from smg_metrics import single_file still works
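The lazy-import technique relies on the module-level `__getattr__` hook (PEP 562), which Python calls only when a normal attribute lookup on the module fails. The sketch below demonstrates the mechanism on a throwaway module built at runtime; the mapping and `make_lazy_module` are hypothetical, and in a real package the `__getattr__` function would simply be defined at the top level of `__init__.py`.

```python
import importlib
import types

# Hypothetical mapping: public name -> module that actually defines it.
_LAZY_ATTRS = {"sqrt": "math", "dumps": "json"}


def make_lazy_module(name, lazy_attrs):
    """Build a module whose attributes are imported on first access."""
    module = types.ModuleType(name)

    def __getattr__(attr):
        if attr not in lazy_attrs:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        value = getattr(importlib.import_module(lazy_attrs[attr]), attr)
        setattr(module, attr, value)  # cache so the hook runs once per name
        return value

    # Placing the function in the module namespace activates the PEP 562 hook.
    module.__getattr__ = __getattr__
    return module


lazy = make_lazy_module("lazy_demo", _LAZY_ATTRS)
```

Nothing is imported until `lazy.sqrt` is first touched, after which the value lives in the module namespace and later lookups bypass the hook. This is why heavy dependencies no longer slow down commands such as `--help`.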
Code Changes
- Reimplemented grooving_pattern_similarity() in rhythmic.py with paper-accurate algorithm
- Updated _gsc() in distribution.py to use corrected GS implementation
- Removed muspy dependency from GS calculation
- Added comprehensive docstrings with paper references and formulas
Documentation Updates
- Updated metric counts: Single-file (14→13), Rhythmic single (4→5), Rhythmic pair (3→2)
- Corrected Wu & Yang ISMIR 2020 paper citations and links
- Added comprehensive source references for rhythmic metrics
- Clarified metric categories, ranges, and purposes
API Changes
- Modified: grooving_pattern_similarity(midi_path) now uses 64-dim vectors and normalized Hamming similarity
- Unchanged: gsc in DistributionResult (name unchanged from v5.2)
- Removed: gs field from SingleFileResult
- Added: gs field to RhythmicResult
Changed
- Version: 5.2.0 → 5.3.0
- Total metrics: 52 (reorganized for clarity and consistency)
v5.2 Changelog
Optimizations
- CS Model Caching: Chord Similarity model now cached for batch evaluation
  - Model loaded once and reused across multiple compute_cs() calls
  - ~1.6× faster per call after initial load, ~39% time savings on large batches
  - Example: 153 file pairs reduced from 34.6s → 21.2s
  - Thread-safe caching with automatic device management
  - New API: clear_cs_model_cache() to manually free GPU/CPU memory after batch processing
- Memory Management: Users can explicitly release CS model memory when needed
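A thread-safe, per-device model cache of the kind described can be sketched with a lock-guarded dictionary. Everything below is a generic illustration: `_load_model`, `get_model`, and `clear_model_cache` are hypothetical names, and the "model" is a placeholder dict rather than real weights.

```python
import threading

_cache = {}
_lock = threading.Lock()
stats = {"loads": 0}  # counts how often the expensive load actually runs


def _load_model(device):
    """Hypothetical stand-in for an expensive weight load."""
    stats["loads"] += 1
    return {"device": device}


def get_model(device="cpu"):
    """Return the cached model for a device, loading it on first use."""
    with _lock:
        if device not in _cache:
            _cache[device] = _load_model(device)
        return _cache[device]


def clear_model_cache():
    """Drop all cached models and return how many were cleared."""
    with _lock:
        cleared = len(_cache)
        _cache.clear()
        return cleared


first = get_model("cpu")
second = get_model("cpu")
```

Holding the lock across the check and the load ensures two threads never load the same model twice. The trade-off is that callers for other devices wait while a load is in progress.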
API Changes
- New function: clear_cs_model_cache() clears cached CS models to free memory
- Improved: compute_cs() now automatically caches model for subsequent calls
Changed
- Version: 5.1.0 → 5.2.0
- Performance: Batch CS evaluation significantly faster (model loaded once vs. per-call)
Example
from smg_metrics import compute_cs, clear_cs_model_cache
# Batch evaluation - model loaded once, reused for all pairs
for pred, ref in file_pairs:
    cs = compute_cs(pred, ref)  # Fast after first call
# Free memory after batch
clear_cs_model_cache() # Returns: 1 (number of models cleared)
v5.1 Changelog
New features
- Chord Similarity (CS): Deep chord embedding metric using pruned PolyDisVAE encoder (Wang et al. ISMIR 2020)
  - Optional dependency: requires torch>=2.0.0 (install with pip install smg-metrics[torch])
  - Lightweight model (29 MB): polydis-v1-chd_encoder_only.pt
  - Supports 17 chord qualities + inversions via dynamic programming recognition
- Out-of-Key (OOK): Percentage of notes outside detected key using Krumhansl-Kessler algorithm
  - Standalone single-file metric, no external annotations needed
  - Integrated into CLI with --only ook support
Changed
- PyTorch: Moved to optional dependencies (torch extra)
- Metric count: 51 → 53 (added CS + OOK)
- Version: 5.0.0 → 5.1.0
License
MIT — see LICENCE.
Citation
If you use smg-metrics in your research, please cite:
@software{smg_metrics,
title = {smg-metrics: Objective Evaluation Metrics for Symbolic Music Generation},
author = {Temmie Pratt},
year = {2026},
url = {https://github.com/OlyMarco/smg_metric},
}