Objective evaluation metrics for Symbolic Music Generation
smg_metrics
Symbolic Music Generation Metrics — 51 objective evaluation metrics, zero config.
8 categories, 51 metrics, 20 papers/projects (1990–2026), fully typed & tested.
| Category | Count | Latest source | Year |
|---|---|---|---|
| A. Single-file Quality | 13 | MusPy | 2020 |
| B. Note-level Pairwise | 5 | Ou et al. | 2025 |
| C. Bar-level Pairwise | 2 | MuseMorphose | 2023 |
| D. Chord-level Pairwise | 1 | GETMusic | 2023 |
| E. Distribution-level | 6 | FGG | 2025 |
| F. Advanced | 14 | Text2midi | 2025 |
| G. Structural | 4 | MuseTok | 2026 |
| H. Rhythmic/Temporal | 6 | D3PIA | 2026 |
Quick Start
pip install smg-metrics
from smg_metrics import single_file, single_file_rhythmic, pair_eval
quality = single_file("generated.mid")
print(quality.pce, quality.ebr, quality.gs)
rhythm = single_file_rhythmic("generated.mid")
print(rhythm.mean_ioi, rhythm.rhythmic_density)
pair = pair_eval("generated.mid", "reference.mid")
print(pair.note_f1, pair.sim_chr, pair.ca, pair.note_overlap)
smg-eval --music generated.mid
smg-eval --music gen.mid --pred gen.mid --ref ref.mid --dist --advanced --structural --rhythmic
Installation
pip install smg-metrics
# Or install from source:
git clone https://github.com/OlyMarco/smg_metric.git
cd smg_metric && pip install -e .
| Package | Version | Purpose |
|---|---|---|
| muspy | >= 0.5.0 | 13 single-file quality metrics |
| miditoolkit | >= 1.0 | MIDI parsing |
| pretty-midi | >= 0.2.10 | Bar-level similarity parsing |
| mir-eval | >= 0.7 | Note-overlap metric |
| numpy | >= 1.24 | Numerical computation |
| scipy | >= 1.10 | Scientific computing |
Python API
from smg_metrics import (
single_file, # 13 MusPy quality metrics
single_file_structural, # 2 structural single-file metrics
single_file_rhythmic, # 4 D3PIA-style rhythmic metrics
pair_eval, # 10 core pairwise metrics
pair_eval_structural, # 2 structural pairwise metrics
distribution_eval, # 6 distribution-level metrics
advanced_eval, # 14 advanced metrics
)
| Container | Fields | Count |
|---|---|---|
| `SingleFileResult` | pce, ebr, gs, sc, pisr, polyphony, polyphony_rate, pitch_range, n_pitches_used, n_pitch_classes_used, emr, pe, dpc | 13 |
| `StructuralSingleResult` | che, ngram_div | 2 |
| `RhythmicResult` | mean_ioi, rhythmic_intensity, rhythmic_density, voice_number | 4 |
| `PairResult` | note_f1, notei_f1, mel_f1, i_iou, ver, sim_chr, sim_grv, ca, onset_xor, note_overlap | 10 |
| `StructuralPairResult` | melody_match, tonal_dist | 2 |
| `DistributionResult` | pd, dd, ook, sc_sim, pce_sim, gs_sim | 6 |
| `AdvancedResult` | kl_duration, kl_ioi, kl_pitch, oa_duration, oa_ioi, oa_pitch_range, oa_density, ci_precision, ci_recall, ci_f1, cts, cr_pred, cr_ref, recon_acc | 14 |
Every result container is a frozen dataclass with .to_dict().
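The frozen-dataclass pattern can be sketched with the standard library alone. `RhythmicResultSketch` below is a hypothetical stand-in that only mirrors the `RhythmicResult` field names from the table; it is not the package's class, and the numbers are illustrative rather than computed from a MIDI file.

```python
from dataclasses import FrozenInstanceError, asdict, dataclass


@dataclass(frozen=True)
class RhythmicResultSketch:
    """Hypothetical stand-in mirroring the RhythmicResult field names."""

    mean_ioi: float
    rhythmic_intensity: float
    rhythmic_density: float
    voice_number: float

    def to_dict(self) -> dict:
        # asdict() walks the dataclass fields and returns a plain dict.
        return asdict(self)


# Illustrative values only.
result = RhythmicResultSketch(0.25, 1.5, 0.4, 2.0)
as_row = result.to_dict()  # plain dict, e.g. for json.dumps or a CSV row

try:
    result.mean_ioi = 0.5  # frozen dataclasses reject attribute assignment
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Frozen containers are hashable and safe to share between aggregation steps, and `.to_dict()` gives a flat record per file or pair.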
Individual metrics
from smg_metrics import (
chord_histogram_entropy, ngram_diversity,
melody_matchness, tonal_distance,
compute_ca, midi_to_chords,
mean_ioi, rhythmic_intensity, rhythmic_density,
voice_number, onset_xor_distance, note_overlap,
)
che = chord_histogram_entropy("file.mid")
div = ngram_diversity("file.mid", n=4)
ca = compute_ca("pred.mid", "ref.mid")
ioi = mean_ioi("file.mid")
xor = onset_xor_distance("pred.mid", "ref.mid")
nov = note_overlap("pred.mid", "ref.mid")
CLI Usage
# Single-file quality (13 metrics)
smg-eval --music generated.mid
# Single-file quality + structural + rhythmic (19 metrics)
smg-eval --music generated.mid --structural --rhythmic
# Pairwise core (10 metrics)
smg-eval --pred gen.mid --ref ref.mid
# Full 51-metric run for one generated/reference pair
smg-eval --music gen.mid --pred gen.mid --ref ref.mid --dist --advanced --structural --rhythmic
# JSON output
smg-eval --pred gen.mid --ref ref.mid --json
# Batch directory
smg-eval --pred_dir ./pred/ --ref_dir ./ref/
| Flag | Description | Default |
|---|---|---|
| `--music PATH` | Single-file mode | -- |
| `--pred PATH` | Predicted MIDI for pair mode | -- |
| `--ref PATH` | Reference MIDI for pair mode | -- |
| `--pred_dir DIR` | Batch predicted directory | -- |
| `--ref_dir DIR` | Batch reference directory | -- |
| `--root INT` | Root pitch for PISR | 0 |
| `--mode {major,minor}` | Scale mode for PISR | major |
| `--dist` | Include distribution-level metrics | false |
| `--advanced` | Include advanced metrics | false |
| `--structural` | Include structural metrics | false |
| `--rhythmic` | Include rhythmic/temporal metrics | false |
| `--json` | Output as JSON | false |
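To make the defaults in the flag table concrete, here is a minimal `argparse` sketch that mirrors it. `build_parser` is a hypothetical helper written for illustration; the package's actual `cli.py` may be organised differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser that mirrors the flag table; not the real cli.py."""
    parser = argparse.ArgumentParser(prog="smg-eval")
    # Path-style options have no default (shown as "--" in the table).
    for flag, metavar in (
        ("--music", "PATH"),
        ("--pred", "PATH"),
        ("--ref", "PATH"),
        ("--pred_dir", "DIR"),
        ("--ref_dir", "DIR"),
    ):
        parser.add_argument(flag, metavar=metavar, default=None)
    # PISR options.
    parser.add_argument("--root", type=int, default=0)
    parser.add_argument("--mode", choices=["major", "minor"], default="major")
    # Boolean switches default to False.
    for flag in ("--dist", "--advanced", "--structural", "--rhythmic", "--json"):
        parser.add_argument(flag, action="store_true")
    return parser


args = build_parser().parse_args(["--pred", "gen.mid", "--ref", "ref.mid", "--json"])
```

Parsing the pair-mode example leaves `--root` at 0, `--mode` at `major`, and every optional metric group switched off.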
Metrics Reference
A. Single-file Quality (13)
Source: MusPy / ISMIR 2020.
| Metric | Symbol | Range |
|---|---|---|
| Pitch Class Entropy | PCE | [0, log2(12)] |
| Empty Beat Rate | EBR | [0, 1] |
| Groove Consistency | GS | [0, 1] |
| Scale Consistency | SC | [0, 1] |
| Pitch-in-Scale Rate | PISR | [0, 1] |
| Polyphony | Poly | [0, inf) |
| Polyphony Rate | PR | [0, 1] |
| Pitch Range | Range | [0, 127] |
| Unique Pitches | N_p | [0, 128] |
| Unique Pitch Classes | N_pc | [0, 12] |
| Empty Measure Rate | EMR | [0, 1] |
| Pitch Entropy | PE | [0, 7] |
| Drum Pattern Consistency | DPC | [0, 1] |
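As a rough illustration of two rows in the table, the sketch below computes pitch class entropy and pitch-in-scale rate from a plain list of MIDI pitch numbers. It is a simplified stand-in, not the MusPy or `smg_metrics` code; the function names, the natural-minor scale set, and the handling of empty input are assumptions.

```python
import math
from collections import Counter

# Scale degrees as semitone offsets from the root (assumed definitions).
MAJOR = {0, 2, 4, 5, 7, 9, 11}
NATURAL_MINOR = {0, 2, 3, 5, 7, 8, 10}


def pitch_class_entropy(pitches):
    """Shannon entropy (bits) of the 12-bin pitch-class histogram."""
    if not pitches:
        return 0.0  # assumed convention for empty input
    counts = Counter(p % 12 for p in pitches)
    total = len(pitches)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def pitch_in_scale_rate(pitches, root=0, mode="major"):
    """Fraction of notes whose pitch class belongs to the given scale."""
    if not pitches:
        return 0.0  # assumed convention for empty input
    scale = MAJOR if mode == "major" else NATURAL_MINOR
    in_scale = sum(1 for p in pitches if (p - root) % 12 in scale)
    return in_scale / len(pitches)


c_major_scale = [60, 62, 64, 65, 67, 69, 71]
chromatic = list(range(60, 72))
```

A chromatic run uses all 12 pitch classes equally and so reaches the upper bound `log2(12)`, while a single repeated pitch gives 0. The `root` and `mode` parameters play the same role as the `--root` and `--mode` CLI flags.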
B. Note-level Pairwise (5)
Source: Ou et al., Appendix C.
| Metric | Symbol | Range | Description |
|---|---|---|---|
| Note F1 | F1 | [0, 1] | Quantised onset + pitch F1 |
| Notei F1 | F1i | [0, 1] | Note F1 plus instrument |
| Melody F1 | F1mel | [0, 1] | Note F1 on detected melody track |
| Instrument IoU | I-IoU | [0, 1] | Instrument set IoU |
| Voice Error Rate | VER | [0, inf) | Normalised voice-order edit distance |
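The "quantised onset + pitch" idea behind Note F1 can be sketched as a set comparison. This is a simplified illustration, not the package's `note_f1.py`: notes are assumed to be `(onset_in_beats, pitch)` tuples, the 0.25-beat grid is an assumed default, and returning 0 when there are no matches (including empty inputs) is an assumed convention.

```python
def quantise(notes, step=0.25):
    """Map (onset_in_beats, pitch) pairs to a set of (grid_index, pitch)."""
    return {(round(onset / step), pitch) for onset, pitch in notes}


def note_f1_sketch(pred, ref, step=0.25):
    """F1 over quantised (onset, pitch) pairs; duplicates collapse in the set."""
    pred_set, ref_set = quantise(pred, step), quantise(ref, step)
    true_pos = len(pred_set & ref_set)
    if true_pos == 0:
        return 0.0  # assumed convention, also covers empty inputs
    precision = true_pos / len(pred_set)
    recall = true_pos / len(ref_set)
    return 2 * precision * recall / (precision + recall)


pred_notes = [(0.0, 60), (1.0, 64), (2.0, 67)]
ref_notes = [(0.0, 60), (1.02, 64), (3.0, 72)]  # 1.02 snaps to the same grid cell as 1.0
score = note_f1_sketch(pred_notes, ref_notes)
```

Here two of three notes match on each side, so precision, recall, and F1 are all 2/3. Adding the instrument to each tuple would give the Notei variant in the same way.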
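B2. Pairwise Rhythmic (2)
These two metrics are counted under category H; they are listed here because pair_eval() returns them alongside the note-level metrics.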
| Metric | Symbol | Range | Description |
|---|---|---|---|
| Onset XOR Distance | XOR | [0, 1] | Full-piece aligned binary onset-pattern XOR distance |
| Note Overlap | NOvlp | [0, 1] | mir_eval transcription average overlap |
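A binary onset-pattern XOR distance can be sketched as follows. The grid resolution, the explicit `n_steps` length, and normalising by the number of grid steps are assumptions made for illustration; the package's alignment of two full pieces may differ.

```python
def onset_grid(onsets, n_steps, step=0.25):
    """Binary grid with 1 at every step that contains at least one onset."""
    grid = [0] * n_steps
    for t in onsets:
        idx = round(t / step)
        if 0 <= idx < n_steps:
            grid[idx] = 1
    return grid


def onset_xor_sketch(pred_onsets, ref_onsets, n_steps, step=0.25):
    """Fraction of grid steps where exactly one of the two pieces has an onset."""
    if n_steps <= 0:
        return 0.0
    a = onset_grid(pred_onsets, n_steps, step)
    b = onset_grid(ref_onsets, n_steps, step)
    return sum(x ^ y for x, y in zip(a, b)) / n_steps


# One bar of 4/4 on a 16th-note grid (16 steps, onsets in beats).
pred_onsets = [0.0, 1.0, 2.0, 3.0]
ref_onsets = [0.0, 1.0, 2.5, 3.0]
distance = onset_xor_sketch(pred_onsets, ref_onsets, n_steps=16)
```

The two patterns disagree at steps 8 and 10, giving 2/16 = 0.125; identical patterns give 0.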
C. Bar-level Pairwise (2)
Source: MuseMorphose.
| Metric | Symbol | Range |
|---|---|---|
| Chroma Similarity | simChr | [0, 1] |
| Groove Similarity | simGrv | [0, 1] |
D. Chord-level Pairwise (1)
Source: GETMusic, Eq. 6.
| Metric | Symbol | Range | Description |
|---|---|---|---|
| Chord Accuracy | CA | [0, 1] | Per-measure chord label match rate with Viterbi HMM chord recognition |
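Once chord labels have been recognised per measure, the match rate itself is simple. The sketch below skips the Viterbi HMM recognition step entirely and assumes the labels are already available as strings; truncating to the shorter sequence is also an assumption, not something the package documents here.

```python
def chord_match_rate(pred_labels, ref_labels):
    """Fraction of aligned measures whose chord labels are identical."""
    n = min(len(pred_labels), len(ref_labels))  # assumed: compare the shared prefix
    if n == 0:
        return 0.0
    matches = sum(p == r for p, r in zip(pred_labels, ref_labels))
    return matches / n


pred_chords = ["C", "G", "Am", "F"]
ref_chords = ["C", "G", "F", "F"]
ca_value = chord_match_rate(pred_chords, ref_chords)
```

Three of the four measures agree, so the sketch yields 0.75. In the package, `compute_ca()` takes MIDI paths and `midi_to_chords` is exported for the recognition step.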
E. Distribution-level (6)
| Metric | Range | Description |
|---|---|---|
| PD | [0, 1] | Pitch distribution overlap |
| DD | [0, 1] | Duration distribution overlap |
| OOK | [0, 1] | Out-of-key rate on active 16th-note steps |
| SC_sim | [0, 1] | Scale consistency similarity |
| PCE_sim | [0, 1] | Pitch-class entropy similarity |
| GS_sim | [0, 1] | Groove consistency similarity |
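One common way to get a distribution overlap in [0, 1] is histogram intersection over normalised histograms. The sketch below uses that definition as an assumption; the package's PD and DD may bin or normalise differently.

```python
from collections import Counter


def normalised_hist(values):
    """Relative frequency of each distinct value."""
    counts = Counter(values)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()} if total else {}


def histogram_overlap(a, b):
    """Histogram intersection: sum over bins of the smaller relative frequency."""
    pa, pb = normalised_hist(a), normalised_hist(b)
    return sum(min(pa.get(k, 0.0), pb.get(k, 0.0)) for k in pa.keys() | pb.keys())


pred_pitches = [60, 60, 62, 64]
ref_pitches = [60, 62, 62, 65]
pitch_overlap = histogram_overlap(pred_pitches, ref_pitches)
```

The two pitch lists share 0.25 of their mass on pitch 60 and 0.25 on pitch 62, so the overlap is 0.5. The same function applies to note durations for a DD-style score.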
F. Advanced Metrics (14)
Sources: rule-guided diffusion, Text2midi, MuseTok.
| Group | Metrics |
|---|---|
| KL divergence | kl_duration, kl_ioi, kl_pitch |
| Overlapping area | oa_duration, oa_ioi, oa_pitch_range, oa_density |
| Instrument coverage | ci_precision, ci_recall, ci_f1 |
| Metadata / repetition / reconstruction | cts, cr_pred, cr_ref, recon_acc |
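For the KL divergence group, the sketch below compares two histograms defined over the same bins. The additive smoothing constant `eps` and the use of the natural logarithm are assumptions; the package may use a different base, binning, or smoothing scheme.

```python
import math


def kl_divergence(p_counts, q_counts, eps=1e-9):
    """KL(P || Q) in nats between two histograms over the same bins."""
    if len(p_counts) != len(q_counts):
        raise ValueError("histograms must share the same bins")
    # Additive smoothing keeps empty bins from producing log(0) or division by zero.
    p = [c + eps for c in p_counts]
    q = [c + eps for c in q_counts]
    p_sum, q_sum = sum(p), sum(q)
    return sum(
        (pi / p_sum) * math.log((pi / p_sum) / (qi / q_sum))
        for pi, qi in zip(p, q)
    )


pred_hist = [1, 1]  # e.g. counts of two duration bins in the generated piece
ref_hist = [3, 1]   # counts of the same bins in the reference
kl_value = kl_divergence(pred_hist, ref_hist)
```

With P = (0.5, 0.5) and Q = (0.75, 0.25) the result is 0.5 * ln(4/3), about 0.144. KL divergence is asymmetric, so swapping the arguments gives a different value.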
G. Structural Metrics (4)
| Metric | Type | Range |
|---|---|---|
| Chord Histogram Entropy | Single | [0, log2(C)] |
| N-gram Diversity | Single | [0, 1] |
| Melody Matchness | Pair | [0, 1] |
| Tonal Distance | Pair | [0, inf) |
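The two single-file structural metrics can be illustrated on plain token sequences. Both definitions below are assumptions for the sake of a sketch: diversity as distinct n-grams divided by total n-grams, and chord histogram entropy as Shannon entropy in bits over chord labels. The package works on MIDI files and may tokenise differently.

```python
import math
from collections import Counter


def ngram_diversity_sketch(tokens, n=4):
    """Distinct n-grams divided by total n-grams (assumed definition)."""
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not grams:
        return 0.0  # sequence shorter than n
    return len(set(grams)) / len(grams)


def histogram_entropy(labels):
    """Shannon entropy (bits) of a label histogram; at most log2(C) for C labels."""
    if not labels:
        return 0.0
    total = len(labels)
    return -sum(
        (c / total) * math.log2(c / total) for c in Counter(labels).values()
    )


looping_melody = [60, 62, 64, 60, 62, 64, 60, 62]
diversity = ngram_diversity_sketch(looping_melody, n=3)
entropy = histogram_entropy(["C", "G", "C", "G"])
```

The looping melody has six 3-grams but only three distinct ones, giving 0.5; two equally frequent chords give an entropy of exactly 1 bit.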
H. Rhythmic & Temporal Metrics (4 single-file + 2 pairwise)
| Metric | Symbol | Type | Range |
|---|---|---|---|
| Mean Inter-Onset Interval | IOI | Single | [0, inf) |
| Rhythmic Intensity | RI | Single | [0, inf) |
| Rhythmic Density | RD | Single | [0, 1] |
| Voice Number | VN | Single | [0, inf) |
| Onset XOR Distance | XOR | Pair | [0, 1] |
| Note Overlap | NOvlp | Pair | [0, 1] |
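Two of the single-file rows can be sketched from a list of onset times in beats. The choices below are assumptions: simultaneous onsets are collapsed before taking gaps, and density is the fraction of grid steps holding at least one onset, which matches the [0, 1] range in the table but may not match the package's exact definition.

```python
def mean_ioi_sketch(onsets):
    """Mean gap between consecutive distinct onset times."""
    times = sorted(set(onsets))  # chords share one onset
    if len(times) < 2:
        return 0.0
    gaps = [b - a for a, b in zip(times, times[1:])]
    return sum(gaps) / len(gaps)


def rhythmic_density_sketch(onsets, n_steps, step=0.25):
    """Fraction of grid steps that contain at least one onset."""
    if n_steps <= 0:
        return 0.0
    active = {round(t / step) for t in onsets}
    return sum(1 for idx in active if 0 <= idx < n_steps) / n_steps


onsets_in_beats = [0.0, 0.5, 1.0, 2.0]
ioi = mean_ioi_sketch(onsets_in_beats)
density = rhythmic_density_sketch([0.0, 0.25, 0.25, 1.0], n_steps=8)
```

The gaps 0.5, 0.5, and 1.0 average to 2/3 of a beat, and three active steps out of eight give a density of 0.375.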
Research Notes
- FGG uses POP909 accompaniment generation at 16th-note resolution and reports % out-of-key notes, direct chord accuracy, chord progression similarity, chord IoU, and piano-roll IoU. The package implements the reproducible local-MIDI parts of that evaluation: OOK, Viterbi chord accuracy, pitch/duration overlaps, note overlap, and structural similarities.
- The FGG paper’s arXiv HTML reports Table 1 values: FGG % out-of-key notes 0.0%, direct chord accuracy 0.485, chord similarity 0.767, chord IoU 0.769, and piano-roll IoU 0.281; GETMusic scores lower on the same table. These values are model-generation results, not hard-coded package tests.
- D3PIA’s demo page exposes POP909 sample MIDI for GT, D3PIA, Polyffusion, C&E-E, WholeSongGen, FGG, and leadsheet models; the validation procedure can download those MIDIs and run pair_eval() and compute_ca() locally.
Package Structure
smg_metric/
|-- pyproject.toml
|-- README.md
|-- test.py # Full 51-metric test script
|-- data/ # Classical + POP909 MIDI test files
|-- smg_metrics/
| |-- __init__.py # Public API exports
| |-- __main__.py # python -m smg_metrics
| |-- py.typed # PEP 561 marker
| |-- _io.py # Shared MIDI I/O
| |-- _stats.py # Shared statistics
| |-- _edit.py # Shared sequence editing
| |-- single.py # single_file wrappers
| |-- pair.py # pair_eval wrappers
| |-- rhythmic.py # D3PIA + mir_eval rhythmic metrics
| |-- muspy_ext.py # 13 MusPy metrics
| |-- note_f1.py # 5 note-level pairwise metrics
| |-- similarity.py # 2 bar-level similarity metrics
| |-- chord_accuracy.py # Chord Accuracy HMM
| |-- distribution.py # 6 distribution-level metrics
| |-- advanced.py # 14 advanced metrics
| |-- structural.py # 4 structural metrics
| +-- cli.py # CLI entry point
Testing
python test.py
python test.py data/
python test.py --single-only file.mid
python test.py --pair-only pred.mid ref.mid
test.py validates:
- Single-file quality (13 metrics × N files)
- Single-file structural (2 metrics × N files)
- Single-file rhythmic/temporal (4 metrics × N files)
- Pairwise note/rhythmic/bar/chord/structural/distribution/advanced (32 metrics × N pairs)
- Self-consistency (12 checks × N files)
Citation
If you use this toolkit, cite the relevant metric sources for the categories used:
@inproceedings{dong2020muspy,
title={MusPy: A Toolkit for Symbolic Music Generation},
author={Dong, Hao-Wen and Chen, Ke and McAuley, Julian and Berg-Kirkpatrick, Taylor},
booktitle={Proc. ISMIR},
year={2020},
url={https://arxiv.org/abs/2008.01951}
}
@inproceedings{zhu2025fgg,
title={Efficient Fine-Grained Guidance for Diffusion Model Based Symbolic Music Generation},
author={Zhu, Tingyu and Liu, Haoyu and Wang, Ziyu and Jiang, Zhimin and Zheng, Zeyu},
booktitle={Proc. ICML},
year={2025},
url={https://arxiv.org/abs/2410.08435}
}
@inproceedings{choi2026d3pia,
title={D3PIA: A Discrete Denoising Diffusion Model for Piano Accompaniment Generation from Lead Sheet},
author={Choi, Eunjin and Kim, Hounsu and Bang, Hayeon and Kwon, Taegyun and Nam, Juhan},
booktitle={Proc. ICASSP},
year={2026}
}
@inproceedings{raffel2014mir_eval,
title={mir_eval: A Transparent Implementation of Common MIR Metrics},
author={Raffel, Colin and others},
booktitle={Proc. ISMIR},
year={2014}
}
License
MIT. See LICENCE.