Objective evaluation metrics for Symbolic Music Generation

smg_metrics

Symbolic Music Generation Metrics — 45 objective evaluation metrics, zero config.

Python 3.10+ License: MIT

7 categories, 45 metrics, 18 papers (2006–2026), fully typed & tested.

| Category | Count | Latest Paper | Year |
|---|---|---|---|
| A. Single-file Quality | 13 | MusPy | 2020 |
| B. Note-level Pairwise | 5 | Ou et al., NeurIPS 2025 | 2025 |
| C. Bar-level Pairwise | 2 | MuseMorphose | 2023 |
| D. Chord-level Pairwise | 1 | GETMusic | 2023 |
| E. Distribution-level | 6 | FGG | 2025 |
| F. Advanced | 14 | MuseTok | 2026 |
| G. Structural | 4 | Yang & Lerch | 2018 |

Paper timeline:

2006  Harte et al. (ACM MM) .............. Tonal Distance
2012  Papadopoulos & Peeters (ISMIR) ..... Chord Histogram Entropy
2016  Mogren (NeurIPS WS) ................ Scale Consistency (C-RNN-GAN)
2018  Dong et al. (AAAI) ................. MuseGAN metrics (PISR/PR/EMR/DPC)
      Dong et al. (ISMIR LBD) ............ Pypianoroll (EBR)
      Yang & Lerch (NCA) ................. N-gram Diversity
2020  MusPy (ISMIR) ...................... PCE/GS/PitchRange/PE/...
      Jazz Transformer (ISMIR) ........... Groove Consistency
      SongMASS + PopMAG (ACM-MM) ......... PD/DD/CA
2023  MuseMorphose (TASLP) ............... simChr/simGrv
      GETMusic (IJCAI) ................... Chord Accuracy (Viterbi HMM)
2024  SCG (ICML Oral) .................... KL/OA/CI/CTS
2025  FGG (ICML) ......................... OOK
      Text2midi (AAAI) ................... CR
      Ou et al. (NeurIPS) ................ Note F1/Mel F1/I-IoU/VER
2026  MuseTok (ICASSP) ................... ReconAcc

Quick Start (30 seconds)

pip install -e .

from smg_metrics import single_file, pair_eval

# Single-file quality (13 metrics, no reference needed)
q = single_file("generated.mid")
print(q.pce, q.ebr, q.gs)

# Pairwise comparison (8 metrics: note-, bar-, and chord-level)
s = pair_eval("generated.mid", "reference.mid")
print(s.note_f1, s.sim_chr, s.ca)

# CLI
smg-eval --music generated.mid
smg-eval --pred gen.mid --ref ref.mid --dist --advanced --structural

Table of Contents

  1. Installation
  2. Python API
  3. CLI Usage
  4. Metrics Reference
  5. Package Structure
  6. Testing
  7. Citation

1. Installation

pip install -e .

# Or install dependencies manually:
pip install muspy miditoolkit pretty-midi numpy scipy

| Package | Version | Purpose |
|---|---|---|
| muspy | >= 0.5.0 | 13 single-file quality metrics |
| miditoolkit | >= 1.0 | MIDI parsing |
| pretty-midi | >= 0.2.10 | MIDI parsing (similarity module) |
| numpy | >= 1.24 | Numerical computation |
| scipy | >= 1.10 | Scientific computing |

2. Python API

High-level API

from smg_metrics import (
    single_file,                # 13 MusPy quality metrics
    single_file_structural,     # 2 structural metrics (CHE, Ngram)
    pair_eval,                  # 8 pairwise metrics
    pair_eval_structural,       # 2 structural pairwise (MelodyMatch, TonalDist)
    distribution_eval,          # 6 distribution-level metrics
    advanced_eval,              # 14 advanced metrics
)

# Single-file
quality = single_file("output.mid")           # 13 metrics
struct  = single_file_structural("output.mid") # 2 metrics

# Pairwise (pred vs ref)
pair    = pair_eval("gen.mid", "ref.mid")            # 8 metrics
pstruct = pair_eval_structural("gen.mid", "ref.mid")  # 2 metrics
dist    = distribution_eval("gen.mid", "ref.mid")     # 6 metrics
adv     = advanced_eval("gen.mid", "ref.mid")         # 14 metrics

Individual metrics

from smg_metrics import (
    chord_histogram_entropy, ngram_diversity,
    melody_matchness, tonal_distance,
    compute_ca, midi_to_chords,
)

che  = chord_histogram_entropy("file.mid")          # Chord Histogram Entropy
div  = ngram_diversity("file.mid", n=4)             # N-gram diversity
mm   = melody_matchness("pred.mid", "ref.mid")      # Melody similarity
td   = tonal_distance("pred.mid", "ref.mid")        # Tonal distance
ca   = compute_ca("pred.mid", "ref.mid")            # Chord Accuracy
chords = midi_to_chords("file.mid")                 # Chord labels per bar

Result containers

Every high-level function returns a frozen dataclass with a .to_dict() method:

quality = single_file("file.mid")
print(quality.pce)          # 3.16
print(quality.to_dict())    # {'pce': 3.16, 'ebr': 0.03, ...}

| Container | Fields | Count |
|---|---|---|
| SingleFileResult | pce, ebr, gs, sc, pisr, polyphony, polyphony_rate, pitch_range, n_pitches_used, n_pitch_classes_used, emr, pe, dpc | 13 |
| StructuralSingleResult | che, ngram_div | 2 |
| PairResult | note_f1, notei_f1, mel_f1, i_iou, ver, sim_chr, sim_grv, ca | 8 |
| StructuralPairResult | melody_match, tonal_dist | 2 |
| DistributionResult | pd, dd, ook, sc_sim, pce_sim, gs_sim | 6 |
| AdvancedResult | kl_duration, kl_ioi, kl_pitch, oa_duration, oa_ioi, oa_pitch_range, oa_density, ci_precision, ci_recall, ci_f1, cts, cr_pred, cr_ref, recon_acc | 14 |
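
The container behaviour described above (immutable fields plus a .to_dict() helper) can be sketched with the standard library alone. DemoResult is a hypothetical stand-in with two fields, not the package's actual class; only the frozen-dataclass pattern is taken from the text.

```python
from dataclasses import FrozenInstanceError, asdict, dataclass


@dataclass(frozen=True)
class DemoResult:
    """Hypothetical stand-in for a result container (not part of smg_metrics)."""

    pce: float
    ebr: float

    def to_dict(self) -> dict[str, float]:
        # asdict() copies the fields into a plain, JSON-friendly dict.
        return asdict(self)


def is_immutable(result: DemoResult) -> bool:
    """Return True if assigning to a field raises FrozenInstanceError."""
    try:
        result.pce = 0.0  # type: ignore[misc]
    except FrozenInstanceError:
        return True
    return False


demo = DemoResult(pce=3.16, ebr=0.03)
print(demo.to_dict())      # {'pce': 3.16, 'ebr': 0.03}
print(is_immutable(demo))  # True
```

Freezing the dataclass means a result cannot be altered by accident after evaluation, and the plain dict is convenient for JSON or CSV export.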

3. CLI Usage

# Single-file (13 metrics)
smg-eval --music generated.mid

# Single-file + structural (15 metrics)
smg-eval --music generated.mid --structural

# Pairwise (8 metrics)
smg-eval --pred gen.mid --ref ref.mid

# All metrics (45 metrics)
smg-eval --music gen.mid --pred gen.mid --ref ref.mid --dist --advanced --structural

# JSON output
smg-eval --pred gen.mid --ref ref.mid --json

# Batch directory
smg-eval --pred_dir ./pred/ --ref_dir ./ref/

| Flag | Description | Default |
|---|---|---|
| --music PATH | Single-file mode | -- |
| --pred PATH | Predicted MIDI (pair mode) | -- |
| --ref PATH | Reference MIDI (pair mode) | -- |
| --pred_dir DIR | Batch predicted directory | -- |
| --ref_dir DIR | Batch reference directory | -- |
| --root INT | Root pitch for PISR (0=C) | 0 |
| --mode {major,minor} | Scale mode for PISR | major |
| --dist | Include distribution-level metrics | false |
| --advanced | Include advanced metrics | false |
| --structural | Include structural metrics | false |
| --json | Output as JSON | false |
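
The --root and --mode flags select the scale against which PISR is measured. A minimal sketch of that idea follows, assuming PISR is the fraction of notes whose pitch class lies in the chosen scale and that "minor" means natural minor. The function pitch_in_scale_rate and its scale tables are illustrative, not the package's code.

```python
# Scale degrees as semitone offsets from the root.
MAJOR = (0, 2, 4, 5, 7, 9, 11)
MINOR = (0, 2, 3, 5, 7, 8, 10)  # natural minor (assumption)


def pitch_in_scale_rate(pitches: list[int], root: int = 0, mode: str = "major") -> float:
    """Fraction of MIDI pitches whose pitch class belongs to the scale."""
    steps = {"major": MAJOR, "minor": MINOR}[mode]
    scale = {(root + step) % 12 for step in steps}
    if not pitches:
        return float("nan")  # undefined for an empty note list (assumption)
    return sum(p % 12 in scale for p in pitches) / len(pitches)


c_major_run = [60, 62, 64, 65, 67, 69, 71, 72]
print(pitch_in_scale_rate(c_major_run, root=0, mode="major"))  # 1.0
print(pitch_in_scale_rate(c_major_run, root=0, mode="minor"))  # 0.625
print(pitch_in_scale_rate(c_major_run, root=9, mode="minor"))  # 1.0 (A minor)
```

The same notes score differently under different root and mode settings, so the two flags should match the key the piece is expected to be in.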

4. Metrics Reference

A. Single-file Quality (13 metrics)

No reference file required. Source: MusPy (ISMIR 2020).

| Metric | Symbol | Range | Paper |
|---|---|---|---|
| Pitch Class Entropy | PCE | [0, log2(12)] | Jazz Transformer, ISMIR 2020 |
| Empty Beat Rate | EBR | [0, 1] | Pypianoroll, ISMIR 2018 |
| Groove Consistency | GS | [0, 1] | Jazz Transformer, ISMIR 2020 |
| Scale Consistency | SC | [0, 1] | C-RNN-GAN, NeurIPS 2016 WS |
| Pitch-in-Scale Rate | PISR | [0, 1] | MuseGAN, AAAI 2018 |
| Polyphony | Poly | [1, inf) | MuseGAN, AAAI 2018 |
| Polyphony Rate | PR | [0, 1] | MuseGAN, AAAI 2018 |
| Pitch Range | Range | [0, 127] | MusPy, ISMIR 2020 |
| Unique Pitches | N_p | [0, 128] | MusPy, ISMIR 2020 |
| Unique Pitch Classes | N_pc | [0, 12] | MusPy, ISMIR 2020 |
| Empty Measure Rate | EMR | [0, 1] | MuseGAN, AAAI 2018 |
| Pitch Entropy | PE | [0, 7] | MusPy, ISMIR 2020 |
| Drum Pattern Consistency | DPC | [0, 1] | MuseGAN, AAAI 2018 |
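
To make the PCE range [0, log2(12)] concrete, here is a small sketch of pitch class entropy as the base-2 Shannon entropy of the 12-bin pitch-class histogram. It works on a plain list of MIDI pitch numbers and is an illustration of the definition, not the MusPy or smg_metrics implementation.

```python
import math
from collections import Counter


def pitch_class_entropy(pitches: list[int]) -> float:
    """Shannon entropy (base 2) of the 12-bin pitch-class histogram."""
    if not pitches:
        return float("nan")  # undefined for an empty note list (assumption)
    counts = Counter(p % 12 for p in pitches)
    total = len(pitches)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


one_class = [60, 72, 84]         # only the pitch class C: lower bound
chromatic = list(range(60, 72))  # all 12 pitch classes once: upper bound

print(abs(pitch_class_entropy(one_class)))
print(pitch_class_entropy(chromatic), math.log2(12))
```

A single repeated pitch class gives the minimum of 0, and a uniform spread over all 12 classes gives the maximum of log2(12), roughly 3.585.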

B. Pairwise Note-level (5 metrics)

Source: Ou et al., NeurIPS 2025, Appendix C.

| Metric | Symbol | Range | Description |
|---|---|---|---|
| Note F1 | F1 | [0, 1] | Note-level F1 (onset + pitch, 16th-note quantised) |
| Notei F1 | F1i | [0, 1] | Note F1 + correct instrument |
| Melody F1 | F1mel | [0, 1] | Note F1 on melody track only |
| Instrument IoU | I-IoU | [0, 1] | Instrument set intersection-over-union |
| Voice Error Rate | VER | [0, inf) | Normalised edit distance of voice ordering |
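
The Note F1 row can be illustrated with a short sketch. It assumes notes are given as (onset_in_beats, pitch) pairs, that a beat is a quarter note (so a 16th-note grid has 4 steps per beat), and that duplicate notes are collapsed by using sets. These choices are assumptions for illustration and may differ from the package and from Ou et al.

```python
Note = tuple[float, int]  # (onset in beats, MIDI pitch)


def quantise(notes: list[Note], steps_per_beat: int = 4) -> set[tuple[int, int]]:
    """Snap onsets to the nearest 16th-note step and drop duplicates."""
    return {(round(onset * steps_per_beat), pitch) for onset, pitch in notes}


def note_f1(pred: list[Note], ref: list[Note]) -> float:
    """F1 over the sets of quantised (onset, pitch) pairs."""
    p, r = quantise(pred), quantise(ref)
    true_pos = len(p & r)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(p)
    recall = true_pos / len(r)
    return 2 * precision * recall / (precision + recall)


ref_notes = [(0.0, 60), (1.0, 64), (2.0, 67), (3.0, 72)]
pred_notes = [(0.02, 60), (1.0, 64), (2.0, 65)]  # one wrong pitch, one note missing

print(note_f1(pred_notes, ref_notes))  # approximately 0.571 (precision 2/3, recall 1/2)
print(note_f1(ref_notes, ref_notes))   # 1.0
```

Quantising before matching makes the score tolerant of small timing jitter, such as the 0.02-beat offset of the first predicted note.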

C. Pairwise Bar-level (2 metrics)

Source: MuseMorphose (Wu & Yang, IEEE/ACM TASLP 2023).

| Metric | Symbol | Range | Description |
|---|---|---|---|
| Chroma Similarity | simChr | [0, 1] | Bar-level pitch-class cosine similarity |
| Groove Similarity | simGrv | [0, 1] | Bar-level onset-pattern cosine similarity |
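
A rough sketch of the bar-level chroma similarity: each bar is reduced to a 12-bin pitch-class count vector, corresponding bars are compared by cosine similarity, and the per-bar scores are averaged. Representing a bar as a list of MIDI pitches, ignoring note durations, and truncating to the shorter piece are simplifying assumptions, not necessarily what the package does.

```python
import math


def chroma(bar: list[int]) -> list[float]:
    """12-bin pitch-class count vector for one bar of MIDI pitches."""
    vec = [0.0] * 12
    for pitch in bar:
        vec[pitch % 12] += 1.0
    return vec


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty bar matches nothing (assumption)
    return dot / (norm_a * norm_b)


def chroma_similarity(pred_bars: list[list[int]], ref_bars: list[list[int]]) -> float:
    """Mean cosine similarity over corresponding bars."""
    n = min(len(pred_bars), len(ref_bars))
    if n == 0:
        return float("nan")
    # zip() stops at the shorter sequence, matching n above.
    return sum(cosine(chroma(p), chroma(r)) for p, r in zip(pred_bars, ref_bars)) / n


ref_bars = [[60, 64, 67], [62, 65, 69]]
pred_bars = [[72, 76, 79], [62, 65, 70]]  # bar 1 transposed by an octave, bar 2 one note off

print(chroma_similarity(pred_bars, ref_bars))  # approximately 0.833
```

Because pitches are folded into pitch classes, the octave-shifted first bar still scores 1, while the second bar shares two of three pitch classes and scores 2/3.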

D. Pairwise Chord-level (1 metric)

Source: GETMusic (Lv et al., IJCAI 2023), Eq. 6.

| Metric | Symbol | Range | Description |
|---|---|---|---|
| Chord Accuracy | CA | [0, 1] | Per-measure chord label match rate (Viterbi HMM) |
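
Once chord labels have been inferred for every measure, the match rate itself is a simple average. The sketch below assumes the labels are already available as strings (the Viterbi HMM recognition step is not shown) and that only the measures present in both sequences are compared. Both points are assumptions for illustration.

```python
def chord_accuracy(pred_chords: list[str], ref_chords: list[str]) -> float:
    """Fraction of measures whose predicted chord label equals the reference."""
    n = min(len(pred_chords), len(ref_chords))
    if n == 0:
        return float("nan")  # no measures to compare (assumption)
    # zip() stops at the shorter sequence, matching n above.
    matches = sum(p == r for p, r in zip(pred_chords, ref_chords))
    return matches / n


ref_chords = ["C", "Am", "F", "G"]
pred_chords = ["C", "Am", "Dm", "G"]

print(chord_accuracy(pred_chords, ref_chords))  # 0.75
```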

E. Distribution-level (6 metrics)

Sources: SongMASS + PopMAG (Ren et al., ACM-MM 2020), FGG (ICML 2025).

| Metric | Range | Description |
|---|---|---|
| PD | [0, 1] | Pitch Distribution overlap |
| DD | [0, 1] | Duration Distribution overlap |
| OOK | [0, 1] | Out-of-Key Rate (auto-detected key) |
| SC_sim | [0, 1] | Scale Consistency similarity |
| PCE_sim | [0, 1] | Pitch Class Entropy similarity |
| GS_sim | [0, 1] | Groove Consistency similarity |
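
The table does not spell out how "overlap" is computed. One common reading is histogram intersection: normalise both histograms and sum the bin-wise minimum, which yields a value in [0, 1]. The sketch below follows that reading for PD and DD. It is an assumption, and the package may use a different estimator.

```python
from collections import Counter


def normalised_hist(values: list[int]) -> dict[int, float]:
    """Relative frequency of each distinct value."""
    counts = Counter(values)
    total = sum(counts.values())
    return {key: count / total for key, count in counts.items()}


def histogram_overlap(a: list[int], b: list[int]) -> float:
    """Histogram intersection: sum of bin-wise minima of the two distributions."""
    if not a or not b:
        return 0.0
    pa, pb = normalised_hist(a), normalised_hist(b)
    return sum(min(pa[key], pb[key]) for key in pa.keys() & pb.keys())


pred_pitches = [60, 60, 62, 64]
ref_pitches = [60, 62, 62, 65]

print(histogram_overlap(pred_pitches, ref_pitches))   # 0.5
print(histogram_overlap(pred_pitches, pred_pitches))  # 1.0
```

The same function works for durations if they are first expressed as integers, for example in 16th-note steps.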

F. Advanced Metrics (14 metrics)

Sources: Rule Guided Diffusion (ICML 2024), Text2midi (AAAI 2025), MuseTok (ICASSP 2026).

| Metric | Range | Description | Source |
|---|---|---|---|
| KL Duration | [0, inf) | KL divergence of duration distributions | rule-guided-music |
| KL IOI | [0, inf) | KL divergence of IOI distributions | rule-guided-music |
| KL Pitch | [0, inf) | KL divergence of pitch distributions | rule-guided-music |
| OA Duration | [0, 1] | Overlapping area of mean duration | rule-guided-music |
| OA IOI | [0, 1] | Overlapping area of mean IOI | rule-guided-music |
| OA Pitch Range | [0, 1] | Overlapping area of pitch range | rule-guided-music |
| OA Density | [0, 1] | Overlapping area of note density | rule-guided-music |
| CI Precision | [0, 1] | Instrument coverage precision | rule-guided-music |
| CI Recall | [0, 1] | Instrument coverage recall | rule-guided-music |
| CI F1 | [0, 1] | Instrument coverage F1 | rule-guided-music |
| CTS | {0, 1, NaN} | Correct Time Signature | rule-guided-music |
| CR Pred | [0, inf) | Compression ratio (predicted) | Text2midi |
| CR Ref | [0, inf) | Compression ratio (reference) | Text2midi |
| ReconAcc | [0, 1] | Reconstruction accuracy (edit distance) | MuseTok |
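
The three KL rows compare histograms of the predicted and reference files. A minimal sketch of a discrete KL divergence over aligned bins follows. The additive smoothing constant eps, the natural-log base, and the direction KL(pred || ref) are assumptions made here for illustration and are not stated by the package.

```python
import math


def kl_divergence(p_counts: list[float], q_counts: list[float], eps: float = 1e-9) -> float:
    """KL(P || Q) in nats over aligned histogram bins, with additive smoothing."""
    if len(p_counts) != len(q_counts):
        raise ValueError("histograms must have the same number of bins")
    # Smoothing keeps empty bins from producing log(0) or division by zero.
    p = [count + eps for count in p_counts]
    q = [count + eps for count in q_counts]
    p_sum, q_sum = sum(p), sum(q)
    return sum(
        (pi / p_sum) * math.log((pi / p_sum) / (qi / q_sum))
        for pi, qi in zip(p, q)
    )


pred_hist = [1, 1]  # e.g. counts of short vs long notes
ref_hist = [3, 1]

print(kl_divergence(pred_hist, ref_hist))   # about 0.144
print(kl_divergence(ref_hist, pred_hist))   # about 0.131, KL is not symmetric
print(kl_divergence(pred_hist, pred_hist))  # 0.0
```

Since KL divergence is unbounded and asymmetric, values are only comparable across experiments that use the same binning and the same argument order.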

G. Structural Metrics (4 metrics)

| Metric | Type | Range | Paper |
|---|---|---|---|
| Chord Histogram Entropy | Single | [0, log2(C)] | Papadopoulos & Peeters, ISMIR 2012 |
| N-gram Diversity | Single | [0, 1] | Yang & Lerch, NCA 2018 |
| Melody Matchness | Pair | [0, 1] | Mongeau & Sankoff, CH 1990 |
| Tonal Distance | Pair | [0, inf) | Harte et al., ACM MM 2006 |
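
N-gram diversity is often defined as the number of distinct n-grams divided by the total number of n-grams, which matches the [0, 1] range in the table. The sketch below applies that definition to a sequence of MIDI pitches. Using raw pitches as tokens and returning 0 for sequences shorter than n are assumptions, and the package's tokenisation may differ.

```python
def ngram_diversity_ratio(tokens: list[int], n: int = 4) -> float:
    """Distinct n-grams divided by total n-grams in the sequence."""
    total = len(tokens) - n + 1
    if total <= 0:
        return 0.0  # sequence too short to contain any n-gram (assumption)
    distinct = {tuple(tokens[i:i + n]) for i in range(total)}
    return len(distinct) / total


repetitive = [60, 62] * 4        # C-D-C-D-C-D-C-D
ascending = list(range(60, 68))  # eight different pitches

print(ngram_diversity_ratio(repetitive, n=2))  # 2 distinct bigrams out of 7
print(ngram_diversity_ratio(ascending, n=2))   # 1.0
```

The helper is named ngram_diversity_ratio to avoid confusion with the package's own ngram_diversity function, which takes a MIDI file path.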

5. Package Structure

smg_metric/
|-- pyproject.toml          # Package metadata & dependencies
|-- README.md               # This file
|-- test.py                 # Full 45-metric test script (165 tests)
|-- data/                   # Test MIDI files (classical piano)
|-- smg_metrics/            # Main package (v0.3.0)
|   |-- __init__.py         # Public API exports (45 metrics)
|   |-- __main__.py         # python -m smg_metrics
|   |-- py.typed            # PEP 561 marker
|   |-- _io.py              # Shared MIDI I/O (Note3/Note4, extract, quantise)
|   |-- _stats.py           # Shared statistics (overlap, KL, normal overlap)
|   |-- _edit.py            # Shared sequence editing (Levenshtein, melody extract)
|   |-- single.py           # single_file() + single_file_structural()
|   |-- pair.py             # pair_eval() + pair_eval_structural()
|   |-- muspy_ext.py        # 13 MusPy metrics
|   |-- note_f1.py          # 5 note-level pairwise metrics
|   |-- similarity.py       # 2 bar-level similarity metrics
|   |-- chord_accuracy.py   # Chord Accuracy (Viterbi HMM)
|   |-- distribution.py     # 6 distribution-level metrics
|   |-- advanced.py         # 14 advanced metrics
|   |-- structural.py       # 4 structural metrics
|   +-- cli.py              # CLI entry point

6. Testing

# Test all MIDI files in data/ directory
python test.py

# Test specific files
python test.py a.mid b.mid c.mid

# Quick single-file test
python test.py --single-only file.mid

# Quick pairwise test
python test.py --pair-only pred.mid ref.mid

test.py validates:

  1. Single-file quality (13 metrics x N files)
  2. Single-file structural (2 metrics x N files)
  3. Pairwise note/structural/distribution/advanced (30 metrics x N pairs)
  4. Self-consistency (same file -> perfect scores)

7. Citation

If you use this toolkit, please cite the relevant papers:

@inproceedings{dong2020muspy,
  title     = {MusPy: A Toolkit for Symbolic Music Generation},
  author    = {Dong, Hao-Wen and Chen, Ke and McAuley, Julian and Berg-Kirkpatrick, Taylor},
  booktitle = {Proc. ISMIR},
  year      = {2020},
  url       = {https://arxiv.org/abs/2008.01951}
}

@inproceedings{ou2025arrangement,
  title     = {Unifying Symbolic Music Arrangement with Track-aware Segments},
  author    = {Ou, Longshen and Zhao, Jingwei and Wang, Ziyu and Xia, Gus},
  booktitle = {Proc. NeurIPS},
  year      = {2025},
  url       = {https://arxiv.org/abs/2408.15176}
}

@inproceedings{lv2023getmusic,
  title     = {GETMusic: Generating Any Music Tracks with a Unified Representation and Diffusion Framework},
  author    = {Lv, Ang and others},
  booktitle = {Proc. IJCAI},
  year      = {2023},
  url       = {https://arxiv.org/abs/2305.10841}
}

@article{wu2023morphose,
  title   = {MuseMorphose: Full-Song and Fine-Grained Piano Music Style Transfer with One Transformer VAE},
  author  = {Wu, Shih-Lun and Yang, Yi-Hsuan},
  journal = {IEEE/ACM Trans. ASLP},
  year    = {2023},
  url     = {https://arxiv.org/abs/2105.04090}
}

@inproceedings{ren2020popmag,
  title     = {PopMAG: Pop Music Accompaniment Generation},
  author    = {Ren, Yi and others},
  booktitle = {Proc. ACM Multimedia},
  year      = {2020},
  url       = {https://arxiv.org/abs/2008.07703}
}

@inproceedings{zhu2025fgg,
  title     = {Efficient Fine-Grained Guidance for Diffusion Model Based Symbolic Music Generation},
  author    = {Zhu, Tingyu and Liu, Haoyu and Wang, Ziyu and Jiang, Zhimin and Zheng, Zeyu},
  booktitle = {Proc. ICML},
  year      = {2025},
  url       = {https://arxiv.org/abs/2410.08435}
}

@inproceedings{hu2024ruleguided,
  title     = {Symbolic Music Generation with Non-Differentiable Rule Guided Diffusion},
  author    = {Huang, Yujia and others},
  booktitle = {Proc. ICML},
  year      = {2024},
  url       = {https://arxiv.org/abs/2402.14285}
}

@inproceedings{yadav2025text2midi,
  title     = {Text2midi: Generating Symbolic Music from Captions},
  author    = {Bhandari, Keshav and others},
  booktitle = {Proc. AAAI},
  year      = {2025},
  url       = {https://arxiv.org/abs/2412.16526}
}

@inproceedings{zeng2026musetok,
  title     = {MuseTok: Symbolic Music Tokenization for Generation and Semantic Understanding},
  author    = {Huang, Jingyue and others},
  booktitle = {Proc. ICASSP},
  year      = {2026},
  url       = {https://arxiv.org/abs/2510.16273}
}

@inproceedings{papadopoulos2012chord,
  title     = {Large-scale Study of Chord Estimation Algorithms Based on Chroma},
  author    = {Papadopoulos, Helene and Peeters, Geoffroy},
  booktitle = {Proc. ISMIR},
  year      = {2012}
}

@article{yang2018evaluation,
  title   = {On the Evaluation of Generative Models in Music},
  author  = {Yang, Li-Chia and Lerch, Alexander},
  journal = {Neural Computing and Applications},
  year    = {2018},
  url     = {https://link.springer.com/article/10.1007/s00521-018-3759-5}
}

@article{mongeau1990comparison,
  title   = {Comparison of Musical Sequences},
  author  = {Mongeau, Marcel and Sankoff, David},
  journal = {Computers and the Humanities},
  year    = {1990}
}

@inproceedings{harte2006detecting,
  title     = {Detecting Harmonic Change in Musical Audio},
  author    = {Harte, Christopher and Sandler, Mark and Gasser, Martin},
  booktitle = {Proc. ACM MM Workshop},
  year      = {2006}
}

License

MIT
