musicxml-to-pcs

Extract pitch class sets and interval vectors from MusicXML files, automatically segmented by chord symbols.

Designed for computational musicology, jazz analysis, and the study of the relationship between melody and harmony through set theory.

Features

  • Parse MusicXML files with embedded chord symbols
  • Segment melodies by chord changes automatically
  • Compute for each segment:
    • Pitch class set
    • Interval vector
    • Forte class (set class name)
    • Prime form
  • Export to JSON or CSV
  • Command-line interface included

Installation

pip install musicxml-to-pcs

Quick Start

Python API

from musicxml_to_pcs import PCSExtractor

# Parse and extract
extractor = PCSExtractor('music.xml')
segments = extractor.extract()

# Iterate over segments
for seg in segments:
    print(f"{seg.chord_symbol}: {seg.forte_class} {seg.interval_vector_string}")

# Export
extractor.to_json('output.json')
extractor.to_csv('output.csv')

Command Line

# Print analysis to console
musicxml-to-pcs music.xml

# Export to JSON
musicxml-to-pcs music.xml --json output.json

# Export to CSV
musicxml-to-pcs music.xml --csv output.csv

# Limit output
musicxml-to-pcs music.xml --limit 20

# Relative to chord root (root = 0)
musicxml-to-pcs music.xml --relative-to chord_root

# Relative to key (tonic Bb, pitch class 10, is mapped to 0)
musicxml-to-pcs music.xml --relative-to key --key-root 10

Pitch Class Reference Modes

By default, pitch classes use absolute values where C=0. For harmonic analysis, relative modes are often more useful:

Absolute (default)

extractor.extract()  # C=0, C#=1, D=2, ... B=11

Relative to Chord Root

extractor.extract(relative_to='chord_root')

Each segment's pitch classes are calculated relative to its chord root. Over a Bb chord, Bb=0, C=2, D=4, etc. This lets you compare melodic choices across the same chord type in different keys.
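The transposition behind this mode is plain modulo-12 arithmetic. The sketch below only illustrates that arithmetic; `relative_pcs` is a hypothetical helper written for this example and is not part of the musicxml-to-pcs API.

```python
def relative_pcs(pitch_classes, root):
    """Re-express absolute pitch classes (C=0) relative to a root pitch class."""
    # Subtract the root and wrap around the octave so the root itself becomes 0.
    return sorted({(pc - root) % 12 for pc in pitch_classes})


# Bb (10), C (0) and D (2) heard over a Bb chord (root = 10)
print(relative_pcs([10, 0, 2], root=10))  # [0, 2, 4]
```

The same pitches over a different root give different numbers, which is what makes segments over the same chord type comparable across keys.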

Relative to Key

extractor.extract(relative_to='key', key_root=10)  # Bb=0

All pitch classes are calculated relative to the tonic given by key_root, which is useful for analyzing scale degrees throughout a piece.
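The key_root argument is a pitch class number, so a tonic name has to be converted first. A minimal sketch of that conversion follows; `tonic_to_pc` and `NATURAL_PCS` are hypothetical names introduced here, not something the package is known to provide.

```python
# Pitch classes of the natural notes, with C = 0.
NATURAL_PCS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def tonic_to_pc(name):
    """Convert a note name such as 'Bb', 'B-' or 'F#' to a pitch class."""
    letter, accidentals = name[0].upper(), name[1:]
    pc = NATURAL_PCS[letter]
    for symbol in accidentals:
        if symbol == "#":
            pc += 1
        elif symbol in ("b", "-"):  # '-' is the flat sign in music21-style names
            pc -= 1
        else:
            raise ValueError(f"Unrecognized accidental: {symbol!r}")
    return pc % 12


print(tonic_to_pc("Bb"))  # 10
```

The result could then be passed along as extractor.extract(relative_to='key', key_root=tonic_to_pc('Bb')).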

Output Format

Each HarmonicSegment contains:

Field           | Description                      | Example
measure         | Measure number                   | 2
beat            | Beat position within measure     | 0.0
chord_symbol    | Chord symbol from score          | "Bb"
chord_root      | Pitch class of chord root        | 10
chord_kind      | Chord quality                    | "maj"
pitch_classes   | List of pitch classes in melody  | [0, 1, 2, 3, 5, 10]
interval_vector | Interval class content           | [3, 4, 3, 2, 3, 0]
forte_class     | Forte set class name             | "6-8"
prime_form      | Prime form of the set            | [0, 2, 3, 4, 5, 7]
note_count      | Number of notes in segment       | 8
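Because every segment carries both chord_kind and forte_class, segments can be tallied per chord quality. The sketch below assumes only those two documented fields; `Seg` is a stand-in namedtuple so the example runs without the package, `forte_classes_by_chord_kind` is a hypothetical helper, and the demo values (including the "min" and "dom" kind labels) are made up.

```python
from collections import Counter, defaultdict, namedtuple

# Stand-in for HarmonicSegment, carrying only the two fields used here.
Seg = namedtuple("Seg", ["chord_kind", "forte_class"])


def forte_classes_by_chord_kind(segments):
    """Count how often each Forte class occurs under each chord quality."""
    counts = defaultdict(Counter)
    for seg in segments:
        counts[seg.chord_kind][seg.forte_class] += 1
    return counts


# Made-up demo data; real input would be the list returned by extract().
demo = [
    Seg("maj", "6-8"),
    Seg("min", "1-1"),
    Seg("dom", "2-2"),
    Seg("min", "2-1"),
    Seg("min", "1-1"),
]
tally = forte_classes_by_chord_kind(demo)
print(tally["min"].most_common(1))  # [('1-1', 2)]
```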

Convenience Properties

seg.interval_vector_string  # "(343230)"
seg.pitch_class_set_string  # "{0,1,2,3,5,10}"
seg.prime_form_string       # "<0,2,3,4,5,7>"

Example Analysis

Using Charlie Parker's "Anthropology" (available in the Charlie Parker Omnibook MusicXML dataset):

M  2 beat 0.0 | B-       | PC: {0,1,2,3,5,10}  | IV: (343230) | Forte: 6-8
M  3 beat 0.0 | Cm       | PC: {3}             | IV: (000000) | Forte: 1-1
M  3 beat 2.0 | F7       | PC: {3,5}           | IV: (010000) | Forte: 2-2
M  4 beat 0.0 | Dm       | PC: {2,3}           | IV: (100000) | Forte: 2-1
M  4 beat 2.0 | G7       | PC: {0,2,9,10}      | IV: (121110) | Forte: 4-11A

Summary Statistics

summary = extractor.summary()
print(summary['unique_forte_classes'])    # Number of unique set classes
print(summary['top_forte_classes'])       # Most common set classes
print(summary['top_interval_vectors'])    # Most common interval vectors

Requirements

  • Python 3.9+
  • music21 >= 9.1.0

Background

What is a Pitch Class Set?

A pitch class set abstracts pitches to integers 0-11 (C=0, C#=1, ... B=11), ignoring octave. The set {0, 4, 7} represents any C major triad in any voicing.
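As a small illustration of octave equivalence, the sketch below reduces pitches to pitch classes. It assumes pitches are given as MIDI note numbers (where every C is a multiple of 12), which is a convention chosen for this example; `pitch_class_set` is a hypothetical helper, not the package's API.

```python
def pitch_class_set(midi_pitches):
    """Collapse MIDI note numbers to a sorted list of unique pitch classes."""
    # Modulo 12 discards the octave; the set discards repeated notes.
    return sorted({p % 12 for p in midi_pitches})


# Two voicings of a C major triad: C4-E4-G4 and E3-G4-C5
print(pitch_class_set([60, 64, 67]))  # [0, 4, 7]
print(pitch_class_set([52, 67, 72]))  # [0, 4, 7]
```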

What is an Interval Vector?

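The interval vector counts how many times each interval class (1 through 6) occurs between pairs of pitch classes in a set. An interval and its inversion belong to the same interval class:

Position: [1,     2,     3,     4,     5,     6]
Meaning:  [m2/M7, M2/m7, m3/M6, M3/m6, P4/P5, TT]

For example, a minor triad {0, 3, 7} has interval vector (001110): one minor third, one major third, and one perfect fourth (the fifth between 0 and 7 counts as its inversion, a fourth).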
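The counting can be sketched in a few lines of plain Python. This follows the standard textbook definition and is not taken from the package's source; `interval_vector` is a name chosen for this example.

```python
from itertools import combinations


def interval_vector(pitch_classes):
    """Count interval classes 1-6 over all unordered pairs of pitch classes."""
    pcs = sorted({pc % 12 for pc in pitch_classes})
    vector = [0] * 6
    for a, b in combinations(pcs, 2):
        distance = (b - a) % 12
        # Fold intervals larger than a tritone onto their inversion.
        interval_class = min(distance, 12 - distance)
        vector[interval_class - 1] += 1
    return vector


print(interval_vector([0, 3, 7]))            # [0, 0, 1, 1, 1, 0]
print(interval_vector([0, 1, 2, 3, 5, 10]))  # [3, 4, 3, 2, 3, 0]
```

The second call reproduces the (343230) vector shown for measure 2 in the example analysis.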

What is a Forte Class?

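Allen Forte's classification system assigns a unique identifier to each set class: the number before the hyphen is the cardinality (number of pitch classes) and the number after it is an index within that cardinality. 3-11 is the class of major and minor triads, 4-27 is the class containing the dominant seventh and half-diminished seventh chords, etc.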
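Two sets belong to the same set class when they reduce to the same prime form. The sketch below is one common way to compute it: try every rotation of the set and of its inversion, transpose each to start on 0, and keep the most compact result, breaking ties by packing to the left. It is an illustration under that assumption, not the package's implementation (a few set classes have a different prime form under Rahn's variant of the algorithm), and `prime_form` is a name chosen here.

```python
def prime_form(pitch_classes):
    """Return a left-packed prime form for a collection of pitch classes."""
    pcs = sorted({pc % 12 for pc in pitch_classes})
    if not pcs:
        return ()
    inverted = sorted({(12 - pc) % 12 for pc in pcs})
    candidates = []
    for ordering in (pcs, inverted):
        for i in range(len(ordering)):
            rotation = ordering[i:] + ordering[:i]
            # Transpose so the rotation starts on 0.
            candidates.append(tuple((pc - rotation[0]) % 12 for pc in rotation))
    # Smallest overall span first, then smallest intervals from the left.
    return min(candidates, key=lambda form: (form[-1], form))


print(prime_form([0, 4, 7]))            # (0, 3, 7)
print(prime_form([0, 1, 2, 3, 5, 10]))  # (0, 2, 3, 4, 5, 7)
```

Major and minor triads both reduce to (0, 3, 7), which is why they share the class 3-11.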

Use Cases

  • Jazz Analysis: Study how improvisers navigate chord changes
  • Computational Musicology: Extract features for machine learning
  • Composition: Analyze intervallic content of melodic material
  • Music Theory Research: Corpus studies of pitch class usage

License

MIT

Citation

If you use this tool in research, please cite:

@software{musicxml_to_pcs,
  author = {Rubini, Mike},
  title = {musicxml-to-pcs: Pitch Class Set Extraction from MusicXML},
  url = {https://github.com/code91/musicxml-to-pcs},
  year = {2025}
}

Contributing

Contributions welcome! Please open an issue or submit a pull request.
