numba_midi

A Numba-accelerated Python library for fast MIDI file reading and music score processing.

This library is implemented entirely in Python, making it portable and easy to modify and extend for Python developers. Efficiency is achieved by using NumPy structured arrays to store data instead of creating per-event or per-note Python class instances. The library leverages NumPy vectorized operations where possible and uses Numba for non-vectorizable operations.
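As a rough illustration of the structured-array idea described above, here is a minimal NumPy sketch. The dtype and its field names (`start`, `duration`, `pitch`, `velocity`) are assumptions made for this example and are not taken from the library's actual layout.

```python
import numpy as np

# Hypothetical note layout: one record per note instead of one Python object per note.
note_dtype = np.dtype([
    ("start", np.float64),     # start time in seconds
    ("duration", np.float64),  # duration in seconds
    ("pitch", np.uint8),       # MIDI pitch, 0..127
    ("velocity", np.uint8),    # MIDI velocity, 0..127
])

notes = np.array(
    [(0.0, 0.5, 60, 100), (0.5, 0.5, 64, 90), (1.0, 1.0, 67, 80)],
    dtype=note_dtype,
)

# Vectorized operations touch every note at once, with no Python-level loop.
notes["pitch"] += 12                             # transpose up one octave
end_times = notes["start"] + notes["duration"]   # end time of every note
```

Because each field is a contiguous view into the same buffer, operations like the transposition above run at NumPy speed regardless of the number of notes.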

Main features

  • Read and write MIDI files.
  • Pure Python, making its internals more accessible to Python developers.
  • 7x faster on average than pretty_midi for reading a MIDI file from disk.
  • Events (note on and off) and notes (start and duration) representations.
  • Tracks representation based on NumPy arrays, making it trivial to do vectorized operations on all notes in a track.
  • Multiple modes for handling overlapping notes when converting from the events representation to the notes representation.
  • Conversion to and from piano roll representation.
  • Conversion functions from/to pretty_midi and symusic.
  • Timestamps and durations both in seconds and ticks.
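To make the relation between ticks and seconds concrete, the sketch below applies the standard MIDI formula for a file with a single constant tempo. The function names are hypothetical and are not part of the numba_midi API; real files can contain tempo changes, which this sketch deliberately ignores.

```python
def ticks_to_seconds(ticks, ticks_per_quarter, tempo_us_per_quarter=500_000):
    """Convert ticks to seconds, assuming one constant tempo.

    tempo_us_per_quarter is the MIDI tempo in microseconds per quarter note
    (500_000 corresponds to 120 beats per minute).
    """
    return ticks * tempo_us_per_quarter / (ticks_per_quarter * 1_000_000)


def seconds_to_ticks(seconds, ticks_per_quarter, tempo_us_per_quarter=500_000):
    """Inverse conversion, rounded to the nearest tick (same constant-tempo assumption)."""
    return round(seconds * ticks_per_quarter * 1_000_000 / tempo_us_per_quarter)
```

For example, with 480 ticks per quarter note at 120 BPM, 960 ticks span two quarter notes, that is, one second.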

Installation

To install the library, use the following command:

pip install numba_midi

Music Score Interfaces

  • Score: Represents a music score with notes as atomic items, including start times and durations. This approach is more convenient for offline processing compared to handling note-on and note-off events.
  • MidiScore: Mirrors raw MIDI data by using MIDI events, including note-on and note-off events. This class serves as an intermediate representation for reading and writing MIDI files.

Piano Roll

The library includes a PianoRoll dataclass with conversion functions to seamlessly transform between piano rolls and MIDI scores.
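The sketch below shows one way notes could be rasterized into a piano roll. It is not the library's PianoRoll dataclass or its conversion function: the function name, the (pitch, time frame) array layout, and the choice to store velocities in the cells are all assumptions made for illustration.

```python
import numpy as np


def notes_to_piano_roll(starts, durations, pitches, velocities, time_step, num_pitches=128):
    """Rasterize notes into a (num_pitches, num_frames) uint8 array of velocities."""
    ends = starts + durations
    num_frames = int(np.ceil(ends.max() / time_step)) if len(starts) else 0
    roll = np.zeros((num_pitches, num_frames), dtype=np.uint8)
    first = np.floor(starts / time_step).astype(int)  # first frame covered by each note
    last = np.ceil(ends / time_step).astype(int)      # one past the last covered frame
    for pitch, velocity, a, b in zip(pitches, velocities, first, last):
        roll[pitch, a:b] = velocity  # later notes overwrite earlier ones on overlap
    return roll


roll = notes_to_piano_roll(
    starts=np.array([0.0, 0.5]),
    durations=np.array([0.5, 0.25]),
    pitches=np.array([60, 64]),
    velocities=np.array([100, 90]),
    time_step=0.25,
)
```

Note that in this sketch a zero-length note covers no frame and therefore disappears, which is one reason the conversion back from a piano roll to notes cannot be lossless in general.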

Interoperability

Functions to convert scores to and from the symusic and pretty_midi libraries are provided in symusic.py and pretty_midi.py, respectively.

Overlapping Notes Behavior

MIDI files can contain tracks with notes that overlap in channel, pitch, and time. How to convert these to notes with start times and durations depends on the chosen convention. Ideally, we want to choose the one that matches how the synthesizer will interpret the MIDI events.

For example, for a given channel and pitch, we can have:

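| tick | channel | type | pitch | velocity |
|------|---------|------|-------|----------|
| 100  | 1       | On   | 80    | 60       |
| 110  | 1       | On   | 80    | 60       |
| 120  | 1       | On   | 80    | 60       |
| 120  | 1       | Off  | 80    | 0        |
| 130  | 1       | Off  | 80    | 0        |
| 140  | 1       | Off  | 80    | 0        |
| 150  | 1       | On   | 80    | 60       |
| 150  | 1       | Off  | 80    | 0        |
| 160  | 1       | Off  | 80    | 0        |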

Should the Off event on tick 120 stop all three notes, the first two notes, or just the first one?
Should the first note stop at tick 110 when we have a new note to avoid any overlap?
Should we create a note with duration 0 or 10 starting on tick 150, or no note at all?
If a note is not closed when we reach the end of the song, should it be discarded, or should we keep it and use the end of the song as the end time?

The notes_mode parameter, of type NotesMode = Literal["no_overlap", "first_in_first_out", "note_off_stops_all"], lets the user control how overlapping notes and zero-length notes are handled.

We obtain the same behavior as pretty_midi when using notes_mode="note_off_stops_all" and the same behavior as symusic when using notes_mode="first_in_first_out".

Note: Using "no_overlap" is weaker than enforcing a monophonic constraint on the instrument: two notes with different pitches can still overlap in time. A piano, although polyphonic, should use "no_overlap" to be realistic, since restriking a key replaces the note already sounding at that pitch.
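To make the three conventions concrete, here is a simplified pure-Python sketch applied to the example events above for a single channel and pitch. It is one plausible reading of the mode names, not the library's implementation: the function `events_to_notes` is hypothetical, events at the same tick are processed in listed order, zero-length notes are kept, unmatched Off events are ignored, and notes still open at the end are discarded. The library may make different choices on each of these points.

```python
from collections import deque


def events_to_notes(events, mode):
    """Pair On/Off events of one (channel, pitch) into (start_tick, end_tick) notes.

    events: iterable of (tick, is_on) tuples, already sorted.
    """
    if mode not in ("no_overlap", "first_in_first_out", "note_off_stops_all"):
        raise ValueError(f"unknown mode: {mode}")
    open_starts = deque()  # start ticks of notes that are currently sounding
    notes = []
    for tick, is_on in events:
        if is_on:
            if mode == "no_overlap" and open_starts:
                # A new note cuts off the one that is still sounding.
                notes.append((open_starts.popleft(), tick))
            open_starts.append(tick)
        elif open_starts:
            if mode == "note_off_stops_all":
                while open_starts:  # one Off closes every open note
                    notes.append((open_starts.popleft(), tick))
            else:
                # Close only the oldest open note.
                notes.append((open_starts.popleft(), tick))
    return notes


events = [(100, True), (110, True), (120, True), (120, False), (130, False),
          (140, False), (150, True), (150, False), (160, False)]
fifo = events_to_notes(events, "first_in_first_out")
stops_all = events_to_notes(events, "note_off_stops_all")
no_overlap = events_to_notes(events, "no_overlap")
```

Under these assumptions, "first_in_first_out" yields notes ending at ticks 120, 130 and 140, "note_off_stops_all" ends all three notes at tick 120, and "no_overlap" cuts the first two notes at ticks 110 and 120. All three keep a zero-length note at tick 150, which a caller could filter out afterwards.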

Benchmark

We measure loading speed on the first 1000 MIDI files (after sorting the paths) of the Lakh matched MIDI dataset. For each file, we time both loading from disk and, where the library supports it, loading from raw bytes already in memory, and keep the minimum duration over 10 runs. We then convert each minimum to a loading speed in MB/s and compute the median across files. Files that could not be loaded are ignored when computing the median. The benchmark was run with Python 3.11.10 on a laptop with an Intel i7-13800H processor clocked at 2.9 GHz.
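The measurement procedure described above can be sketched as follows. This is not the project's benchmark script: the function names are hypothetical, `loader` stands for any callable that loads a file from a path, and 1 MB is assumed to mean 10^6 bytes.

```python
import statistics
import time
from pathlib import Path


def best_throughput_mb_s(path, loader, repeats=10):
    """Loading speed in MB/s, using the minimum duration over `repeats` runs."""
    size_mb = Path(path).stat().st_size / 1e6  # assumption: 1 MB = 10**6 bytes
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        loader(path)
        best = min(best, time.perf_counter() - start)
    return size_mb / max(best, 1e-9)  # guard against a zero timer reading


def median_throughput(paths, loader, repeats=10):
    """Median MB/s over the files that load, plus the number of failures."""
    speeds = []
    failures = 0
    for path in paths:
        try:
            speeds.append(best_throughput_mb_s(path, loader, repeats))
        except Exception:
            failures += 1  # files that cannot be loaded are excluded from the median
    return statistics.median(speeds), failures
```

Taking the minimum over repeated runs reduces the influence of disk caching and background activity on each per-file measurement. Note that `statistics.median` raises an error if every file fails to load.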

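| Library     | Disk median (MB/s) | Disk average (MB/s) | Memory median (MB/s) | Memory average (MB/s) | Failures |
|-------------|--------------------|---------------------|----------------------|-----------------------|----------|
| numba_midi  | 3.8                | 4.1                 | 5.8                  | 5.6                   | 4        |
| symusic     | 85                 | 86                  | 164                  | 172                   | 4        |
| pretty_midi | 0.52               | 0.52                | n/a                  | n/a                   | 8        |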

When reading from disk, we are about 7x faster than pretty_midi and 22x slower than symusic.

Note: We could probably get a 2x speedup with a reasonable amount of effort by moving more code to Numba JIT-compiled functions.

Alternatives

Here are some alternative libraries and how they compare to numba_midi:

  • pretty_midi: Implemented using a Python object for each note, making it slow compared to numba_midi.
  • pypianoroll: Focused on piano roll functionalities. It relies on Python loops over notes, which can be slow. It also uses pretty_midi for MIDI file loading, which is not optimized for speed.
  • symusic: Written in C++ and interfaced with PyBind11, making it extremely fast. However, its C++ implementation makes it much harder to extend for Python developers compared to pure Python libraries like numba_midi.
  • muspy: Represents music scores using Python classes, with one Note class instance per note. This design prevents the use of efficient NumPy vectorized operations, relying instead on slower Python loops.
