Skip to main content

Calculate, search, and analyze MIDI-to-MIDI similarity at scale

Project description

midisim

Calculate, search, and analyze MIDI-to-MIDI similarity at scale

midisim

Main features

  • Ultra-fast and flexible GPU/CPU MIDI-to-MIDI similarity calculation, search and analysis
  • Quality pre-trained models and comprehensive pre-computed embeddings sets
  • Stand-alone, versatile, and extensive codebase for general or custom MIDI-to-MIDI similarity tasks
  • Full cross-platform compatibility and support

Pre-trained models

  • midisim_small_pre_trained_model_2_epochs_43117_steps_0.3148_loss_0.9229_acc.pth - Very fast and accurate small model, suitable for all tasks. This model is included in PyPI package or it can be downloaded from Hugging Face
  • midisim_large_pre_trained_model_2_epochs_86275_steps_0.2054_loss_0.9385_acc.pth - Fast large model for more nuanced embeddings generation. Download checkpoint from Hugging Face

Both pre-trained models were trained on full Godzilla Piano dataset for 2 complete epochs


Pre-computed embeddings sets

For small pre-trained model

discover_midi_dataset_37292_genres_midis_embeddings_cc_by_nc_sa.npy - 37292 genre MIDIs embeddings for genre (artist and song) identification tasks

discover_midi_dataset_202400_identified_midis_embeddings_cc_by_nc_sa.npy - 202400 identified MIDIs embeddings for MIDI identification tasks

discover_midi_dataset_3480123_clean_midis_embeddings_cc_by_nc_sa.npy - 3480123 select clean MIDIs embeddings for large scale similarity search and analysis tasks

For large pre-trained model

discover_midi_dataset_37303_genres_midis_embeddings_large_cc_by_nc_sa.npy - 37303 genre MIDIs embeddings for genre (artist and song) identification tasks

discover_midi_dataset_202400_identified_midis_embeddings_large_cc_by_nc_sa.npy - 202400 identified MIDIs embeddings for MIDI identification tasks

discover_midi_dataset_3480123_clean_midis_embeddings_large_cc_by_nc_sa.npy - 3480123 select clean MIDIs embeddings for large scale similarity search and analysis tasks

Pre-computed embeddings MIDI source dataset: Discover MIDI Dataset


Installation

midisim PyPI package (for general use)

!pip install -U midisim

x-transformers 2.3.1 (for raw/custom tasks)

!pip install x-transformers==2.3.1

Basic use guide

General use example

# ================================================================================================
# Initalize midisim
# ================================================================================================

# Import main midisim module
import midisim

# ================================================================================================
# Prepare midisim embeddings
# ================================================================================================

# Option 1: Download sample pre-computed embeddings corpus from Hugging Face
emb_path = midisim.download_embeddings()

# Option 2: use custom pre-computed embeddings corpus
# See custom embeddings generation section of this README for details
# emb_path = './custom_midis_embeddings_corpus.npy'

# Load downloaded embeddings corpus
corpus_midi_names, corpus_emb = midisim.load_embeddings(emb_path)

# ================================================================================================
# Prepare midisim model
# ================================================================================================

# Option 1: Download main pre-trained midisim model from Hugging Face
model_path = midisim.download_model()

# Option 2: Use main pre-trained midisim model included in midisim PyPI package
# model_path = get_package_models()[0]['path']

# Load midisim model
model, ctx, dtype = midisim.load_model(model_path)

# ================================================================================================
# Prepare source MIDI
# ================================================================================================

# Load source MIDI
input_toks_seqs = midisim.midi_to_tokens('Come To My Window.mid')

# ================================================================================================
# Calculate and analyze embeddings
# ================================================================================================

# Compute source/query embeddings
query_emb = midisim.get_embeddings_bf16(model, input_toks_seqs)

# Calculate cosine similarity between source/query MIDI embeddings and embeddings corpus
idxs, sims = midisim.cosine_similarity_topk(query_emb, corpus_emb)

# ================================================================================================
# Processs, print and save results
# ================================================================================================

# Convert the results to sorted list with transpose values
idxs_sims_tvs_list = midisim.idxs_sims_to_sorted_list(idxs, sims)

# Print corpus matches (and optionally) convert the final result to a handy list for further processing
corpus_matches_list  midisim.print_sorted_idxs_sims_list(idxs_sims_tvs_list, corpus_midi_names, return_as_list=True)

# ================================================================================================
# Copy matched MIDIs from the MIDI corpus for listening and further evaluation and analysis
# ================================================================================================

# Copy matched corpus MIDI to a desired directory for easy evaluation and analysis
out_dir_path = copy_corpus_files(corpus_matches_list)

# ================================================================================================

Raw/custom use example

Small model (2 epochs)

import torch
from x_transformers import TransformerWrapper, Encoder

# Original model hyperparameters
SEQ_LEN = 3072

MASK_IDX     = 384 # Use this value for masked modelling
PAD_IDX      = 385 # Model pad index
VOCAB_SIZE   = 386 # Total vocab size

MASK_PROB    = 0.15 # Original training mask probability value (use for masked modelling)

DEVICE = 'cuda' # You can use any compatible device or CPU
DTYPE  = torch.bfloat16 # Original training dtype

# Official main midisim model checkpoint name
MODEL_CKPT = 'midisim_small_pre_trained_model_2_epochs_43117_steps_0.3148_loss_0.9229_acc.pth'

# Model architecture using x-transformers
model = TransformerWrapper(
    num_tokens = VOCAB_SIZE,
    max_seq_len = SEQ_LEN,
    attn_layers = Encoder(
        dim   = 512,
        depth = 8,
        heads = 8,
        rotary_pos_emb = True,
        attn_flash = True,
    ),
)

model.load_state_dict(torch.load(MODEL_CKPT, map_location=DEVICE))

model.to(DEVICE)
model.eval()

# Original training autoxast setup
autocast_ctx = torch.amp.autocast(device_type=DEVICE, dtype=DTYPE)

Large model (2 epochs)

import torch
from x_transformers import TransformerWrapper, Encoder

# Original model hyperparameters
SEQ_LEN = 3072

MASK_IDX     = 384 # Use this value for masked modelling
PAD_IDX      = 385 # Model pad index
VOCAB_SIZE   = 386 # Total vocab size

MASK_PROB    = 0.15 # Original training mask probability value (use for masked modelling)

DEVICE = 'cuda' # You can use any compatible device or CPU
DTYPE  = torch.bfloat16 # Original training dtype

# Official main midisim model checkpoint name
MODEL_CKPT = 'midisim_large_pre_trained_model_2_epochs_86275_steps_0.2054_loss_0.9385_acc.pth'

# Model architecture using x-transformers
model = TransformerWrapper(
    num_tokens = VOCAB_SIZE,
    max_seq_len = SEQ_LEN,
    attn_layers = Encoder(
        dim   = 512,
        depth = 16,
        heads = 8,
        rotary_pos_emb = True,
        attn_flash = True,
    ),
)

model.load_state_dict(torch.load(MODEL_CKPT, map_location=DEVICE))

model.to(DEVICE)
model.eval()

# Original training autoxast setup
autocast_ctx = torch.amp.autocast(device_type=DEVICE, dtype=DTYPE)

Creating custom MIDI corpus embeddings

# ================================================================================================

# Load main midisim module
import midisim

# Import helper modules
import os
import tqdm

# ================================================================================================

# Call included TMIDIX module through midisim to create MIDI files list
custom_midi_corpus_file_names = midisim.TMIDIX.create_files_list(['./custom_midi_corpus_dir/'])

# ================================================================================================

# Create two lists: one with MIDI corpus file names 
# and another with MIDI corpus tokens representations suitable for embeddings generation
midi_corpus_file_names = []
midi_corpus_tokens = []

for midi_file in tqdm.tqdm(custom_midi_corpus_file_names):
    midi_corpus_file_names.append(os.path.splitext(os.path.basename(midi_file))[0])
    
    midi_tokens = midisim.midi_to_tokens(midi_file, transpose_factor=0, verbose=False)[0]
    midi_corpus_tokens.append(midi_tokens)

# It is highly recommended to sort the resulting corpus by tokens sequence length
# This greatly speeds up embeddings calculations
sorted_midi_corpus = sorted(zip(midi_corpus_file_names, midi_corpus_tokens), key=lambda x: len(x[1]))
midi_corpus_file_names, midi_corpus_tokens = map(list, zip(*sorted_midi_corpus))

# ================================================================================================
# Now you are ready to generate embeddings as follows:
# ================================================================================================

# Load main midisim model
model, ctx, dtype = midisim.load_model(verbose=False)

# Generate MIDI corpus embeddings
midi_corpus_embeddings = midisim.get_embeddings_bf16(model, midi_corpus_tokens)

# ================================================================================================

# Save generated MIDI corpus embeddings and MIDI corpus file names in one handy NumPy file
midisim.save_embeddings(midi_corpus_file_names,
                        midi_corpus_embeddings,
                        verbose=False
                       )

# ================================================================================================

# You now can use this saved custom MIDI corpus NumPy file with midisim.load_embeddings()
# and the rest of the pipeline outlined in the general use section above

Main functions reference list

  • midisim.midisim.copy_corpus_filesCopy or synchronize MIDI corpus files from a source directory to a target corpus location.
  • midisim.midisim.cosine_similarity_topkCompute cosine similarities between a query embedding and a set of embeddings and return the top‑K matches.
  • midisim.midisim.download_all_embeddingsDownload an entire embeddings dataset snapshot from a Hugging Face dataset repository to a local directory.
  • midisim.midisim.download_embeddingsDownload a single precomputed embeddings .npy file from a Hugging Face dataset repository.
  • midisim.midisim.download_modelDownload a pre-trained model checkpoint file from a Hugging Face model repository to a local directory.
  • midisim.midisim.get_embeddings_bf16Load or convert embeddings into bfloat16 format for memory-efficient inference on supported hardware.
  • midisim.midisim.idxs_sims_to_sorted_listConvert parallel index and similarity arrays into a single sorted list of (index, similarity) pairs ordered by similarity.
  • midisim.midisim.load_embeddingsLoad a saved NumPy embeddings file and return the arrays of MIDI names and corresponding embedding vectors.
  • midisim.midisim.load_modelConstruct a Transformer model, load weights from a checkpoint, move it to the requested device, and return the model with an AMP autocast context and dtype.
  • midisim.midisim.masked_mean_poolCompute a masked mean pooling over sequence embeddings, ignoring padded positions via a boolean or numeric mask.
  • midisim.midisim.midi_to_tokensConvert a single-track MIDI file into one or more compact integer token sequences (with optional transpositions) suitable for model input.
  • midisim.midisim.pad_and_maskPad a batch of variable-length token sequences to a common length and produce an attention/mask tensor indicating real tokens vs padding.
  • midisim.midisim.print_sorted_idxs_sims_listPretty-print a sorted list of (index, similarity) pairs, optionally annotating entries with filenames or metadata.
  • midisim.midisim.save_embeddingsSave a list of name strings and their corresponding embedding vectors into a structured NumPy array and optionally persist it to disk.

Limitations

  • Current code and models support only MIDI music elements similarity (start-times, durations and pitches)
  • MIDI channels, instruments, velocities and drums similarites are not currently supported due to complexity and practicality considerations
  • Current pre-trained models are limited by 3k sequence length (~1000 MIDI music notes) so long running MIDIs can only be analyzed in chunks
  • Solo drum track MIDIs are not currently supported and can't be analyzed

Citations

@misc{project_los_angeles_2025,
	author       = { Project Los Angeles },
	title        = { midisim (Revision 707e311) },
	year         = 2025,
	url          = { https://huggingface.co/projectlosangeles/midisim },
	doi          = { 10.57967/hf/7383 },
	publisher    = { Hugging Face }
}
@misc{project_los_angeles_2025,
	author       = { Project Los Angeles },
	title        = { midisim-embeddings (Revision 8ebb453) },
	year         = 2025,
	url          = { https://huggingface.co/datasets/projectlosangeles/midisim-embeddings },
	doi          = { 10.57967/hf/7382 },
	publisher    = { Hugging Face }
}
@misc{project_los_angeles_2025,
	author       = { Project Los Angeles },
	title        = { Discover-MIDI-Dataset (Revision 0eaecb5) },
	year         = 2025,
	url          = { https://huggingface.co/datasets/projectlosangeles/Discover-MIDI-Dataset },
	doi          = { 10.57967/hf/7361 },
	publisher    = { Hugging Face }
}

Project Los Angeles

Tegridy Code 2025

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

midisim-25.12.44.tar.gz (99.4 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

midisim-25.12.44-py3-none-any.whl (99.4 MB view details)

Uploaded Python 3

File details

Details for the file midisim-25.12.44.tar.gz.

File metadata

  • Download URL: midisim-25.12.44.tar.gz
  • Upload date:
  • Size: 99.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for midisim-25.12.44.tar.gz
Algorithm Hash digest
SHA256 578053e4dcba3334b2038c5adb3c1bb96a6806400fc7a243b4158ce039dac79d
MD5 2527113e54287aca86e603c4eace62a5
BLAKE2b-256 dc969f93b1ffee9bb9c72283012f5932eabaea6af03700b730a34a6f069634db

See more details on using hashes here.

File details

Details for the file midisim-25.12.44-py3-none-any.whl.

File metadata

  • Download URL: midisim-25.12.44-py3-none-any.whl
  • Upload date:
  • Size: 99.4 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for midisim-25.12.44-py3-none-any.whl
Algorithm Hash digest
SHA256 0215aa5d9cc5d5e51b367aeae5ad138689247c3ec6f0dc9c70b1175dc0bca568
MD5 54234fd1517839cdec16553fb8c4c57c
BLAKE2b-256 e2fcac8c89243b5a17128bde3db38de0e0b208f96c11c7ac920876fd4d5a6b73

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page