# Sightreading AI
A machine learning pipeline for generating piano sight-reading exercises using transformer models and symbolic music processing.
## Overview

This project (still a work in progress; see Current Development Areas below) implements a system for training a machine learning model that will eventually generate piano sight-reading exercises with control over musical complexity and structure. It combines optical music recognition (OMR), symbolic music processing, custom tokenisation, and transformer-based generation.
## Key Features
- Flexible Data Pipeline: Multi-stage conversion supporting PDF or direct MIDI input
- Custom Tokeniser: Metadata-conditioned REMI tokeniser with piano-specific optimisations
- Complexity Metrics: Automated assessment of musical complexity (e.g. note density, rhythmic variety, melodic intervals)
- Controllable Generation: Generate exercises with specific complexity, structure, and key signatures
## Architecture
### 1. Data Processing Pipeline for Model Training
The pipeline transforms raw musical data through multiple stages with error handling and logging:
From PDF sources:

```
PDF → MXL → MIDI → Tokens → Model Training → Generation
```

From MIDI sources:

```
MIDI → MIDI (processed) → Tokens → Model Training → Generation
```
#### Pipeline Components
- `Pipeline` and `Converter` classes orchestrate the conversion process
- `PipelineStage` objects define each transformation step
- Conversion functions implement specific format transformations with flexible processing modes:
  - `SingleFileConversionFunction`: processes files individually with granular error tracking
  - `BatchConversionFunction`: supports both single-file and batch processing for efficiency
- Format implementations:
  - `pdf_to_mxl`: Audiveris OMR for PDF → MXL conversion (batch-capable for parallel processing)
  - `to_midi`: music21-based conversion with metadata extraction (single-file for detailed analysis)
  - `midi_to_tokens`: custom tokenisation with metadata conditioning (batch-capable for efficiency)
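The multi-stage flow above can be sketched as a chain of stage objects, each mapping an input file to an output file. All names here (`Stage`, `run_stages`, the toy converters) are illustrative, not the project's actual API:

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable

# Hypothetical sketch of a conversion stage: a name plus a file-to-file function.
@dataclass
class Stage:
    name: str
    convert: Callable[[Path], Path]

def run_stages(stages: list[Stage], path: Path) -> Path:
    """Feed the output of each stage into the next."""
    for stage in stages:
        path = stage.convert(path)
    return path

# Toy converters that merely swap suffixes, standing in for real OMR/MIDI steps.
stages = [
    Stage("pdf_to_mxl", lambda p: p.with_suffix(".mxl")),
    Stage("to_midi", lambda p: p.with_suffix(".mid")),
    Stage("midi_to_tokens", lambda p: p.with_suffix(".json")),
]
print(run_stages(stages, Path("exercise.pdf")))  # exercise.json
```

In the real pipeline each stage would also report per-file outcomes and write into the corresponding `data_pipeline/data/` directory.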
#### Error Handling & Logging
- `ConversionOutcome` objects track success, warnings, and errors for each file
- `Log` class provides comprehensive statistics and detailed conversion logs
- Performance optimisation: skipping existing files and efficient resource management
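Per-file outcome tracking of this kind can be sketched as a small record type plus an aggregate count; the field names below are assumptions, not the actual `ConversionOutcome` interface:

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the described ConversionOutcome role.
@dataclass
class Outcome:
    file: str
    success: bool
    warnings: list[str] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)

outcomes = [
    Outcome("a.pdf", True),
    Outcome("b.pdf", False, errors=["OMR failed"]),
]

# A Log-style summary: how many files converted cleanly.
ok = sum(o.success for o in outcomes)
print(f"{ok}/{len(outcomes)} succeeded")  # 1/2 succeeded
```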
#### Additional Features
- Input Validation: file filtering according to musical constraints implied by the tokeniser config
- Score Refurbishing: integrated option to refurbish scores (add missing time/key signatures, split into separate exercises, correct Audiveris errors)
- PDF Preprocessing: split pages to facilitate OMR, at the risk of cutting individual exercises apart (potentially add resolution enhancement and brightness/contrast adjustment; see https://audiveris.github.io/audiveris/_pages/guides/advanced/improved_input/)
- Temporary file saving system: moves files out of the pipeline into separate directories according to outcome
### 2. Tokenisation & Metadata System
#### Custom Tokeniser (`MyTokeniser`)
Extends miditok's REMI tokeniser with:
- Metadata conditioning tokens (complexity, structure, key signatures)
- Piano-optimised vocabulary (removes drums, unused instruments)
- BPE training support to enhance the semantic value of the vocabulary
- Hash-based compatibility checking to ensure that a model only receives data from one tokeniser configuration
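A hash-based compatibility check of this kind boils down to hashing a canonical dump of the tokeniser configuration; the snippet below is a sketch of the idea, not the project's actual hashing scheme:

```python
import hashlib
import json

def config_hash(config: dict) -> str:
    # Serialise with sorted keys so identical configs always hash the same,
    # regardless of key order.
    blob = json.dumps(config, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:16]

# Hypothetical config fields for illustration.
cfg_a = {"vocab": "REMI", "use_bpe": True}
cfg_b = {"use_bpe": True, "vocab": "REMI"}  # same config, different key order

assert config_hash(cfg_a) == config_hash(cfg_b)
```

Token files stamped with this hash can then be rejected at load time whenever they were produced by a differently configured tokeniser.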
#### Metadata Extraction (`Metadata` class)
Computes musical metrics:
- Density Complexity: Note density relative to measure count
- Duration Complexity: Rhythmic variety based on note duration diversity
- Interval Complexity: Melodic complexity from average interval sizes
- Structural Information: Time signatures, key signatures, measure counts, etc.
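The three complexity metrics can be illustrated with simple formulas over note lists; these are plausible sketches of the described quantities, and the actual `Metadata` implementation may normalise or scale them differently:

```python
# Illustrative formulas only; the real metrics may differ.

def density_complexity(num_notes: int, num_measures: int) -> float:
    """Note density relative to measure count."""
    return num_notes / num_measures

def duration_complexity(durations: list[float]) -> float:
    """Rhythmic variety: share of distinct duration values."""
    return len(set(durations)) / len(durations)

def interval_complexity(pitches: list[int]) -> float:
    """Melodic complexity: mean absolute interval in semitones."""
    intervals = [abs(b - a) for a, b in zip(pitches, pitches[1:])]
    return sum(intervals) / len(intervals)

print(density_complexity(32, 8))              # 4.0 notes per measure
print(duration_complexity([1, 1, 0.5, 2]))    # 0.75
print(interval_complexity([60, 62, 64, 60]))  # mean of [2, 2, 4]
```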
### 3. Model Architecture
#### Custom GPT-2 Implementation (`MyModel`)
- Metadata-Conditioned Generation: Uses extracted complexity tokens as conditioning
- Smart Loading: Handles model/tokeniser compatibility and version management
- Training Pipeline: Integrated with the Hugging Face `Trainer`
- Sequence Length Analysis: Optimal cutoff determination from training data
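One simple way to determine such a cutoff (a sketch of the idea, not necessarily the project's heuristic) is to pick the smallest length that covers a target fraction of the training sequences, so a few outliers do not inflate the context size:

```python
def length_cutoff(lengths: list[int], coverage: float = 0.95) -> int:
    """Smallest sequence length that covers `coverage` of all sequences."""
    ordered = sorted(lengths)
    idx = max(0, int(coverage * len(ordered)) - 1)
    return ordered[idx]

# Hypothetical token-sequence lengths from a training set.
lengths = [120, 340, 200, 180, 950, 260, 220, 210, 300, 240]
print(length_cutoff(lengths))  # 340 (the 950-token outlier is excluded)
```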
#### Custom Dataset (`MyTokenDataset`)
- Tokeniser Compatibility Validation: Filters files by tokeniser hash to ensure data consistency
- Automatic Sequence Formatting: Adds BOS/EOS tokens and handles proper label alignment for causal language modeling
- Memory-Efficient Loading: Sorts sequences by length to minimize padding waste during batching
- Metadata Masking: Sets metadata tokens and special tokens to -100 in labels to exclude from loss calculation
- Length Filtering: Automatically excludes sequences exceeding model's maximum position embeddings
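The metadata masking described above can be sketched as follows; the concrete token ids and the set of metadata ids are made up for illustration and do not reflect the actual vocabulary:

```python
IGNORE = -100          # HuggingFace convention: positions excluded from the loss
BOS, EOS, PAD = 1, 2, 0
METADATA_IDS = {10, 11, 12}  # hypothetical conditioning-token ids

def make_labels(input_ids: list[int]) -> list[int]:
    """Copy input ids into labels, masking metadata and special tokens."""
    return [
        IGNORE if tok in METADATA_IDS or tok in (BOS, EOS, PAD) else tok
        for tok in input_ids
    ]

seq = [1, 10, 11, 55, 56, 57, 2, 0, 0]  # BOS, metadata, notes, EOS, padding
print(make_labels(seq))  # [-100, -100, -100, 55, 56, 57, -100, -100, -100]
```

The model is still conditioned on the metadata tokens in the input, but it is never penalised for (or trained to reproduce) them.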
## Installation & Setup
### Prerequisites
- Python 3.11+
- CUDA-compatible GPU (recommended)
- Java Runtime Environment (for Audiveris OMR, if using PDF input)
- Audiveris OMR software (if using PDF input)
### Installation
1. Clone the repository:

   ```shell
   git clone https://github.com/MarlonF24/sightreading_ai
   cd sightreading_ai
   ```
2. Set up a virtual environment:

   ```shell
   python -m venv .venv
   source .venv/bin/activate  # On Windows use `.venv\Scripts\activate`
   ```
3. Install PyTorch:

   For CPU-only users:

   ```shell
   pip install torch==2.10.0
   ```

   For NVIDIA GPU users:

   - Find your highest compatible CUDA version by running `nvidia-smi` in your terminal (top row, look for "CUDA Version").
   - Find the appropriate command for your system (use the highest CUDA version available that is still compatible with your GPU) at https://pytorch.org/get-started/locally/. You can remove the torchvision package from the command. Run the command in your virtual environment.

     Example for CUDA 13.0:

     ```shell
     pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu130
     ```

   - You can verify the installation by running the following commands in Python:

     ```python
     import torch
     print(torch.__version__)              # Check PyTorch version
     print(torch.cuda.is_available())      # Check if CUDA is available
     print(torch.version.cuda)             # Check the CUDA version PyTorch was built with
     print(torch.cuda.get_device_name(0))  # Check GPU name (if available)
     ```
4. Install dependencies:

   > [!IMPORTANT]
   > Especially when using CUDA, ensure that the PyTorch installation step is completed by now. Otherwise the torch installation might not go as desired.

   ```shell
   pip install --upgrade pip
   pip install -r requirements.txt
   ```
5. Configure external tools (if using PDF input):

   - Install Audiveris and note the installation path
   - Install languages (English, German) in Audiveris under Tools > Install languages...
   - Update paths in `data_pipeline/data_pipeline_constants.py` (esp. `AUDIVERIS_PATH`)
## Usage
### Quick Start: Complete Training Pipeline (`main.py` at project root)
```python
from pathlib import Path
from tokeniser.tokeniser import MyTokeniser
from model.model import MyModel
from data_pipeline_scripts.pipeline import construct_music_pipeline
from data_pipeline_scripts.converter import Converter

# 1. Set up tokeniser
tokeniser = MyTokeniser()

# 2. Set up pipeline
pipeline = construct_music_pipeline(tokeniser=tokeniser)
converter = Converter(pipeline=pipeline)

# 3. Convert to MIDI stage (from PDF, or place MIDI files directly in midi_start/)
# Option A: from PDF files -- place PDF files in data_pipeline/data/pdf_in/
converter.multi_stage_conversion("pdf_in", "midi_in", batch_if_possible=False, overwrite=True, move_successful_inputs_to_temp=True, move_error_inputs_to_temp=True)

# Option B: from MIDI files -- place MIDI files in data_pipeline/data/midi_start/
converter.multi_stage_conversion("midi_start", "midi_in", batch_if_possible=False, overwrite=True, move_successful_inputs_to_temp=True, move_error_inputs_to_temp=True)

# 4. Train BPE tokeniser (optional, for vocabulary compression)
tokeniser.train_BPE(data_dir=Path("data_pipeline/data/midi_in"))
tokeniser.save_pretrained("tokeniser/trained_tokeniser")
# Later, load with MyTokeniser.from_pretrained("tokeniser/trained_tokeniser")
# instead of initialising a new MyTokeniser() as above.

# 5. Convert MIDI to tokens
converter.multi_stage_conversion("midi_in", "tokens_in", batch_if_possible=False, overwrite=True)

# 6. Train model
MyModel.train_from_tokens_dir(
    tokens_dir=Path("data_pipeline/data/tokens_in"),
    tokeniser=tokeniser
)
```
### Generating Music (`main.py` at project root)
```python
from pathlib import Path
from tokeniser.tokeniser import Metadata, MyTokeniser
from model.model import MyModel

# Define musical parameters
metadata = Metadata.TokenisedMetadata(
    time_signature="4/4",
    num_measures=16,
    density_complexity=5,   # 1-10 scale
    duration_complexity=3,  # 1-10 scale
    interval_complexity=4   # 1-10 scale
)

# Generate exercise
MyModel.generate_tokens(
    metadata_tokens=metadata,
    key_signature=0,  # C major / A minor
    output_dir=Path("data_pipeline/data/tokens_out")
)

from data_pipeline_scripts.pipeline import construct_music_pipeline
from data_pipeline_scripts.converter import MyConverter

# Convert to MXL format.
# Load the tokeniser whose tokens the model was trained on;
# it was saved alongside the model in model/training.
tokeniser = MyTokeniser.from_pretrained("model/training")
converter = MyConverter(pipeline=construct_music_pipeline(tokeniser=tokeniser))
converter.multi_stage_conversion(
    "tokens_out",
    "mxl_out",
    batch_if_possible=False,
    overwrite=True,
    move_successful_inputs_to_temp=False,
    move_error_inputs_to_temp=False
)
```
## Project Structure
```
sightreading_ai/
├── data_pipeline_scripts/        # Data processing pipeline
│   ├── conversion_functions.py   # Format conversion implementations
│   ├── converter.py              # Pipeline orchestration
│   └── pipeline.py               # Pipeline configuration
├── tokeniser/                    # Custom tokenisation system
│   ├── tokeniser.py              # MyTokeniser implementation
│   └── tokeniser_constants.py    # Tokeniser configuration
├── model/                        # Model architecture and training
│   ├── model.py                  # MyModel implementation
│   ├── dataloader.py             # Custom PyTorch dataset
│   └── model_constants.py        # Model configuration
├── constants/                    # Global configuration
└── data_pipeline/                # Data processing directories (generated by converter initialisation)
    ├── logs/                     # Conversion logs and statistics
    ├── temp/                     # Temporary files during processing
    ├── error_temp/               # Files that failed conversion (for debugging)
    └── data/
        ├── pdf_in/               # Input PDF files
        ├── ...
        ├── midi_start/           # Input MIDI files
        └── tokens_in/            # Tokenised sequences for training
```
## Current Development Areas

- Finding the optimal tokeniser configuration
- Finding an eligible, sizeable dataset to build a prototype
- Fine-tuning the complexity metrics