# Sightreading AI
A machine learning pipeline for generating piano sight-reading exercises using transformer models and symbolic music processing.
## Overview

This project (still a work in progress; see Current Development Areas below) implements a system for training a machine learning model that will eventually generate piano sight-reading exercises with control over musical complexity and structure. It combines optical music recognition (OMR), symbolic music processing, custom tokenisation, and transformer-based generation.
## Key Features
- Flexible Data Pipeline: Multi-stage conversion supporting PDF or direct MIDI input
- Custom Tokeniser: Metadata-conditioned REMI tokeniser with piano-specific optimisations
- Complexity Metrics: Automated assessment of musical complexity (e.g. note density, rhythmic variety, melodic intervals)
- Controllable Generation: Generate exercises with specific complexity, structure, and key signatures
## Architecture
### 1. Data Processing Pipeline for Model Training
The pipeline transforms raw musical data through multiple stages with error handling and logging:
From PDF sources:

```
PDF → MXL → MIDI → Tokens → Model Training → Generation
```

From MIDI sources:

```
MIDI → MIDI (processed) → Tokens → Model Training → Generation
```
#### Pipeline Components
- `Pipeline` and `Converter` classes orchestrate the conversion process
- `PipelineStage` objects define each transformation step
- Conversion functions implement specific format transformations with flexible processing modes:
  - `SingleFileConversionFunction`: processes files individually with granular error tracking
  - `BatchConversionFunction`: supports both single-file and batch processing for efficiency
- Format implementations:
  - `pdf_to_mxl`: Audiveris OMR for PDF → MXL conversion (batch-capable for parallel processing)
  - `to_midi`: music21-based conversion with metadata extraction (single-file for detailed analysis)
  - `midi_to_tokens`: custom tokenisation with metadata conditioning (batch-capable for efficiency)
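The multi-stage flow above can be sketched as a chain of stage objects, each mapping an input file to an output file. All names here (`Stage`, `run_stages`, the toy converters) are illustrative, not the project's actual API:

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable

# Hypothetical sketch of a conversion stage: a name plus a file-to-file function.
@dataclass
class Stage:
    name: str
    convert: Callable[[Path], Path]

def run_stages(stages: list[Stage], path: Path) -> Path:
    """Feed the output of each stage into the next."""
    for stage in stages:
        path = stage.convert(path)
    return path

# Toy converters that merely swap suffixes, standing in for real OMR/MIDI steps.
stages = [
    Stage("pdf_to_mxl", lambda p: p.with_suffix(".mxl")),
    Stage("to_midi", lambda p: p.with_suffix(".mid")),
    Stage("midi_to_tokens", lambda p: p.with_suffix(".json")),
]
print(run_stages(stages, Path("exercise.pdf")))  # exercise.json
```

In the real pipeline each stage would also report per-file outcomes and write into the corresponding `data_pipeline/data/` directory.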
#### Error Handling & Logging
- `ConversionOutcome` objects track success, warnings, and errors for each file
- `Log` class provides comprehensive statistics and detailed conversion logs
- Performance optimisation: skipping existing files and efficient resource management
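Per-file outcome tracking of this kind can be sketched as a small record type plus an aggregate count; the field names below are assumptions, not the actual `ConversionOutcome` interface:

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the described ConversionOutcome role.
@dataclass
class Outcome:
    file: str
    success: bool
    warnings: list[str] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)

outcomes = [
    Outcome("a.pdf", True),
    Outcome("b.pdf", False, errors=["OMR failed"]),
]

# A Log-style summary: how many files converted cleanly.
ok = sum(o.success for o in outcomes)
print(f"{ok}/{len(outcomes)} succeeded")  # 1/2 succeeded
```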
#### Additional Features
- Input Validation: file filtering according to musical constraints implied by the tokeniser config
- Score Refurbishing: integrated option to refurbish scores (add missing time/key signatures, split into separate exercises, correct Audiveris errors)
- PDF Preprocessing: split pages to facilitate OMR, at the risk of cutting individual exercises apart (potentially add resolution enhancement and brightness/contrast adjustment; see https://audiveris.github.io/audiveris/_pages/guides/advanced/improved_input/)
- Temporary file saving system: moves files out of the pipeline into separate directories according to outcome
### 2. Tokenisation & Metadata System
#### Custom Tokeniser (`MyTokeniser`)
Extends miditok's REMI tokeniser with:
- Metadata conditioning tokens (complexity, structure, key signatures)
- Piano-optimised vocabulary (removes drums, unused instruments)
- BPE training support to enhance the semantic value of the vocabulary
- Hash-based compatibility checking to ensure that a model only receives data from one tokeniser configuration
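A hash-based compatibility check of this kind boils down to hashing a canonical dump of the tokeniser configuration; the snippet below is a sketch of the idea, not the project's actual hashing scheme:

```python
import hashlib
import json

def config_hash(config: dict) -> str:
    # Serialise with sorted keys so identical configs always hash the same,
    # regardless of key order.
    blob = json.dumps(config, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:16]

# Hypothetical config fields for illustration.
cfg_a = {"vocab": "REMI", "use_bpe": True}
cfg_b = {"use_bpe": True, "vocab": "REMI"}  # same config, different key order

assert config_hash(cfg_a) == config_hash(cfg_b)
```

Token files stamped with this hash can then be rejected at load time whenever they were produced by a differently configured tokeniser.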
#### Metadata Extraction (`Metadata` class)
Computes musical metrics:
- Density Complexity: Note density relative to measure count
- Duration Complexity: Rhythmic variety based on note duration diversity
- Interval Complexity: Melodic complexity from average interval sizes
- Structural Information: Time signatures, key signatures, measure counts, etc.
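The three complexity metrics can be illustrated with simple formulas over note lists; these are plausible sketches of the described quantities, and the actual `Metadata` implementation may normalise or scale them differently:

```python
# Illustrative formulas only; the real metrics may differ.

def density_complexity(num_notes: int, num_measures: int) -> float:
    """Note density relative to measure count."""
    return num_notes / num_measures

def duration_complexity(durations: list[float]) -> float:
    """Rhythmic variety: share of distinct duration values."""
    return len(set(durations)) / len(durations)

def interval_complexity(pitches: list[int]) -> float:
    """Melodic complexity: mean absolute interval in semitones."""
    intervals = [abs(b - a) for a, b in zip(pitches, pitches[1:])]
    return sum(intervals) / len(intervals)

print(density_complexity(32, 8))              # 4.0 notes per measure
print(duration_complexity([1, 1, 0.5, 2]))    # 0.75
print(interval_complexity([60, 62, 64, 60]))  # mean of [2, 2, 4]
```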
### 3. Model Architecture
#### Custom GPT-2 Implementation (`MyModel`)
- Metadata-Conditioned Generation: Uses extracted complexity tokens as conditioning
- Smart Loading: Handles model/tokeniser compatibility and version management
- Training Pipeline: Integrated with the Hugging Face `Trainer`
- Sequence Length Analysis: Optimal cutoff determination from training data
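One simple way to determine such a cutoff (a sketch of the idea, not necessarily the project's heuristic) is to pick the smallest length that covers a target fraction of the training sequences, so a few outliers do not inflate the context size:

```python
def length_cutoff(lengths: list[int], coverage: float = 0.95) -> int:
    """Smallest sequence length that covers `coverage` of all sequences."""
    ordered = sorted(lengths)
    idx = max(0, int(coverage * len(ordered)) - 1)
    return ordered[idx]

# Hypothetical token-sequence lengths from a training set.
lengths = [120, 340, 200, 180, 950, 260, 220, 210, 300, 240]
print(length_cutoff(lengths))  # 340 (the 950-token outlier is excluded)
```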
#### Custom Dataset (`MyTokenDataset`)
- Tokeniser Compatibility Validation: Filters files by tokeniser hash to ensure data consistency
- Automatic Sequence Formatting: Adds BOS/EOS tokens and handles proper label alignment for causal language modeling
- Memory-Efficient Loading: Sorts sequences by length to minimize padding waste during batching
- Metadata Masking: Sets metadata tokens and special tokens to -100 in labels to exclude from loss calculation
- Length Filtering: Automatically excludes sequences exceeding model's maximum position embeddings
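The metadata masking described above can be sketched as follows; the concrete token ids and the set of metadata ids are made up for illustration and do not reflect the actual vocabulary:

```python
IGNORE = -100          # HuggingFace convention: positions excluded from the loss
BOS, EOS, PAD = 1, 2, 0
METADATA_IDS = {10, 11, 12}  # hypothetical conditioning-token ids

def make_labels(input_ids: list[int]) -> list[int]:
    """Copy input ids into labels, masking metadata and special tokens."""
    return [
        IGNORE if tok in METADATA_IDS or tok in (BOS, EOS, PAD) else tok
        for tok in input_ids
    ]

seq = [1, 10, 11, 55, 56, 57, 2, 0, 0]  # BOS, metadata, notes, EOS, padding
print(make_labels(seq))  # [-100, -100, -100, 55, 56, 57, -100, -100, -100]
```

The model is still conditioned on the metadata tokens in the input, but it is never penalised for (or trained to reproduce) them.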
## Installation & Setup
### Prerequisites
- Python 3.11+
- CUDA-compatible GPU (recommended)
- Java Runtime Environment (for Audiveris OMR, if using PDF input)
- Audiveris OMR software (if using PDF input)
### Installation
1. Clone the repository:

   ```shell
   git clone https://github.com/MarlonF24/sightreading_ai
   cd sightreading_ai
   ```
2. Set up a virtual environment:

   ```shell
   python -m venv .venv
   source .venv/bin/activate  # On Windows use `.venv\Scripts\activate`
   ```
3. Install PyTorch:

   For CPU-only users:

   ```shell
   pip install torch==2.10.0
   ```

   For NVIDIA GPU users:

   - Find your highest compatible CUDA version by running `nvidia-smi` in your terminal (top row, look for "CUDA Version").
   - Find the appropriate command for your system (use the highest CUDA version available that is still compatible with your GPU) at https://pytorch.org/get-started/locally/. You can remove the torchvision package from the command. Run the command in your virtual environment.

     Example for CUDA 13.0:

     ```shell
     pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu130
     ```

   - You can verify the installation by running the following commands in Python:

     ```python
     import torch
     print(torch.__version__)              # Check PyTorch version
     print(torch.cuda.is_available())      # Check if CUDA is available
     print(torch.version.cuda)             # Check the CUDA version PyTorch was built with
     print(torch.cuda.get_device_name(0))  # Check GPU name (if available)
     ```
4. Install dependencies:

   > [!IMPORTANT]
   > Especially when using CUDA, ensure that the PyTorch installation step is completed by now. Otherwise the torch installation might not go as desired.

   ```shell
   pip install --upgrade pip
   pip install -r requirements.txt
   ```
5. Configure external tools (if using PDF input):

   - Install Audiveris and note the installation path
   - Install languages (English, German) in Audiveris under Tools > Install languages...
   - Update paths in `data_pipeline/data_pipeline_constants.py` (esp. `AUDIVERIS_PATH`)
## Usage
### Quick Start: Complete Training Pipeline (`main.py` at project root)
```python
from pathlib import Path
from tokeniser.tokeniser import MyTokeniser
from model.model import MyModel
from data_pipeline_scripts.pipeline import construct_music_pipeline
from data_pipeline_scripts.converter import Converter

# 1. Set up tokeniser
tokeniser = MyTokeniser()

# 2. Set up pipeline
pipeline = construct_music_pipeline(tokeniser=tokeniser)
converter = Converter(pipeline=pipeline)

# 3. Convert to MIDI stage (from PDF, or place MIDI files directly in midi_start/)
# Option A: from PDF files -- place PDF files in data_pipeline/data/pdf_in/
converter.multi_stage_conversion("pdf_in", "midi_in", batch_if_possible=False, overwrite=True, move_successful_inputs_to_temp=True, move_error_inputs_to_temp=True)

# Option B: from MIDI files -- place MIDI files in data_pipeline/data/midi_start/
converter.multi_stage_conversion("midi_start", "midi_in", batch_if_possible=False, overwrite=True, move_successful_inputs_to_temp=True, move_error_inputs_to_temp=True)

# 4. Train BPE tokeniser (optional, for vocabulary compression)
tokeniser.train_BPE(data_dir=Path("data_pipeline/data/midi_in"))
tokeniser.save_pretrained("tokeniser/trained_tokeniser")
# Later, load with MyTokeniser.from_pretrained("tokeniser/trained_tokeniser")
# instead of initialising a new MyTokeniser() as above.

# 5. Convert MIDI to tokens
converter.multi_stage_conversion("midi_in", "tokens_in", batch_if_possible=False, overwrite=True)

# 6. Train model
MyModel.train_from_tokens_dir(
    tokens_dir=Path("data_pipeline/data/tokens_in"),
    tokeniser=tokeniser
)
```
### Generating Music (`main.py` at project root)
```python
from pathlib import Path
from tokeniser.tokeniser import Metadata, MyTokeniser
from model.model import MyModel

# Define musical parameters
metadata = Metadata.TokenisedMetadata(
    time_signature="4/4",
    num_measures=16,
    density_complexity=5,   # 1-10 scale
    duration_complexity=3,  # 1-10 scale
    interval_complexity=4   # 1-10 scale
)

# Generate exercise
MyModel.generate_tokens(
    metadata_tokens=metadata,
    key_signature=0,  # C major / A minor
    output_dir=Path("data_pipeline/data/tokens_out")
)

from data_pipeline_scripts.pipeline import construct_music_pipeline
from data_pipeline_scripts.converter import MyConverter

# Convert to MXL format.
# Load the tokeniser whose tokens the model was trained on;
# it was saved alongside the model in model/training.
tokeniser = MyTokeniser.from_pretrained("model/training")
converter = MyConverter(pipeline=construct_music_pipeline(tokeniser=tokeniser))
converter.multi_stage_conversion(
    "tokens_out",
    "mxl_out",
    batch_if_possible=False,
    overwrite=True,
    move_successful_inputs_to_temp=False,
    move_error_inputs_to_temp=False
)
```
## Project Structure
```
sightreading_ai/
├── data_pipeline_scripts/        # Data processing pipeline
│   ├── conversion_functions.py   # Format conversion implementations
│   ├── converter.py              # Pipeline orchestration
│   └── pipeline.py               # Pipeline configuration
├── tokeniser/                    # Custom tokenisation system
│   ├── tokeniser.py              # MyTokeniser implementation
│   └── tokeniser_constants.py    # Tokeniser configuration
├── model/                        # Model architecture and training
│   ├── model.py                  # MyModel implementation
│   ├── dataloader.py             # Custom PyTorch dataset
│   └── model_constants.py        # Model configuration
├── constants/                    # Global configuration
└── data_pipeline/                # Data processing directories (generated by converter initialisation)
    ├── logs/                     # Conversion logs and statistics
    ├── temp/                     # Temporary files during processing
    ├── error_temp/               # Files that failed conversion (for debugging)
    └── data/
        ├── pdf_in/               # Input PDF files
        ├── ...
        ├── midi_start/           # Input MIDI files
        └── tokens_in/            # Tokenised sequences for training
```
## Current Development Areas

- Finding the optimal tokeniser configuration
- Finding an eligible, sizeable dataset to build a prototype
- Fine-tuning the complexity metrics