SheetSage-Infer
Inference-only version of SheetSage for music transcription with vendored Jukebox modules.
AI-powered music transcription system that converts audio to lead sheets (melody + chord symbols) using deep learning models.
Overview
SheetSage-Infer packages SheetSage's music transcription pipeline for inference only. The required Jukebox modules are vendored, so deployment needs no separate Jukebox installation.
Features
- ✅ Vendored Jukebox Modules - No external Jukebox dependency needed
- ✅ CPU & GPU Support - Handcrafted features (CPU) or Jukebox embeddings (GPU)
- ✅ Multiple Export Formats - LilyPond notation, MIDI files, PDF generation
- ✅ Audio from URLs - Support for YouTube, Bandcamp, and other sources
- ✅ Simple API - High-level sheetsage() function
Quick Start
Installation
From PyPI:
# Using pip
pip install sheetsage-infer
# Using uv (recommended - faster)
uv pip install sheetsage-infer
# Or add to your project with uv
uv add sheetsage-infer
For Development:
git clone https://github.com/openmirlab/sheetsage-infer.git
cd sheetsage-infer
pip install -e ".[dev]"
Prerequisites
- Python: ≥3.10 (tested on 3.10, 3.11, 3.12)
- LilyPond (optional, for PDF generation)
  - Linux: sudo apt-get install lilypond
  - macOS: brew install lilypond
  - Windows: Download from lilypond.org
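Because LilyPond is optional, it can help to check for it before requesting PDF output. The sketch below assumes the executable is named lilypond and is discoverable on PATH; the helper name lilypond_available is hypothetical and not part of sheetsage-infer.

```python
import shutil


def lilypond_available(executable: str = "lilypond") -> bool:
    """Return True if the given executable can be found on PATH.

    Hypothetical helper for illustration; not part of sheetsage-infer.
    """
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if lilypond_available():
        print("LilyPond found: PDF engraving should be possible.")
    else:
        print("LilyPond not found: PDF engraving will likely fail.")
```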
Simple API (Recommended for Python)
from sheetsage.infer import sheetsage
from sheetsage.utils import engrave
from sheetsage.align import create_beat_to_time_fn
# Transcribe audio URL
lead_sheet, segment_beats, segment_beats_times = sheetsage(
    'https://example.com/audio.mp3',
    use_jukebox=False,          # Use fast CPU-based features
    segment_start_hint=30,      # Start at 30 seconds
    segment_end_hint=60,        # End at 60 seconds
    beats_per_minute_hint=120   # Hint for BPM (improves accuracy)
)
# Export to LilyPond
lily_code = lead_sheet.as_lily()
print(lily_code)
# Export to MIDI
beat_to_time_fn = create_beat_to_time_fn(segment_beats, segment_beats_times)
midi_bytes = lead_sheet.as_midi(beat_to_time_fn)
# Save MIDI file
with open('output.mid', 'wb') as f:
    f.write(midi_bytes)
# Generate PDF (requires LilyPond)
pdf_bytes = engrave(lily_code, out_format='pdf')
with open('leadsheet.pdf', 'wb') as f:
    f.write(pdf_bytes)
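The MIDI export above needs a function that maps beat positions to timestamps. As a rough illustration of that idea only, the sketch below builds such a mapping by piecewise-linear interpolation between known beats and their times. The name make_beat_to_time and the interpolation strategy are assumptions for illustration; the library's create_beat_to_time_fn may behave differently.

```python
from bisect import bisect_right
from typing import Callable, Sequence


def make_beat_to_time(
    beats: Sequence[float], times: Sequence[float]
) -> Callable[[float], float]:
    """Build a beat -> seconds mapping by piecewise-linear interpolation.

    Hypothetical helper; `beats` must be strictly increasing and the
    same length as `times`.
    """
    if len(beats) != len(times) or len(beats) < 2:
        raise ValueError("need at least two beats, with one time per beat")

    def beat_to_time(beat: float) -> float:
        # Find the interval containing `beat`; clamp the index so values
        # outside the known range extrapolate from the first or last interval.
        i = bisect_right(beats, beat) - 1
        i = max(0, min(i, len(beats) - 2))
        b0, b1 = beats[i], beats[i + 1]
        t0, t1 = times[i], times[i + 1]
        return t0 + (beat - b0) * (t1 - t0) / (b1 - b0)

    return beat_to_time


if __name__ == "__main__":
    # Three beats at 120 BPM, starting 30 seconds into the audio.
    to_time = make_beat_to_time([0, 1, 2], [30.0, 30.5, 31.0])
    print(to_time(0.5))
```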
Using Jukebox Features (Higher Quality, GPU Required)
from sheetsage.infer import sheetsage
# Requires GPU with >=12GB VRAM
lead_sheet, beats, beat_times = sheetsage(
    'audio.mp3',
    use_jukebox=True,  # Use Jukebox embeddings (vendored)
    segment_start_hint=0,
    segment_end_hint=30,
    beats_per_minute_hint=100
)
Note: Jukebox features require a GPU with ≥12GB VRAM. The vendored modules work without external installation.
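One way to choose between the two feature extractors is to inspect the available GPU memory and fall back to handcrafted features when the requirement is not met. The sketch below is an assumption-laden example: the helper names are hypothetical, "12GB" is read as 12 × 10^9 bytes, and only CUDA device 0 is checked.

```python
from typing import Optional

# Assumption: the "12GB" requirement is interpreted as decimal gigabytes.
MIN_JUKEBOX_VRAM_BYTES = 12 * 10**9


def detect_vram_bytes() -> Optional[int]:
    """Return total memory of CUDA device 0, or None if no GPU is usable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory


def should_use_jukebox(
    vram_bytes: Optional[int], minimum: int = MIN_JUKEBOX_VRAM_BYTES
) -> bool:
    """Decide whether Jukebox features are viable for the given VRAM."""
    return vram_bytes is not None and vram_bytes >= minimum


if __name__ == "__main__":
    use_jukebox = should_use_jukebox(detect_vram_bytes())
    print(f"use_jukebox={use_jukebox}")
```

The resulting boolean can be passed as the use_jukebox argument of sheetsage().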
Command-Line Interface
# Basic transcription
python -m sheetsage.infer audio.mp3
# With options
python -m sheetsage.infer audio.mp3 \
    --segment_start_hint 30 \
    --segment_end_hint 60 \
    --beats_per_minute_hint 120 \
    --output_dir ./output
# See all options
python -m sheetsage.infer --help
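For batch jobs, the same command line can be assembled from Python and run with subprocess. This is a minimal sketch that uses only the flags shown above; the helper names build_command and run_batch are hypothetical, and the available options should be confirmed with --help.

```python
import subprocess
import sys
from typing import Iterable, List, Optional


def build_command(
    audio: str,
    start: Optional[float] = None,
    end: Optional[float] = None,
    bpm: Optional[float] = None,
    output_dir: Optional[str] = None,
) -> List[str]:
    """Assemble the CLI invocation as an argument list (no shell quoting needed)."""
    cmd = [sys.executable, "-m", "sheetsage.infer", audio]
    options = [
        ("--segment_start_hint", start),
        ("--segment_end_hint", end),
        ("--beats_per_minute_hint", bpm),
        ("--output_dir", output_dir),
    ]
    for flag, value in options:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


def run_batch(paths: Iterable[str], output_dir: str) -> None:
    """Transcribe each file in turn, stopping at the first failure."""
    for path in paths:
        subprocess.run(build_command(path, output_dir=output_dir), check=True)


if __name__ == "__main__":
    print(" ".join(build_command("audio.mp3", start=30, end=60, bpm=120)))
```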
Requirements
- Python: ≥3.10
- PyTorch: ≥2.0.0
- GPU: Optional for handcrafted features; required for Jukebox features (12GB+ VRAM)
- OS: Linux, macOS, Windows
Performance
Transcription speed depends on audio length and feature extraction method:
- Handcrafted features (CPU): ~1-5 seconds per minute of audio
- Jukebox features (GPU): ~30-60 seconds per minute of audio (requires GPU with ≥12GB VRAM)
Note: These figures are approximate and also vary with hardware. Jukebox features provide higher quality but are slower.
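The per-minute figures above translate into a rough processing-time range for a given clip. The sketch below simply scales those stated rates by the audio length; the function name is hypothetical and the result is an estimate, not a measurement.

```python
from typing import Dict, Tuple

# Seconds of processing per minute of audio (low, high), keyed by use_jukebox.
# Values are the approximate figures quoted in the Performance section.
RATES: Dict[bool, Tuple[float, float]] = {
    False: (1.0, 5.0),    # handcrafted features on CPU
    True: (30.0, 60.0),   # Jukebox features on GPU
}


def estimate_processing_seconds(
    audio_seconds: float, use_jukebox: bool
) -> Tuple[float, float]:
    """Return a (low, high) estimate of processing time in seconds."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    low, high = RATES[use_jukebox]
    minutes = audio_seconds / 60.0
    return (low * minutes, high * minutes)


if __name__ == "__main__":
    # A 3-minute clip with each feature extractor.
    print(estimate_processing_seconds(180, use_jukebox=False))
    print(estimate_processing_seconds(180, use_jukebox=True))
```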
Examples
See the examples/ directory for usage examples:
- basic_transcription.py - Basic usage
- jukebox_transcription.py - GPU-based transcription
- hooktheory_example.py - Working with Hooktheory data
Project Structure
sheetsage-infer/
├── sheetsage/                      # Main package
│   ├── infer.py                    # Main transcription pipeline
│   ├── align.py                    # Beat-to-time alignment
│   ├── beat_track.py               # Beat detection
│   ├── utils.py                    # LilyPond engraving, audio I/O
│   ├── assets.py                   # Asset management
│   ├── assets/                     # Asset JSON files
│   │   ├── hooktheory.json
│   │   ├── jukebox.json
│   │   ├── rwc.json
│   │   ├── sheetsage.json
│   │   └── test.json
│   ├── modules/                    # Neural network models
│   │   └── modules.py              # Transformer architectures
│   ├── representations/            # Feature extractors
│   │   ├── handcrafted.py          # CPU-based mel-spectrograms
│   │   ├── jukebox.py              # Jukebox embedding interface
│   │   └── jukebox_modules/        # Vendored Jukebox code
│   └── theory/                     # Music theory classes
│       ├── lead_sheet.py           # LeadSheet class with export methods
│       ├── basic.py                # Basic music theory primitives
│       ├── internal.py             # Internal theory classes
│       ├── theorytab.py            # TheoryTab integration
│       └── utils.py                # Theory utilities
├── examples/                       # Example scripts
│   ├── basic_transcription.py      # Basic usage
│   ├── jukebox_transcription.py    # GPU-based transcription
│   ├── hooktheory_example.py       # Hooktheory data examples
│   ├── hooktheory_simple.py        # Simple Hooktheory example
│   └── transcribe_hooktheory_segments.py  # Hooktheory segment transcription
├── hooktheory_data/                # Test data
│   ├── Hooktheory_Test_MIDI.tar.gz
│   └── Hooktheory_Test_Segments.json
├── docs/                           # Documentation
│   └── generated/                  # Generated documentation
├── .github/                        # GitHub configuration
│   └── workflows/
│       └── publish.yml             # PyPI publishing workflow
├── pyproject.toml                  # Project configuration
├── requirements.txt                # Python dependencies
├── uv.lock                         # UV lock file
├── LICENSE                         # MIT License
└── README.md                       # This file
Changes from Original SheetSage
SheetSage-Infer has been modified from the original SheetSage to make it more suitable for library use and easier to maintain.
Key Improvements
| Feature | Original | This Version |
|---|---|---|
| Jukebox Dependency | External, complex install | Vendored, works out of box |
| Test Coverage | Limited | Test suite included |
| Python Support | 3.12+ only | 3.10, 3.11, 3.12 |
| Build System | Hatch | Setuptools (standard) |
| Dependency Pins | Loose | Explicit versions |
What We Maintain
- ✅ All core transcription functionality
- ✅ Same neural network models
- ✅ Same output formats (LeadSheet, LilyPond, MIDI)
- ✅ Same API interface for the sheetsage() function
- ✅ Same theory classes (Note, Chord, Melody, Harmony, etc.)
What We Changed
- Vendored Jukebox Modules: Eliminates complex external dependency
- Library-First Design: Optimized for pip install and programmatic use
- Better Dependency Management: Explicit version pins and compatibility
Acknowledgments
Original Research by Chris Donahue
SheetSage-Infer is built upon the excellent work of SheetSage by Chris Donahue. The original SheetSage represents a major advancement in music transcription, achieving state-of-the-art results through hierarchical transformer architectures.
Research Paper
SheetSage: A Hierarchical Transformer for Audio to Lead Sheet Transcription
This work introduced hierarchical music transcription with melody and harmony extraction, enabling high-quality lead sheet generation from audio.
Original Author
- Chris Donahue - Original SheetSage creator
About This Implementation
This package was created to continue the excellent work by providing easier deployment and vendored Jukebox modules, while preserving 100% of the original model quality and algorithms.
What we maintain:
- PyTorch 2.0+ compatibility
- Modern dependency management
- Inference-only packaging
What remains unchanged:
- All model architectures (100% original)
- All transcription algorithms (100% original)
- All model weights (100% original)
- All output formats (100% original)
Citation
Please cite using the following bibtex entry:
@inproceedings{donahue2024sheetsage,
title={SheetSage: A Hierarchical Transformer for Audio to Lead Sheet Transcription},
author={Donahue, Chris},
booktitle={ISMIR},
year={2024}
}
If you use SheetSage-Infer in your research, please cite the original SheetSage paper above. This package is a maintenance fork to ensure easier deployment and continued compatibility - all credit for the models, algorithms, and research belongs to the original author.
License
MIT License (same as original SheetSage)
Copyright (c) 2024 Chris Donahue (Original SheetSage)
Copyright (c) 2025 (SheetSage-Infer modifications)
See LICENSE for details.
This project includes code adapted from SheetSage (MIT License, Copyright 2024 Chris Donahue).
Limitations
- Inference only - No training capabilities
- Jukebox features require GPU - 12GB+ VRAM recommended for Jukebox embeddings
- LilyPond required for PDF - Optional dependency for PDF generation
- Time signatures - Currently supports 4/4 and 3/4 only
- Audio length - Best results with segments 30-300 seconds
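Given the suggested 30-300 second range, longer recordings can be split into windows before transcription. The sketch below divides a recording into equal, contiguous windows no longer than 300 seconds; the function name plan_segments is hypothetical. It assumes each (start, end) pair would be passed as segment_start_hint and segment_end_hint, which the library treats as hints and may adjust.

```python
import math
from typing import List, Tuple


def plan_segments(
    duration: float, max_len: float = 300.0
) -> List[Tuple[float, float]]:
    """Split [0, duration] into equal contiguous windows of at most max_len seconds.

    Splitting evenly avoids a short leftover window at the end; inputs
    shorter than max_len are returned as a single window.
    """
    if duration <= 0 or max_len <= 0:
        raise ValueError("duration and max_len must be positive")
    count = math.ceil(duration / max_len)
    length = duration / count
    return [
        # Pin the final end to `duration` to avoid floating-point drift.
        (i * length, duration if i == count - 1 else (i + 1) * length)
        for i in range(count)
    ]


if __name__ == "__main__":
    for start, end in plan_segments(700):
        print(f"segment_start_hint={start:.1f}, segment_end_hint={end:.1f}")
```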
Contributing
We welcome contributions! Please:
- Follow the code style (ruff/black)
- Add tests for new features
- Submit PRs with clear descriptions
Development Setup
# Install dependencies
pip install -e ".[dev]"
# Run tests
pytest tests/
# Format and lint code
ruff format . && ruff check .
Support
For issues and questions:
- GitHub Issues: github.com/openmirlab/sheetsage-infer/issues
- Examples: examples/ directory
Links
- Original SheetSage: https://github.com/chrisdonahue/sheetsage
- This Repository: https://github.com/openmirlab/sheetsage-infer
- PyPI Package: https://pypi.org/project/sheetsage-infer/
Made with ❤️ for the ML community
Based on the excellent work by Chris Donahue and the SheetSage project.