
SheetSage-Infer

Inference-only version of SheetSage for music transcription with vendored Jukebox modules.


AI-powered music transcription system that converts audio to lead sheets (melody + chord symbols) using deep learning models.


📌 Overview

SheetSage-Infer is an inference-only version of SheetSage for music transcription, optimized for easy deployment with vendored Jukebox modules.


✨ Features

  • ✅ Vendored Jukebox Modules - No external Jukebox dependency needed
  • ✅ CPU & GPU Support - Handcrafted features (CPU) or Jukebox embeddings (GPU)
  • ✅ Multiple Export Formats - LilyPond notation, MIDI files, PDF generation
  • ✅ Audio from URLs - Support for YouTube, Bandcamp, and other sources
  • ✅ Simple API - High-level sheetsage() function

🚀 Quick Start

Installation

From PyPI:

# Using pip
pip install sheetsage-infer

# Using uv (recommended - faster)
uv pip install sheetsage-infer

# Or add to your project with uv
uv add sheetsage-infer

For Development:

git clone https://github.com/openmirlab/sheetsage-infer.git
cd sheetsage-infer
pip install -e ".[dev]"

Prerequisites

  • Python: ≥3.10 (tested on 3.10, 3.11, 3.12)
  • LilyPond (optional, for PDF generation)
    • Linux: sudo apt-get install lilypond
    • macOS: brew install lilypond
    • Windows: Download from lilypond.org
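Since LilyPond is only needed for PDF export, a quick runtime probe can tell you whether engraving will work on your machine. This is a small standard-library sketch, not part of the package's API:

```python
import shutil

def lilypond_available() -> bool:
    """Return True if the LilyPond binary is on PATH (needed only for PDF export)."""
    return shutil.which("lilypond") is not None

if lilypond_available():
    print("LilyPond found: PDF generation is available.")
else:
    print("LilyPond not found: MIDI and LilyPond text export still work.")
```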

Simple API (Recommended for Python)

from sheetsage.infer import sheetsage
from sheetsage.utils import engrave
from sheetsage.align import create_beat_to_time_fn

# Transcribe audio URL
lead_sheet, segment_beats, segment_beats_times = sheetsage(
    'https://example.com/audio.mp3',
    use_jukebox=False,           # Use fast CPU-based features
    segment_start_hint=30,       # Start at 30 seconds
    segment_end_hint=60,         # End at 60 seconds
    beats_per_minute_hint=120    # Hint for BPM (improves accuracy)
)

# Export to LilyPond
lily_code = lead_sheet.as_lily()
print(lily_code)

# Export to MIDI
beat_to_time_fn = create_beat_to_time_fn(segment_beats, segment_beats_times)
midi_bytes = lead_sheet.as_midi(beat_to_time_fn)

# Save MIDI file
with open('output.mid', 'wb') as f:
    f.write(midi_bytes)

# Generate PDF (requires LilyPond)
pdf_bytes = engrave(lily_code, out_format='pdf')
with open('leadsheet.pdf', 'wb') as f:
    f.write(pdf_bytes)
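For intuition, the mapping returned by create_beat_to_time_fn can be thought of as interpolating between detected beat timestamps. The sketch below only illustrates that idea with linear interpolation; it is not the library's implementation:

```python
from bisect import bisect_right

def make_beat_to_time(beats, beat_times):
    """Map a (possibly fractional) beat position to seconds by linear
    interpolation between detected beat timestamps."""
    def beat_to_time(beat):
        # Clamp to the tracked range, then interpolate within the span.
        if beat <= beats[0]:
            return beat_times[0]
        if beat >= beats[-1]:
            return beat_times[-1]
        i = bisect_right(beats, beat) - 1
        frac = (beat - beats[i]) / (beats[i + 1] - beats[i])
        return beat_times[i] + frac * (beat_times[i + 1] - beat_times[i])
    return beat_to_time

fn = make_beat_to_time([0, 1, 2, 3], [0.0, 0.5, 1.0, 1.5])  # 120 BPM
print(fn(1.5))  # midway between beats 1 and 2 -> 0.75 s
```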

Using Jukebox Features (Higher Quality, GPU Required)

from sheetsage.infer import sheetsage

# Requires GPU with >=12GB VRAM
lead_sheet, beats, beat_times = sheetsage(
    'audio.mp3',
    use_jukebox=True,  # Use Jukebox embeddings (vendored)
    segment_start_hint=0,
    segment_end_hint=30,
    beats_per_minute_hint=100
)

Note: Jukebox features require a GPU with ≥12 GB VRAM. The vendored modules work without any external Jukebox installation.
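If you want to decide at runtime whether to enable use_jukebox, one option is a pre-flight check of GPU memory. This is a sketch, not part of the package; the 12 GB threshold is taken from the note above:

```python
def can_use_jukebox(total_vram_bytes, required_gib=12):
    """Decide whether a GPU has enough memory for Jukebox embeddings."""
    return total_vram_bytes >= required_gib * 1024**3

def detect_jukebox_support():
    """Probe the local machine; returns False when no CUDA GPU is present."""
    try:
        import torch
        if not torch.cuda.is_available():
            return False
        props = torch.cuda.get_device_properties(0)
        return can_use_jukebox(props.total_memory)
    except ImportError:
        return False

use_jukebox = detect_jukebox_support()
print(f"use_jukebox={use_jukebox}")
```

The result can then be passed straight into sheetsage(..., use_jukebox=use_jukebox), falling back to the fast CPU features when no suitable GPU is found.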

Command-Line Interface

# Basic transcription
python -m sheetsage.infer audio.mp3

# With options
python -m sheetsage.infer audio.mp3 \
    --segment_start_hint 30 \
    --segment_end_hint 60 \
    --beats_per_minute_hint 120 \
    --output_dir ./output

# See all options
python -m sheetsage.infer --help

📋 Requirements

  • Python: ≥3.10
  • PyTorch: ≥2.0.0
  • GPU: Optional, but recommended for Jukebox features (12GB+ VRAM)
  • OS: Linux, macOS, Windows

⚡ Performance

Transcription speed depends on audio length, hardware, and the feature extraction method:

  • Handcrafted features (CPU): ~1-5 seconds per minute of audio
  • Jukebox features (GPU): ~30-60 seconds per minute of audio (requires a GPU with ≥12 GB VRAM)

Note: Jukebox features provide higher quality but are slower.
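To benchmark throughput on your own hardware, a simple wall-clock harness is enough. In this sketch, the stand-in workload should be replaced with a real call such as sheetsage('audio.mp3', use_jukebox=False):

```python
import time

def seconds_per_audio_minute(transcribe, audio_seconds):
    """Wall-clock cost of one transcription, normalized per minute of audio."""
    start = time.perf_counter()
    transcribe()
    elapsed = time.perf_counter() - start
    return elapsed / (audio_seconds / 60.0)

# Stand-in workload; replace with e.g.
# lambda: sheetsage('audio.mp3', use_jukebox=False)
cost = seconds_per_audio_minute(lambda: time.sleep(0.05), audio_seconds=30)
print(f"~{cost:.2f} s per minute of audio")
```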


📚 Examples

See examples/ directory for usage examples:

  • basic_transcription.py - Basic usage
  • jukebox_transcription.py - GPU-based transcription
  • hooktheory_example.py - Working with Hooktheory data

๐Ÿ—๏ธ Project Structure

sheetsage-infer/
โ”œโ”€โ”€ sheetsage/                    # Main package
โ”‚   โ”œโ”€โ”€ infer.py                 # Main transcription pipeline
โ”‚   โ”œโ”€โ”€ align.py                 # Beat-to-time alignment
โ”‚   โ”œโ”€โ”€ beat_track.py             # Beat detection
โ”‚   โ”œโ”€โ”€ utils.py                 # LilyPond engraving, audio I/O
โ”‚   โ”œโ”€โ”€ assets.py                 # Asset management
โ”‚   โ”œโ”€โ”€ assets/                   # Asset JSON files
โ”‚   โ”‚   โ”œโ”€โ”€ hooktheory.json
โ”‚   โ”‚   โ”œโ”€โ”€ jukebox.json
โ”‚   โ”‚   โ”œโ”€โ”€ rwc.json
โ”‚   โ”‚   โ”œโ”€โ”€ sheetsage.json
โ”‚   โ”‚   โ””โ”€โ”€ test.json
โ”‚   โ”œโ”€โ”€ modules/                  # Neural network models
โ”‚   โ”‚   โ””โ”€โ”€ modules.py            # Transformer architectures
โ”‚   โ”œโ”€โ”€ representations/          # Feature extractors
โ”‚   โ”‚   โ”œโ”€โ”€ handcrafted.py       # CPU-based mel-spectrograms
โ”‚   โ”‚   โ”œโ”€โ”€ jukebox.py            # Jukebox embedding interface
โ”‚   โ”‚   โ””โ”€โ”€ jukebox_modules/     # Vendored Jukebox code
โ”‚   โ””โ”€โ”€ theory/                   # Music theory classes
โ”‚       โ”œโ”€โ”€ lead_sheet.py         # LeadSheet class with export methods
โ”‚       โ”œโ”€โ”€ basic.py              # Basic music theory primitives
โ”‚       โ”œโ”€โ”€ internal.py           # Internal theory classes
โ”‚       โ”œโ”€โ”€ theorytab.py          # TheoryTab integration
โ”‚       โ””โ”€โ”€ utils.py              # Theory utilities
โ”œโ”€โ”€ examples/                     # Example scripts
โ”‚   โ”œโ”€โ”€ basic_transcription.py    # Basic usage
โ”‚   โ”œโ”€โ”€ jukebox_transcription.py  # GPU-based transcription
โ”‚   โ”œโ”€โ”€ hooktheory_example.py     # Hooktheory data examples
โ”‚   โ”œโ”€โ”€ hooktheory_simple.py     # Simple Hooktheory example
โ”‚   โ””โ”€โ”€ transcribe_hooktheory_segments.py  # Hooktheory segment transcription
โ”œโ”€โ”€ hooktheory_data/              # Test data
โ”‚   โ”œโ”€โ”€ Hooktheory_Test_MIDI.tar.gz
โ”‚   โ””โ”€โ”€ Hooktheory_Test_Segments.json
โ”œโ”€โ”€ docs/                         # Documentation
โ”‚   โ””โ”€โ”€ generated/               # Generated documentation
โ”œโ”€โ”€ .github/                      # GitHub configuration
โ”‚   โ””โ”€โ”€ workflows/
โ”‚       โ””โ”€โ”€ publish.yml           # PyPI publishing workflow
โ”œโ”€โ”€ pyproject.toml               # Project configuration
โ”œโ”€โ”€ requirements.txt             # Python dependencies
โ”œโ”€โ”€ uv.lock                      # UV lock file
โ”œโ”€โ”€ LICENSE                      # MIT License
โ””โ”€โ”€ README.md                    # This file

🔄 Changes from Original SheetSage

SheetSage-Infer has been modified from the original SheetSage to make it more suitable for library use and easier to maintain.

Key Improvements

| Feature            | Original                  | This Version                   |
|--------------------|---------------------------|--------------------------------|
| Jukebox Dependency | External, complex install | Vendored, works out of the box |
| Test Coverage      | Limited                   | Test suite included            |
| Python Support     | 3.12+ only                | 3.10, 3.11, 3.12               |
| Build System       | Hatch                     | Setuptools (standard)          |
| Dependency Pins    | Loose                     | Explicit versions              |

What We Maintain

  • ✅ All core transcription functionality
  • ✅ Same neural network models
  • ✅ Same output formats (LeadSheet, LilyPond, MIDI)
  • ✅ Same API interface for sheetsage() function
  • ✅ Same theory classes (Note, Chord, Melody, Harmony, etc.)

What We Changed

  • Vendored Jukebox Modules: Eliminates complex external dependency
  • Library-First Design: Optimized for pip install and programmatic use
  • Better Dependency Management: Explicit version pins and compatibility

๐Ÿ™ Acknowledgments

Original Research by Chris Donahue

SheetSage-Infer is built upon the excellent work of SheetSage by Chris Donahue. The original SheetSage represents a major advancement in music transcription, achieving state-of-the-art results through hierarchical transformer architectures.

Research Paper

SheetSage: A Hierarchical Transformer for Audio to Lead Sheet Transcription

This work introduced hierarchical music transcription with melody and harmony extraction, enabling high-quality lead sheet generation from audio.

Original Author

  • Chris Donahue - Original SheetSage creator

About This Implementation

This package was created to continue the excellent work by providing easier deployment and vendored Jukebox modules, while preserving 100% of the original model quality and algorithms.

What we provide:

  • PyTorch 2.0+ compatibility
  • Modern dependency management
  • Inference-only packaging

What remains unchanged:

  • All model architectures (100% original)
  • All transcription algorithms (100% original)
  • All model weights (100% original)
  • All output formats (100% original)

📄 Citation

Please cite SheetSage using the following BibTeX entry:

@inproceedings{donahue2024sheetsage,
  title={SheetSage: A Hierarchical Transformer for Audio to Lead Sheet Transcription},
  author={Donahue, Chris},
  booktitle={ISMIR},
  year={2024}
}

If you use SheetSage-Infer in your research, please cite the original SheetSage paper above. This package is a maintenance fork to ensure easier deployment and continued compatibility - all credit for the models, algorithms, and research belongs to the original author.


📄 License

MIT License (same as original SheetSage)

Copyright (c) 2024 Chris Donahue (original SheetSage)
Copyright (c) 2025 (SheetSage-Infer modifications)

See LICENSE for details.

This project includes code adapted from SheetSage (MIT License, Copyright 2024 Chris Donahue).


โš ๏ธ Limitations

  • Inference only - No training capabilities
  • Jukebox features require GPU - 12GB+ VRAM recommended for Jukebox embeddings
  • LilyPond required for PDF - Optional dependency for PDF generation
  • Time signatures - Currently supports 4/4 and 3/4 only
  • Audio length - Best results with segments of 30-300 seconds
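Given these limits, it can help to validate segment hints before calling sheetsage(). A minimal sketch follows; the 30-300 second window is taken from the list above, and the helper is hypothetical, not part of the package:

```python
def validate_segment(start_hint, end_hint, min_len=30, max_len=300):
    """Check a [start, end] segment (in seconds) against the recommended
    30-300 s window. Returns (ok, message)."""
    if end_hint <= start_hint:
        return False, "segment end must come after segment start"
    length = end_hint - start_hint
    if length < min_len:
        return False, f"segment is {length}s; best results need >= {min_len}s"
    if length > max_len:
        return False, f"segment is {length}s; best results need <= {max_len}s"
    return True, "segment length looks good"

print(validate_segment(30, 60))   # a 30 s segment passes
print(validate_segment(30, 40))   # too short for best results
```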

๐Ÿค Contributing

We welcome contributions! Please:

  1. Follow the code style (ruff/black)
  2. Add tests for new features
  3. Submit PRs with clear descriptions

Development Setup

# Install dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/

# Format and lint code
ruff format . && ruff check .

📞 Support

For issues and questions, please open an issue on the GitHub repository (https://github.com/openmirlab/sheetsage-infer).


Made with โค๏ธ for the ML community

Based on the excellent work by Chris Donahue and the SheetSage project.
