End-to-end Optical Music Recognition (OMR) system built on top of vision transformers.

homr

homr is Optical Music Recognition (OMR) software that transforms camera pictures of sheet music into the machine-readable MusicXML format. The resulting MusicXML files can be further processed with tools such as MuseScore.

Prerequisites

  • Python 3.10
  • Poetry
  • Optional: NVIDIA GPU with CUDA 12.1

Getting started

  • Clone the repository
  • Install dependencies for:
    • GPU (requires CUDA): poetry install --only main,gpu
    • CPU: poetry install --only main
    • Development: poetry install
  • Run the program using poetry run homr <image>
  • The resulting MusicXML file will be saved in the same directory as the input image
  • To combine the MusicXML results from multiple images, you can use relieur
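
To process several pictures in one go, the documented command can be wrapped in a small script. The following is a minimal sketch, assuming it is run from the cloned repository so that `poetry run homr` resolves; the helper names and the list of image suffixes are illustrative assumptions and not part of homr.

```python
import subprocess
from pathlib import Path


def build_homr_command(image: Path) -> list[str]:
    """Return the documented command line for a single input image."""
    return ["poetry", "run", "homr", str(image)]


def run_homr_on_folder(folder: Path, suffixes=(".jpg", ".jpeg", ".png")) -> list[Path]:
    """Run homr once per image in `folder` and return the images processed.

    The suffix list is an assumption made for this sketch.
    """
    images = sorted(p for p in folder.iterdir() if p.suffix.lower() in suffixes)
    for image in images:
        # check=True raises CalledProcessError if homr exits with a non-zero status.
        subprocess.run(build_homr_command(image), check=True)
    return images
```

Calling `run_homr_on_folder(Path("scans"))` would invoke homr for each picture in a hypothetical `scans` folder; each MusicXML file is then saved next to its input image, as described above.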

Example

The example below provides an overview of the current performance of the implementation. While some errors are present in the output, the overall structure remains accurate.

Original Image homr Result
Go to https://github.com/liebharc/homr if this image isn't displayed

The homr result shown here is obtained by processing the homr output and rendering it with MuseScore.

Limitations

The current implementation focuses on pitch and rhythm information on the bass or treble clef, neglecting dynamics, articulation, double sharps/flats, and other musical symbols.

Technical Details

homr employs segmentation techniques outlined in oemer to identify staff lines, clefs, bar lines, and note heads in an image. These components are combined to determine the position of staffs within the picture.

Subsequently, each staff image is passed through a transformer model (based on Polyphonic-TrOMR) to identify the symbols on the staff. Pitch information is cross-validated with note head data obtained from the segmentation model.
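
To illustrate how a note head position can be checked against a predicted pitch, here is a minimal sketch. It assumes a treble clef, image coordinates where y grows downward, and a known position of the bottom staff line; the function names are hypothetical and this is not homr's actual validation code.

```python
STEPS = "CDEFGAB"


def pitch_from_note_head(head_y, bottom_line_y, unit_size):
    """Estimate the pitch of a note head on a treble clef staff.

    Half a unit size corresponds to one diatonic step. The bottom line
    of a treble clef staff is E4.
    """
    steps_above_bottom_line = round((bottom_line_y - head_y) / (unit_size / 2))
    diatonic_index = 4 * 7 + STEPS.index("E") + steps_above_bottom_line
    return f"{STEPS[diatonic_index % 7]}{diatonic_index // 7}"


def pitch_agrees(predicted_pitch, head_y, bottom_line_y, unit_size):
    """Return True if the predicted pitch matches the note head position."""
    return predicted_pitch == pitch_from_note_head(head_y, bottom_line_y, unit_size)
```

Accidentals are ignored in this sketch, so it only compares the note name and octave.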

The results are then converted into MusicXML format and saved to disk.

Image Predictions

homr utilizes oemer's UNet implementations to isolate staff lines and other symbols for note head identification. These predictions serve as input for staff and symbol detection.

Preprocessing the image has been shown to improve robustness against noisy backgrounds and variations in brightness.
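
The text does not state which preprocessing steps homr applies. As an illustration of the general idea only, the sketch below shows one common technique: dividing each pixel by the mean brightness of its neighbourhood, so that a page photographed in bright or dim light yields similar values. The function name and the `radius` parameter are assumptions made for this example.

```python
def normalize_brightness(gray, radius=1):
    """Divide each pixel by the mean of its (2*radius+1)^2 neighbourhood.

    `gray` is a list of rows of grayscale values. Uniformly lit paper
    maps to 1.0 regardless of the absolute brightness.
    """
    height, width = len(gray), len(gray[0])
    # Integral image with a zero border for constant-time window sums.
    integral = [[0.0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0.0
        for x in range(width):
            row_sum += gray[y][x]
            integral[y + 1][x + 1] = integral[y][x + 1] + row_sum
    result = [[0.0] * width for _ in range(height)]
    for y in range(height):
        y0, y1 = max(0, y - radius), min(height, y + radius + 1)
        for x in range(width):
            x0, x1 = max(0, x - radius), min(width, x + radius + 1)
            total = (integral[y1][x1] - integral[y0][x1]
                     - integral[y1][x0] + integral[y0][x0])
            mean = total / ((y1 - y0) * (x1 - x0))
            # Avoid division by zero in completely black regions.
            result[y][x] = gray[y][x] / mean if mean > 0 else 1.0
    return result
```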

Staff and Symbol Detection

The detection process involves extracting model data types from the image predictions. A key concept is the "staff anchor," which serves as a reference point ensuring accurate staff detection amidst symbols that might obscure it. Clefs and bar lines are currently utilized as anchor symbols.

For each anchor, the algorithm attempts to locate five staff lines and constructs the remainder of the staff around these anchors.
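
The search for five staff lines near an anchor can be sketched as follows. This is a simplified illustration, assuming that candidate line positions are already available as y coordinates; the function name and the `tolerance` parameter are hypothetical, and homr's actual algorithm may differ.

```python
def find_five_lines(line_ys, anchor_y, tolerance=0.25):
    """Return the five evenly spaced lines whose center is closest to anchor_y.

    Returns None if no group of five consecutive candidates has gaps that
    all lie within `tolerance` (relative) of the group's mean gap.
    """
    ys = sorted(line_ys)
    best = None
    for i in range(len(ys) - 4):
        group = ys[i:i + 5]
        gaps = [b - a for a, b in zip(group, group[1:])]
        mean_gap = sum(gaps) / len(gaps)
        if mean_gap <= 0:
            continue
        if any(abs(gap - mean_gap) > tolerance * mean_gap for gap in gaps):
            continue  # spacing is too irregular to be one staff
        distance = abs(group[2] - anchor_y)  # group[2] is the middle line
        if best is None or distance < best[0]:
            best = (distance, group)
    return None if best is None else best[1]
```

For example, with candidates from two staffs, an anchor at the height of the second staff selects that staff's lines rather than the first one's.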

Unit Sizes

The unit size denotes the distance between staff lines, which may vary due to camera perspective. To accommodate this, the unit size is calculated per staff.
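
As a minimal sketch of this definition, the unit size of one staff can be computed as the mean gap between its adjacent lines. The function name is an assumption made for this example.

```python
from statistics import mean


def unit_size(staff_line_ys):
    """Return the mean distance between adjacent lines of one staff."""
    ys = sorted(staff_line_ys)
    if len(ys) < 2:
        raise ValueError("at least two staff lines are required")
    return mean(b - a for a, b in zip(ys, ys[1:]))
```

Two staffs on the same photo can then yield different values, for example 10 pixels near the top of the page and 12 pixels near the bottom.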

Connecting Staffs

Support for multiple voices and grand staffs is facilitated by identifying braces and brackets to combine individual staffs.
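
The grouping step can be illustrated with a small sketch that joins all staffs vertically overlapped by the same brace or bracket. The `Span` type and `group_staffs` function are hypothetical names for this example; the real implementation works on richer model objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Vertical extent of a staff or brace in image coordinates."""
    top: float
    bottom: float


def overlaps(a: Span, b: Span) -> bool:
    return a.top <= b.bottom and b.top <= a.bottom


def group_staffs(staffs, braces):
    """Return lists of staff indices, one list per connected group."""
    groups = []
    used = set()
    for brace in braces:
        members = [i for i, staff in enumerate(staffs)
                   if i not in used and overlaps(staff, brace)]
        if len(members) > 1:
            groups.append(members)
            used.update(members)
    # Staffs not touched by any brace form their own group.
    groups.extend([i] for i in range(len(staffs)) if i not in used)
    return sorted(groups)
```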

Rhythm Parsing

Dewarped images of each staff are computed and passed through a transformer to extract staff contents. From this point onward, semantic information from the sheet music is utilized rather than pixel-based data.

XML Generation

The result model objects produced by the previous steps are used to generate MusicXML.
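
To show what this last step produces, here is a minimal sketch that writes a one-measure MusicXML document with Python's standard library. The tuple-based note representation and the function name are assumptions made for this example; homr's own result model and writer are not shown here.

```python
import xml.etree.ElementTree as ET


def notes_to_musicxml(notes, divisions=1):
    """Build a single-part, single-measure MusicXML string.

    `notes` is a list of (step, octave, duration, type_name) tuples,
    with duration expressed in divisions of a quarter note.
    """
    score = ET.Element("score-partwise", version="4.0")
    part_list = ET.SubElement(score, "part-list")
    score_part = ET.SubElement(part_list, "score-part", id="P1")
    ET.SubElement(score_part, "part-name").text = "Music"

    part = ET.SubElement(score, "part", id="P1")
    measure = ET.SubElement(part, "measure", number="1")
    attributes = ET.SubElement(measure, "attributes")
    ET.SubElement(attributes, "divisions").text = str(divisions)
    clef = ET.SubElement(attributes, "clef")
    ET.SubElement(clef, "sign").text = "G"  # treble clef
    ET.SubElement(clef, "line").text = "2"

    for step, octave, duration, type_name in notes:
        note = ET.SubElement(measure, "note")
        pitch = ET.SubElement(note, "pitch")
        ET.SubElement(pitch, "step").text = step
        ET.SubElement(pitch, "octave").text = str(octave)
        ET.SubElement(note, "duration").text = str(duration)
        ET.SubElement(note, "type").text = type_name
    return ET.tostring(score, encoding="unicode")
```

Calling `notes_to_musicxml([("C", 4, 1, "quarter"), ("E", 4, 2, "half")])` returns a string that can be saved with a `.musicxml` extension and opened in a notation program.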

Citation

If you use this code in your research work, please cite oemer and Polyphonic-TrOMR.

Name

The name "homr" stands for Homer's Optical Music Recognition (OMR), leaving the interpretation of "Homer" to the user's discretion, whether referring to the ancient poet Homer or the iconic character from The Simpsons.

Thanks

This project builds upon previous work, including oemer and Polyphonic-TrOMR.
