Mashcima

A Python library that produces synthetic images of monophonic handwritten music. One of the following two images is synthetic; take a guess which one:

Comparison image

The upper image is taken from the CVC-MUSCIMA dataset [3] and the lower image was synthesized by this tool.

The aim of this tool is to provide abundant training data for researchers in the field of handwritten music recognition. It works by taking symbol masks from the MUSCIMA++ dataset [2] and placing them onto a blank staff according to a given annotation. This annotation may be your own, may be generated randomly, or may be taken from the PrIMuS dataset [4].

This tool is described in detail in the article Synthesizing Training Data for Handwritten Music Recognition [1] which can be downloaded here: <not-yet-published>

The article describes various experiments. These experiments are part of the old repository: https://github.com/Jirka-Mayer/BachelorThesis

Installation

The tool can be installed via pip:

pip install mashcima

It automatically downloads all the necessary datasets upon first use, so expect an additional 1.5 GB of disk space to be taken up. Also have an internet connection ready (~350 MB will be downloaded).

Note: See the uninstallation section below for how to remove this data.

Usage (short tutorial)

To visualise the generated images, we will use the matplotlib package:

pip install matplotlib

Now we can write a few lines of code to generate an image from a given Mashcima annotation:

import mashcima as mc
import matplotlib.pyplot as plt

# the input annotation
annotation = "h-5 ( ) e=-5 =s=-4 #-3 =s-3 s=-2 =s=-1 N0 =s=0 #1 =s1 | " + \
    "h2 ( ) e=2 =s=3 #4 =s4 s=5 =s=6 N7 =s=7 #8 =s8 | q9 qr hr"

# turn the annotation into an image
img = mc.synthesize_for_training(annotation)

# display the image
plt.imshow(img)
plt.show()

Example 1
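
If you want to save the synthesized image to a file instead of displaying it, matplotlib can do that too. This is just a sketch; it assumes, as the plt.imshow call above already does, that the returned image is a plain image array, and the file name and grayscale colormap are arbitrary choices:

# continuing from the snippet above, where `img` holds the synthesized image
plt.imsave("mashcima_example.png", img, cmap="gray")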

The synthesize_for_training function generates an image with an additional affine distortion and random gaps between notes; it is the function to use for generating training images. We can also generate images that are meant to look good rather than to serve as training data:

# constrain the symbol repository to writer 01
# (from the MUSCIMA++ dataset, used as the source of symbols)
mc.use_only_writer_number_one()

# synthesize with no distortions
img = mc.synthesize_for_beauty(annotation)

Example 2

We told the synthesizer to use only symbols taken from writer 01 of the MUSCIMA++ dataset, the writer with the best-looking handwriting. The synthesize_for_beauty function then synthesizes the image without any awkward-looking distortions.

We can also synthesize the surrounding staves of music, so that the resulting image looks like it was cropped from a music sheet:

# render the surrounding staves as well
img = mc.synthesize_for_beauty(
    above_annotation=annotation,
    main_annotation=annotation,
    below_annotation=annotation,
)

Example 3

Now that we can turn an annotation into an image, we need to obtain these annotations from somewhere. One option is to generate them pseudo-randomly:

img = mc.synthesize_for_beauty(
    mc.generate_random_annotation()
)

Example 4

Another option is to load the PrIMuS dataset and use its annotations as a source:

print(mc.load_primus_as_mashcima_annotations(take=5))

This function returns the following data:

[
    {
        'path': './package_aa/000104290-1_1_1/000104290-1_1_1.agnostic',
        'primus': 'clef.C-L1\taccidental.flat-L4\tdigit.2-L4\td...',
        'mashcima': 'clef.C-4 b2 time.2 time.2 h3 h2 | h1 * q0 ...'
    },
    { ... },
    { ... },
    { ... },
    { ... }
]

As you can see, it loads individual incipits and provides them in both the PrIMuS agnostic encoding and the Mashcima encoding. We can use these annotations to generate images:

# you can omit the "take=5" argument to load the whole dataset
data = mc.load_primus_as_mashcima_annotations(take=5)

img = mc.synthesize_for_beauty(
    data[0]["mashcima"]
)

Example 5
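
Putting these pieces together, we can build a small synthetic training set from PrIMuS annotations. The following is only a sketch under a few assumptions of ours: the output directory, the file names, and the number of incipits taken are arbitrary, and the images are written to disk with matplotlib as in the earlier examples:

import os

import mashcima as mc
import matplotlib.pyplot as plt

# arbitrary output location for this sketch
output_dir = "synthetic_training_data"
os.makedirs(output_dir, exist_ok=True)

# take only a handful of incipits; omit "take" to use the whole dataset
data = mc.load_primus_as_mashcima_annotations(take=10)

for i, item in enumerate(data):
    # synthesize a distorted training image from the Mashcima annotation
    img = mc.synthesize_for_training(item["mashcima"])

    # store the image and its ground-truth annotation under matching names
    plt.imsave(os.path.join(output_dir, f"{i:04d}.png"), img, cmap="gray")
    with open(os.path.join(output_dir, f"{i:04d}.txt"), "w") as f:
        f.write(item["mashcima"])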

If you have your own music in the PrIMuS agnostic format, you can convert it to Mashcima encoding via:

# prints "clef.G-2 #4 #1 time.C"
print(mc.convert_primus_annotation_to_mashcima_annotation(
    "clef.G-L2\taccidental.sharp-L5\taccidental.sharp-S3\tmetersign.C-L3"
))

The conversion is not perfect; for example, grace notes are skipped, and multi-measure rests cause the whole conversion to fail, returning None. Most notes and rests are, however, converted correctly. For details, see the primus_adapter.py source code.
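
Because a failed conversion returns None, it is worth filtering the results when converting many annotations at once. Here is a minimal sketch; the list of input incipits is a made-up placeholder:

import mashcima as mc

# hypothetical input: a list of PrIMuS agnostic incipits (tab-separated)
primus_annotations = [
    "clef.G-L2\taccidental.sharp-L5\taccidental.sharp-S3\tmetersign.C-L3",
    # ... more incipits ...
]

converted = []
for primus in primus_annotations:
    mashcima_annotation = mc.convert_primus_annotation_to_mashcima_annotation(primus)
    if mashcima_annotation is None:
        # the incipit contains something the converter cannot handle,
        # e.g. a multi-measure rest, so we skip it
        continue
    converted.append(mashcima_annotation)

print(f"Converted {len(converted)} of {len(primus_annotations)} incipits")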

Synthesis in detail

Both synthesizing functions (synthesize_for_training and synthesize_for_beauty) internally use the synthesize function. This function has many arguments that tweak the synthesis process:

Argument | Default | Description
main_annotation | --- | The string annotation for the main staff to render (required).
above_annotation, below_annotation | None | Optional annotations for the staves above and below the main one.
main_canvas_options, above_canvas_options, below_canvas_options | None | CanvasOptions objects passed to the rendering process of each of the three staves.
min_width | 0 | The image is typically cropped horizontally to the width of its content; this sets a minimal width below which it will not be cropped.
crop_horizontally, crop_vertically | True | All three staves are rendered onto a large canvas which is then cropped to the main staff; cropping can be disabled in each direction.
transform_image | True | Whether a random affine distortion is applied to the rendered image.
symbol_repository | None | The repository from which to obtain symbols; None stands for the default repository.
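
For finer control you can call synthesize directly with these arguments. The snippet below is only a sketch: it assumes synthesize is reachable from the top-level mashcima package (the text above only says that the two convenience functions use it internally), and the argument values are arbitrary:

import mashcima as mc
import matplotlib.pyplot as plt

annotation = mc.generate_random_annotation()

# assumption: `synthesize` is exposed on the top-level package like the
# convenience functions above; argument names follow the table
img = mc.synthesize(
    main_annotation=annotation,
    above_annotation=None,      # no staff rendered above
    below_annotation=None,      # no staff rendered below
    min_width=0,
    crop_horizontally=True,
    crop_vertically=True,
    transform_image=False,      # disable the random affine distortion
)

plt.imshow(img)
plt.show()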

To learn more about the synthesis process, read the article: <not-yet-published>.

Uninstallation

Mashcima downloads two datasets and caches some temporary data for faster loading. To remove this data, run the following command before uninstalling the pip package:

python -m mashcima.delete_files

Now you can do:

pip uninstall mashcima

License

The source code of this tool falls under the MIT license.

Since the synthesis uses the MUSCIMA++ dataset, which in turn builds on the CVC-MUSCIMA dataset, the synthesized images fall under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

The PrIMuS dataset is created from the RISM dataset, which falls under the Creative Commons Attribution 3.0 Unported License, and so this license should also apply to images synthesized from PrIMuS annotations.

The CVC-MUSCIMA dataset is for non-commercial research purposes only, which implies that the same restrictions apply to images generated by Mashcima.

Reference:
If you publish material based on this tool, we ask that you include a reference to paper [1]. Due to the transitive use of MUSCIMA++ and CVC-MUSCIMA, you should also include references to [2] and [3]. If you use the PrIMuS dataset, you should also cite [4].

References

[1] <not-yet-published> Jiří Mayer and Pavel Pecina. Synthesizing Training Data for Handwritten Music Recognition. 16th International Conference on Document Analysis and Recognition, ICDAR 2021. Lausanne, September 8-10, pp. <not-yet-known>, 2021.

[2] Jan Hajič jr. and Pavel Pecina. The MUSCIMA++ Dataset for Handwritten Optical Music Recognition. 14th International Conference on Document Analysis and Recognition, ICDAR 2017. Kyoto, Japan, November 13-15, pp. 39-46, 2017.

[3] Alicia Fornés, Anjan Dutta, Albert Gordo, Josep Lladós. CVC-MUSCIMA: A Ground-truth of Handwritten Music Score Images for Writer Identification and Staff Removal. International Journal on Document Analysis and Recognition, Volume 15, Issue 3, pp 243-251, 2012. (DOI: 10.1007/s10032-011-0168-2).

[4] Jorge Calvo-Zaragoza and David Rizo. End-to-End Neural Optical Music Recognition of Monophonic Scores. Applied Sciences, 8(4), 606, 2018.

Contact

Jiří Mayer (mayer@ufal.mff.cuni.cz)

This tool was also made possible thanks to Pavel Pecina and Jan Hajič jr.
