sheatless - A Python library for extracting parts from sheet music PDFs

Sheatless is a tool that helps The Beatless become sheetless. It is written and maintained by the web committee of the student orchestra The Beatless, and will soon be integrated into taktlaus.no.

API

PdfPredictor

class PdfPredictor():
    def __init__(
        self,
        pdf : BytesIO | bytes,
        instruments=None,
        instruments_file=None,
        instruments_file_format="yaml",
        use_lstm=False,
        tessdata_dir=None,
        tesseract_languages=["eng"],
        log_stream=sys.stdout,
        crop_to_top=False,
        crop_to_left=True,
        full_score_threshold=3,
        full_score_label="Full score",
        ):
        ...
    
    def parts(self):
        for ...:
            yield  {
                "name": "<part name>",
                "partNumber": "<part number>",
                "instruments": ["<instrument name>", ...],
                "fromPage": "<from page>",
                "toPage": "<to page>",
            }
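
As a rough illustration of how the dictionaries yielded by parts() might be consumed, the sketch below formats each one as a single line of text. The helper name describe_part and the sample values are hypothetical; only the dictionary keys come from the signature above. The commented-out lines assume that PdfPredictor can be imported from the top-level sheatless package, which this README does not confirm.

```python
def describe_part(part):
    """Format one part dictionary (keys as yielded by parts()) as a line of text."""
    instruments = ", ".join(part["instruments"])
    return f'{part["name"]} [{instruments}]: pages {part["fromPage"]}-{part["toPage"]}'


# Hypothetical sample data in the documented shape; real values come from OCR.
sample_parts = [
    {"name": "Trumpet 1", "partNumber": 1, "instruments": ["Trumpet"],
     "fromPage": 1, "toPage": 2},
    {"name": "Clarinet 2", "partNumber": 2, "instruments": ["Clarinet"],
     "fromPage": 3, "toPage": 4},
]

for part in sample_parts:
    print(describe_part(part))

# With the real library (assumed import path; requires Tesseract and Poppler):
# from sheatless import PdfPredictor
# with open("score.pdf", "rb") as f:
#     for part in PdfPredictor(f.read(), log_stream=None).parts():
#         print(describe_part(part))
```

Because parts() is a generator, parts can be handled one at a time as they are predicted rather than after the whole PDF has been processed.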

Arguments for __init__:

  • pdf - PDF file object
  • instruments (optional) - Dictionary of instruments. Will override any provided instruments file.
  • instruments_file (optional) - Full path to instruments file or instruments file object. Accepted extensions: .yaml, .yml, .json
  • instruments_file_format (optional) - Format of instruments_file if it is a file object. Accepted formats: yaml, json
    • If neither instruments_file nor instruments is provided a default instruments file will be used.
  • use_lstm (optional) - Use LSTM instead of legacy engine mode.
  • tessdata_dir (optional) - Full path to tessdata directory. If not provided, the value of the environment variable TESSDATA_DIR will be used.
  • tesseract_languages (optional) - List of which languages tesseract should use.
  • log_stream (optional) - File stream log output will be sent to. Can be set to None to disable logging.
  • crop_to_top (optional) - If set to True (not default), PDF pages will be cropped to top half.
  • crop_to_left (optional) - If set to True (default), PDF pages will be cropped to left half.
  • full_score_threshold (optional) - If the number of parts predicted on one page is greater than this number, full_score_label will be used as the predicted part instead.
  • full_score_label (optional) - The label to use for identifying a full score.
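
The interaction between full_score_threshold and full_score_label can be sketched as follows. This is only an illustration of the rule as described in the list above, not the library's actual implementation; the function name choose_page_labels and the sample part names are hypothetical.

```python
def choose_page_labels(predicted_parts, full_score_threshold=3,
                       full_score_label="Full score"):
    """Return the labels for one page.

    If strictly more than `full_score_threshold` parts were predicted on the
    page, the page is treated as a full score and only the full-score label
    is returned. Otherwise the predicted parts are returned unchanged.
    """
    if len(predicted_parts) > full_score_threshold:
        return [full_score_label]
    return list(predicted_parts)


# Three parts do not exceed the default threshold of 3, so they are kept.
print(choose_page_labels(["Flute", "Oboe", "Clarinet"]))

# Four parts exceed the threshold, so the page is labelled as a full score.
print(choose_page_labels(["Flute", "Oboe", "Clarinet", "Bassoon"]))
```

The comparison is "greater than", so a page with exactly full_score_threshold predicted parts is not treated as a full score.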

processUploadedPdf

def processUploadedPdf(pdfPath, imagesDirPath, instruments_file=None, instruments=None, use_lstm=False, tessdata_dir=None):
    ...
    return parts, instrumentsDefaultParts

It can be imported with

from sheatless import processUploadedPdf

Arguments:

  • pdfPath - Full path to PDF file.
  • imagesDirPath - Full path to the directory for output images.
  • instruments_file (optional) - Full path to instruments file. Accepted formats: YAML (.yaml, .yml), JSON (.json).
  • instruments (optional) - Dictionary of instruments. Will override any provided instruments file.
    • If neither instruments_file nor instruments is provided a default instruments file will be used.
  • use_lstm (optional) - Use LSTM instead of legacy engine mode.
  • tessdata_dir (optional) - Full path to tessdata directory. If not provided, the value of the environment variable TESSDATA_DIR will be used.

Returns:

  • parts - A list of dictionaries { "name": "name", "instruments": ["instrument 1", "instrument 2", ...], "fromPage": i, "toPage": j } describing each part.
  • instrumentsDefaultParts - A dictionary { ..., "instrument_i": j, ... }, where j is the index in the parts list of the default part for instrument_i.
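
To show how the two return values relate, here is a minimal sketch that looks up the default part for an instrument. The helper name default_part_for and the sample data are hypothetical; the sketch assumes only the shapes described above, namely that instrumentsDefaultParts maps an instrument name to an index into parts.

```python
def default_part_for(instrument, parts, instruments_default_parts):
    """Return the default part dictionary for `instrument`, or None if the
    instrument has no entry in the default-parts mapping."""
    index = instruments_default_parts.get(instrument)
    if index is None:
        return None
    return parts[index]


# Hypothetical values in the documented shape.
parts = [
    {"name": "Trumpet 1", "instruments": ["Trumpet"], "fromPage": 1, "toPage": 2},
    {"name": "Trumpet 2", "instruments": ["Trumpet"], "fromPage": 3, "toPage": 4},
    {"name": "Tuba", "instruments": ["Tuba"], "fromPage": 5, "toPage": 6},
]
instrumentsDefaultParts = {"Trumpet": 0, "Tuba": 2}

print(default_part_for("Trumpet", parts, instrumentsDefaultParts)["name"])
print(default_part_for("Harp", parts, instrumentsDefaultParts))
```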

predict_parts_in_pdf

def predict_parts_in_pdf(
    pdf : BytesIO | bytes,
    instruments=None,
    instruments_file=None,
    instruments_file_format="yaml",
    use_lstm=False,
    tessdata_dir=None,
    ):
    ...
    return parts, instrumentsDefaultParts

Arguments:

  • pdf - PDF file object
  • instruments (optional) - Dictionary of instruments. Will override any provided instruments file.
  • instruments_file (optional) - Full path to instruments file or instruments file object. Accepted extensions: .yaml, .yml, .json
  • instruments_file_format (optional) - Format of instruments_file if it is a file object. Accepted formats: yaml, json
    • If neither instruments_file nor instruments is provided a default instruments file will be used.
  • use_lstm (optional) - Use LSTM instead of legacy engine mode.
  • tessdata_dir (optional) - Full path to tessdata directory. If not provided, the value of the environment variable TESSDATA_DIR will be used.

Returns:

  • parts - A list of dictionaries { "name": "name", "instruments": ["instrument 1", "instrument 2", ...], "fromPage": i, "toPage": j } describing each part
  • instrumentsDefaultParts - A dictionary { ..., "instrument_i": j, ... }, where j is the index in the parts list for the default part for instrument_i.
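
One possible way to work with the returned parts list is to index part names by instrument, as sketched below. The helper parts_by_instrument and the sample data are hypothetical. The commented-out call assumes that predict_parts_in_pdf can be imported from the top-level sheatless package, which this README does not state explicitly.

```python
from collections import defaultdict


def parts_by_instrument(parts):
    """Map each instrument name to the names of all parts that list it."""
    index = defaultdict(list)
    for part in parts:
        for instrument in part["instruments"]:
            index[instrument].append(part["name"])
    return dict(index)


# Hypothetical values in the documented shape.
sample_parts = [
    {"name": "Trumpet 1", "instruments": ["Trumpet", "Cornet"], "fromPage": 1, "toPage": 2},
    {"name": "Trumpet 2", "instruments": ["Trumpet"], "fromPage": 3, "toPage": 4},
]
print(parts_by_instrument(sample_parts))

# With the real library (assumed import path; requires Tesseract and Poppler):
# from io import BytesIO
# from sheatless import predict_parts_in_pdf
# with open("score.pdf", "rb") as f:
#     parts, instrumentsDefaultParts = predict_parts_in_pdf(BytesIO(f.read()))
# print(parts_by_instrument(parts))
```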

predict_parts_in_img

def predict_parts_in_img(img : io.BytesIO | bytes | PIL.Image.Image, instruments, use_lstm=False, tessdata_dir=None) -> typing.Tuple[list, list]:
    ...
    return partNames, instrumentses

Arguments:

  • img - image object
  • instruments - dictionary of instruments
  • use_lstm (optional) - Use LSTM instead of legacy engine mode.
  • tessdata_dir (optional) - Full path to tessdata directory. If not provided, the value of the environment variable TESSDATA_DIR will be used.

Returns:

  • partNames - a list of part names
  • instrumentses - a list of lists of instruments for each part
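
The two returned lists can be combined into pairs, as in the sketch below. It assumes that partNames and instrumentses are index-aligned, so that the i-th list of instruments belongs to the i-th part name; the description above suggests this but does not state it outright. The helper name pair_parts and the sample values are hypothetical.

```python
def pair_parts(part_names, instrumentses):
    """Pair each part name with its list of instruments.

    Assumes both lists are index-aligned; raises ValueError otherwise.
    """
    if len(part_names) != len(instrumentses):
        raise ValueError("partNames and instrumentses must have equal length")
    return list(zip(part_names, instrumentses))


# Hypothetical values shaped like the documented return values.
partNames = ["Trumpet 1", "Tuba"]
instrumentses = [["Trumpet", "Cornet"], ["Tuba"]]

for name, instruments in pair_parts(partNames, instrumentses):
    print(name, "->", ", ".join(instruments))
```

A list of pairs is used instead of a dictionary so that two parts with the same name on one page are not silently merged.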

Example docker setup

Sheatless requires Tesseract and Poppler to be installed on the system. An example Docker setup, along with an example integration of the library, can be found in sheatless-splitter.
