sheatless - A Python library for extracting parts from sheet music PDFs
Sheatless is a tool for The Beatless to become sheetless. It is written and managed by the web committee of the student orchestra The Beatless, and will soon be integrated into taktlaus.no.
API
PdfPredictor
class PdfPredictor():
    def __init__(
        self,
        pdf : BytesIO | bytes,
        instruments=None,
        instruments_file=None,
        instruments_file_format="yaml",
        use_lstm=False,
        tessdata_dir=None,
        log_stream=sys.stdout,
        crop_to_top=False,
        crop_to_left=True,
        full_score_threshold=3,
        full_score_label="Full score",
    ):
        ...

    def parts(self):
        for ...:
            yield {
                "name": "<part name>",
                "partNumber": "<part number>",
                "instruments": ["<instrument name>", ...],
                "fromPage": "<from page>",
                "toPage": "<to page>",
            }
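As a rough illustration of consuming the generator, the sketch below formats each yielded dictionary as one line of text. `FakePredictor`, `describe_part` and `describe_all` are hypothetical names introduced here. The stand-in only mimics the dictionary shape shown above so the snippet runs without Tesseract or Poppler, and the sample values are made up.

```python
from typing import Dict, Iterator, List


class FakePredictor:
    """Hypothetical stand-in yielding dicts shaped like PdfPredictor.parts()."""

    def parts(self) -> Iterator[Dict]:
        # Made-up sample values; only the keys follow the documented shape.
        yield {
            "name": "Trumpet 1",
            "partNumber": 1,
            "instruments": ["Trumpet"],
            "fromPage": 1,
            "toPage": 2,
        }
        yield {
            "name": "Flute",
            "partNumber": 1,
            "instruments": ["Flute", "Piccolo"],
            "fromPage": 3,
            "toPage": 3,
        }


def describe_part(part: Dict) -> str:
    """Format one part dictionary as a single human-readable line."""
    instruments = ", ".join(part["instruments"])
    return f'{part["name"]} ({instruments}): pages {part["fromPage"]}-{part["toPage"]}'


def describe_all(predictor) -> List[str]:
    """Drain the parts() generator of any predictor-like object."""
    return [describe_part(part) for part in predictor.parts()]


if __name__ == "__main__":
    for line in describe_all(FakePredictor()):
        print(line)
```

With the real library, `describe_all(PdfPredictor(pdf_bytes))` should behave the same way, assuming `PdfPredictor` can be imported from the `sheatless` package and `pdf_bytes` holds the content of a PDF file.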
Arguments for __init__:
- pdf - PDF file object (BytesIO or bytes)
- instruments (optional) - Dictionary of instruments. Will override any provided instruments file.
- instruments_file (optional) - Full path to instruments file or instruments file object. Accepted extensions: .yaml, .yml, .json
- instruments_file_format (optional) - Format of instruments_file if it is a file object. Accepted formats: yaml, json
- If neither instruments_file nor instruments is provided, a default instruments file will be used.
- use_lstm (optional) - Use LSTM instead of legacy engine mode.
- tessdata_dir (optional) - Full path to tessdata directory. If not provided, the value of the environment variable TESSDATA_DIR will be used.
- log_stream (optional) - File stream that log output will be sent to. Can be set to None to disable logging.
- crop_to_top (optional) - If set to True (not default), PDF pages will be cropped to the top half.
- crop_to_left (optional) - If set to True (default), PDF pages will be cropped to the left half.
- full_score_threshold (optional) - If the number of parts predicted on one page is greater than this number, full_score_label will be used as the predicted part instead.
- full_score_label (optional) - The label to use for identifying a full score.
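The full-score rule can be read as a simple threshold check. The sketch below is only an interpretation of the description of full_score_threshold and full_score_label, not the library's actual code; `label_for_page` is a hypothetical helper, and the part names in the usage are made up.

```python
from typing import List


def label_for_page(
    predicted_parts: List[str],
    full_score_threshold: int = 3,
    full_score_label: str = "Full score",
) -> List[str]:
    """Collapse a page with too many predicted parts into one full-score label.

    Assumption: "greater than" is strict, so a page with exactly
    full_score_threshold parts keeps its individual predictions.
    """
    if len(predicted_parts) > full_score_threshold:
        return [full_score_label]
    return list(predicted_parts)


if __name__ == "__main__":
    print(label_for_page(["Flute", "Oboe", "Clarinet"]))
    print(label_for_page(["Flute", "Oboe", "Clarinet", "Bassoon"]))
```

With the default threshold of 3, the first call keeps its three part names and the second collapses to the single full-score label.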
processUploadedPdf
def processUploadedPdf(pdfPath, imagesDirPath, instruments_file=None, instruments=None, use_lstm=False, tessdata_dir=None):
    ...
    return parts, instrumentsDefaultParts
The function can be imported with:
from sheatless import processUploadedPdf
Arguments description here:
Argument | Optional | Description |
---|---|---|
pdfPath | | Full path to PDF file. |
imagesDirPath | | Full path to the output images directory. |
instruments_file | (optional) | Full path to instruments file. Accepted formats: YAML (.yaml, .yml), JSON (.json). |
instruments | (optional) | Dictionary of instruments. Will override any provided instruments file. |
use_lstm | (optional) | Use LSTM instead of legacy engine mode. |
tessdata_dir | (optional) | Full path to tessdata directory. If not provided, the value of the environment variable TESSDATA_DIR will be used. |

If neither instruments_file nor instruments is provided, a default instruments file will be used.
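The tessdata_dir fallback described in the table might look like the following. This is a minimal sketch of the documented behaviour, assuming the environment variable is read only when no argument is given; `resolve_tessdata_dir` is a hypothetical helper and is not part of sheatless.

```python
import os
from typing import Optional


def resolve_tessdata_dir(tessdata_dir: Optional[str] = None) -> Optional[str]:
    """Prefer the explicit argument, else fall back to TESSDATA_DIR.

    Returns None when neither the argument nor the variable is set.
    """
    if tessdata_dir is not None:
        return tessdata_dir
    return os.environ.get("TESSDATA_DIR")


if __name__ == "__main__":
    print(resolve_tessdata_dir())
```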
Returns description here:
Return | Description |
---|---|
parts | A list of dictionaries { "name": "name", "instruments": ["instrument 1", "instrument 2", ...], "fromPage": i, "toPage": j } describing each part. |
instrumentsDefaultParts | A dictionary { ..., "instrument_i": j, ... }, where j is the index in the parts list of the default part for instrument_i. |
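Since instrumentsDefaultParts stores indices into parts, looking up the default part for an instrument takes two steps. The sketch below works on hand-written sample data in the documented shape; `default_part_for` is a hypothetical helper, and the instrument and part names are made up.

```python
from typing import Dict, List, Optional


def default_part_for(
    instrument: str,
    parts: List[dict],
    instruments_default_parts: Dict[str, int],
) -> Optional[dict]:
    """Return the default part dictionary for an instrument, or None if unknown."""
    index = instruments_default_parts.get(instrument)
    if index is None:
        return None
    return parts[index]


if __name__ == "__main__":
    # Made-up sample in the shape of the two return values.
    parts = [
        {"name": "Trumpet 1", "instruments": ["Trumpet"], "fromPage": 1, "toPage": 2},
        {"name": "Trumpet 2", "instruments": ["Trumpet"], "fromPage": 3, "toPage": 4},
        {"name": "Tuba", "instruments": ["Tuba"], "fromPage": 5, "toPage": 5},
    ]
    instruments_default_parts = {"Trumpet": 0, "Tuba": 2}
    print(default_part_for("Tuba", parts, instruments_default_parts))
```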
predict_parts_in_pdf
def predict_parts_in_pdf(
    pdf : BytesIO | bytes,
    instruments=None,
    instruments_file=None,
    instruments_file_format="yaml",
    use_lstm=False,
    tessdata_dir=None,
):
    ...
    return parts, instrumentsDefaultParts
Arguments:
- pdf - PDF file object
- instruments (optional) - Dictionary of instruments. Will override any provided instruments file.
- instruments_file (optional) - Full path to instruments file or instruments file object. Accepted extensions: .yaml, .yml, .json
- instruments_file_format (optional) - Format of instruments_file if it is a file object. Accepted formats: yaml, json
- If neither instruments_file nor instruments is provided, a default instruments file will be used.
- use_lstm (optional) - Use LSTM instead of legacy engine mode.
- tessdata_dir (optional) - Full path to tessdata directory. If not provided, the value of the environment variable TESSDATA_DIR will be used.
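The precedence between instruments, instruments_file and the default can be sketched as below. This is an interpretation of the argument descriptions, not the library's implementation. `resolve_instruments` and its `default` parameter are hypothetical, only JSON file objects are handled (paths are left out, and YAML would need a third-party parser such as PyYAML), and the dictionary contents in the usage are made up because the instruments schema is not described here.

```python
import io
import json
from typing import Optional, TextIO


def resolve_instruments(
    instruments: Optional[dict] = None,
    instruments_file: Optional[TextIO] = None,
    instruments_file_format: str = "yaml",
    default: Optional[dict] = None,
) -> Optional[dict]:
    """Pick the instruments source: explicit dict, then file object, then default."""
    if instruments is not None:
        # An explicit dictionary overrides any provided file.
        return instruments
    if instruments_file is None:
        # Neither was given: fall back to the (hypothetical) default value.
        return default
    if instruments_file_format != "json":
        raise NotImplementedError("this sketch only parses JSON file objects")
    return json.load(instruments_file)


if __name__ == "__main__":
    file_obj = io.StringIO('{"Trumpet": ["trumpet"]}')  # made-up content
    print(resolve_instruments(instruments_file=file_obj, instruments_file_format="json"))
```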
Returns:
- parts - A list of dictionaries { "name": "name", "instruments": ["instrument 1", "instrument 2", ...], "fromPage": i, "toPage": j } describing each part
- instrumentsDefaultParts - A dictionary { ..., "instrument_i": j, ... }, where j is the index in the parts list of the default part for instrument_i.
predict_parts_in_img
def predict_parts_in_img(img : io.BytesIO | bytes | PIL.Image.Image, instruments, use_lstm=False, tessdata_dir=None) -> typing.Tuple[list, list]:
    ...
    return partNames, instrumentses
Arguments:
- img - image object
- instruments - dictionary of instruments
- use_lstm (optional) - Use LSTM instead of legacy engine mode.
- tessdata_dir (optional) - Full path to tessdata directory. If not provided, the value of the environment variable TESSDATA_DIR will be used.
Returns:
- partNames - a list of part names
- instrumentses - a list of lists of instruments for each part
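The two returned lists read as parallel: entry i of instrumentses holds the instruments for entry i of partNames. Under that assumption, they can be paired as in the sketch below. `pair_parts_with_instruments` is a hypothetical helper and the sample values are made up.

```python
from typing import Dict, List


def pair_parts_with_instruments(
    part_names: List[str],
    instrumentses: List[List[str]],
) -> Dict[str, List[str]]:
    """Map each part name to its instrument list, assuming index-aligned inputs."""
    if len(part_names) != len(instrumentses):
        raise ValueError("partNames and instrumentses must have the same length")
    return dict(zip(part_names, instrumentses))


if __name__ == "__main__":
    # Made-up values in the shape of the two return values.
    print(pair_parts_with_instruments(["Flute 1", "Oboe"], [["Flute"], ["Oboe"]]))
```

A dictionary drops duplicate part names, so a list of pairs (`list(zip(...))`) is the safer choice if one image can yield the same name twice.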
Example Docker setup
Sheatless requires Tesseract and Poppler to be installed on the system. An example Docker setup, as well as an example integration of the library, can be found in sheatless-splitter.
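Before calling the library, it can help to check that both system dependencies are on the PATH. The sketch below assumes the executables are named `tesseract` and `pdftoppm` (one of the Poppler command-line utilities); which Poppler binary sheatless actually calls is not stated here, so treat the second name as a guess. `missing_tools` is a hypothetical helper.

```python
import shutil
from typing import Callable, List, Optional

# Assumed executable names: Tesseract's CLI and one Poppler utility.
REQUIRED_TOOLS = ("tesseract", "pdftoppm")


def missing_tools(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the required executables that cannot be found on the PATH."""
    return [tool for tool in REQUIRED_TOOLS if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing system dependencies:", ", ".join(missing))
    else:
        print("Tesseract and Poppler executables found.")
```

The `which` parameter exists only so the lookup can be swapped out in tests; normal use calls `missing_tools()` with no arguments.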