A Python library for extracting information from sheet music PDFs

Project description

How to install sheetmusicengine

1. Clone the repo:

git clone https://github.com/sigurdo/sheetmusicengine.git

2. Install the required programs

3. Create a virtual environment (optional, but it keeps things tidy and is easy to do)

Create:

python -m venv venv

Activate:

source venv/bin/activate

Deactivate:

deactivate
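
The same environment can also be created from Python with the standard-library venv module. This is a minimal sketch, not part of the project: the helper name create_environment and its parameters are assumptions chosen to mirror the command above.

```python
import venv
from pathlib import Path


def create_environment(target="venv", with_pip=True):
    """Create a virtual environment, roughly like `python -m venv venv`.

    `target` and `with_pip` are illustrative parameters, not project options.
    """
    target = Path(target)
    # EnvBuilder is the class behind `python -m venv`;
    # with_pip=True also installs pip into the new environment.
    builder = venv.EnvBuilder(with_pip=with_pip)
    builder.create(target)
    return target
```

Calling create_environment() from the repo folder should leave a venv directory there. Activation still happens in the shell, since it changes the shell's own environment.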

4. Install the Python packages

This must be done while the virtual environment is activated:

pip install -r requirements.txt
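
Because the install only lands in the right place when the environment is active, it can help to check that first. The sketch below is an assumption-laden illustration, not project code: in_virtual_environment and install_requirements are hypothetical helper names, and the check relies on how environments created by venv set sys.prefix.

```python
import subprocess
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def install_requirements(requirements="requirements.txt"):
    """Run `pip install -r requirements.txt` with the current interpreter."""
    if not in_virtual_environment():
        raise RuntimeError("Activate the virtual environment first")
    # Using sys.executable ensures pip belongs to the active environment.
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "-r", requirements],
        check=True,
    )
```

Calling install_requirements() outside an activated environment raises an error instead of installing into the system Python.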

5. Run the code

Run the script with the following command (the virtual environment must still be activated):

python splitter.py

This creates a few subfolders in the folder you run it from. Put the PDFs you want to analyse in the folder named input_pdfs, then run the script again. The script spends a few seconds on each page it analyses, so a run takes a relatively long time. Once a PDF has been fully analysed, its split parts are placed in output_pdfs.
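
The second run can be wrapped in a small helper that first checks whether input_pdfs contains anything. This is only a sketch under stated assumptions: pending_pdfs, splitter_command and run_splitter are hypothetical names, and the only things taken from the project are the script name splitter.py and the folder name input_pdfs.

```python
import subprocess
import sys
from pathlib import Path


def pending_pdfs(workdir="."):
    """Return the PDFs waiting in input_pdfs, sorted by path."""
    input_dir = Path(workdir) / "input_pdfs"
    if not input_dir.is_dir():
        # The folder only exists after the first run of splitter.py.
        return []
    return sorted(p for p in input_dir.iterdir() if p.suffix.lower() == ".pdf")


def splitter_command(extra_args=()):
    """Build the command line for splitter.py using the current interpreter."""
    return [sys.executable, "splitter.py", *extra_args]


def run_splitter(workdir=".", extra_args=()):
    """Run splitter.py in workdir if there is anything to analyse."""
    pdfs = pending_pdfs(workdir)
    if not pdfs:
        print("No PDFs found in input_pdfs; nothing to do.")
        return False
    print(f"Analysing {len(pdfs)} PDF(s); expect a few seconds per page.")
    subprocess.run(splitter_command(extra_args), cwd=workdir, check=True)
    return True
```

The helper deliberately skips the run when input_pdfs is missing or empty, so the very first run that creates the folders still has to be started directly with the command above.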

Improving the precision of tesseract

After some research, it turns out that the precision of tesseract can be improved fairly easily by switching the OCR engine mode from legacy to LSTM and downloading a trained data set called tessdata_best.

  1. Download and unzip https://github.com/tesseract-ocr/tessdata_best/archive/refs/tags/4.1.0.zip.
  2. When you run splitter.py, add the following arguments:
    --use-lstm --tessdata-dir "path/to/tessdata_best"
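
The two steps above can be sketched in Python as follows. The function names download_and_unzip, unzip and lstm_command, and the local file name tessdata_best.zip, are assumptions made for illustration. Only the URL and the two command-line arguments come from the instructions. The archive can be large, so the download may take a while.

```python
import sys
import urllib.request
import zipfile
from pathlib import Path

TESSDATA_URL = (
    "https://github.com/tesseract-ocr/tessdata_best/archive/refs/tags/4.1.0.zip"
)


def unzip(archive, dest):
    """Extract a zip archive into dest and return its top-level folder."""
    dest = Path(dest)
    with zipfile.ZipFile(archive) as zf:
        # Take the first path component of the first entry as the folder name.
        top_level = zf.namelist()[0].split("/")[0]
        zf.extractall(dest)
    return dest / top_level


def download_and_unzip(url=TESSDATA_URL, dest="."):
    """Download the archive, extract it, and return the extracted folder."""
    archive = Path(dest) / "tessdata_best.zip"  # hypothetical local file name
    urllib.request.urlretrieve(url, str(archive))
    return unzip(archive, dest)


def lstm_command(tessdata_dir):
    """Build the splitter.py command line with the LSTM arguments added."""
    return [
        sys.executable, "splitter.py",
        "--use-lstm", "--tessdata-dir", str(tessdata_dir),
    ]
```

A typical use would be to pass the folder returned by download_and_unzip() to lstm_command() and hand the resulting list to subprocess.run.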
    

Download files

Source Distribution

sheatless-0.0.3.tar.gz (19.3 kB)

Uploaded Source

Built Distribution

sheatless-0.0.3-py3-none-any.whl (18.5 kB)

Uploaded Python 3

File details

Details for the file sheatless-0.0.3.tar.gz.

File metadata

  • Download URL: sheatless-0.0.3.tar.gz
  • Size: 19.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.6.4 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.8.10

File hashes

Hashes for sheatless-0.0.3.tar.gz
Algorithm Hash digest
SHA256 947d62fda57b7d82bc0913764383c1589f31188fba029f5ae6d9ef5d9c3fae66
MD5 0037b211171499695d4a92d071a38cba
BLAKE2b-256 a57108a7762a8bcb8951dccfba0c1155a6fbe5f5939d92d2bfcb85fc9feb5cc1

File details

Details for the file sheatless-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: sheatless-0.0.3-py3-none-any.whl
  • Size: 18.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.6.4 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.8.10

File hashes

Hashes for sheatless-0.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 5eb0db0431134f9df6dc169d62cda2cca6b46a6616da6797357010af8ea96a80
MD5 caeaceced7aff0ba6cda85c6029f0023
BLAKE2b-256 0c7e6f0b306bb9fffaacb2e409697e118c8b9f1c540afe6c2df3b827abc56dbb
