
A Python library for extracting information from sheet music PDFs

Project description

How to install sheetmusicengine

1. Clone the repo:

git clone https://github.com/sigurdo/sheetmusicengine.git

2. Install the required programs

3. Create a virtual environment (optional, but it keeps things tidy and is easy to do)

Create:

python -m venv venv

Activate:

source venv/bin/activate

Deactivate:

deactivate

4. Install the Python packages

This must be done while the virtual environment is activated:

pip install -r requirements.txt
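Because the packages should only be installed while the virtual environment is active, it can be useful to verify that first. The following is a minimal sketch using only the standard library; the helper name in_virtual_environment is hypothetical and is not part of sheetmusicengine.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter runs inside a venv-style virtual environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtual_environment():
        print("Virtual environment is active:", sys.prefix)
    else:
        print("No virtual environment is active; run 'source venv/bin/activate' first.")
```

If the check reports that no environment is active, activate it as shown in step 3 before running pip.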

5. Run the code

Run the script with the following command (the virtual environment must still be activated):

python splitter.py

This creates a few subfolders in the folder you run the script from. Put the PDFs you want to analyze in the folder named input_pdfs and run the script again. The script spends a few seconds on each page it analyzes, so a run takes a relatively long time. Once a PDF has been analyzed, its split parts are placed in output_pdfs.
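The folder workflow described here can be sketched roughly as follows. This illustrates the described behaviour only and is not the actual code of splitter.py: the function names prepare_folders and find_input_pdfs are hypothetical, and only the folder names input_pdfs and output_pdfs come from the description (the real script may create further subfolders).

```python
from pathlib import Path
from typing import List, Tuple


def prepare_folders(base_dir: Path) -> Tuple[Path, Path]:
    """Create input_pdfs and output_pdfs under base_dir if they are missing."""
    input_dir = base_dir / "input_pdfs"
    output_dir = base_dir / "output_pdfs"
    # exist_ok=True makes a second run a no-op instead of an error.
    input_dir.mkdir(parents=True, exist_ok=True)
    output_dir.mkdir(parents=True, exist_ok=True)
    return input_dir, output_dir


def find_input_pdfs(input_dir: Path) -> List[Path]:
    """Return the PDF files waiting in input_dir, sorted by path."""
    return sorted(p for p in input_dir.iterdir() if p.suffix.lower() == ".pdf")
```

On a first run, find_input_pdfs returns an empty list because the folders have only just been created, which matches the instruction to add PDFs and then run the script again.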

Improving the accuracy of tesseract

After some research, it turns out that the accuracy of tesseract can be improved fairly easily by switching the OCR engine mode from legacy to LSTM and downloading a set of trained data called tessdata_best.

  1. Download and unzip https://github.com/tesseract-ocr/tessdata_best/archive/refs/tags/4.1.0.zip.
  2. When you run splitter.py, add the following arguments:
    --use-lstm --tessdata-dir "path/to/tessdata_best"
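For reference, the two arguments correspond to options that tesseract itself accepts: --oem selects the OCR engine mode (0 for legacy, 1 for LSTM) and --tessdata-dir points at the folder with the trained data. How splitter.py forwards them internally is not shown in this description, so the following is only a sketch under that assumption; build_tesseract_config is a hypothetical helper.

```python
import shlex
from typing import Optional


def build_tesseract_config(use_lstm: bool, tessdata_dir: Optional[str] = None) -> str:
    """Build a tesseract option string for the chosen engine mode and data folder."""
    # Tesseract OCR engine modes: 0 = legacy engine only, 1 = LSTM engine only.
    options = ["--oem", "1" if use_lstm else "0"]
    if tessdata_dir is not None:
        # Quote the path so folders containing spaces survive later splitting.
        options += ["--tessdata-dir", shlex.quote(tessdata_dir)]
    return " ".join(options)
```

A string built this way could, for example, be passed as the config argument of pytesseract.image_to_string, assuming pytesseract is the wrapper in use. The tessdata_best models only work with the LSTM engine, so the two arguments are meant to be used together.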
    

Download files


Source Distribution

sheatless-0.0.8.tar.gz (19.3 kB)

Uploaded Source

Built Distribution

sheatless-0.0.8-py3-none-any.whl (18.5 kB)

Uploaded Python 3

File details

Details for the file sheatless-0.0.8.tar.gz.

File metadata

  • Download URL: sheatless-0.0.8.tar.gz
  • Upload date:
  • Size: 19.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.6.4 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.8.10

File hashes

Hashes for sheatless-0.0.8.tar.gz
Algorithm Hash digest
SHA256 767d39996b9635692cac135c17d24db2ec4c8ba8c2f26ea4840166803903e5ea
MD5 de9753d7b91d11c4555abc577ce98f8e
BLAKE2b-256 cefe5ee9722d05788b9436e7fd2e42a8f8e061b0b2d1a846b307f9e2be6b14c9


File details

Details for the file sheatless-0.0.8-py3-none-any.whl.

File metadata

  • Download URL: sheatless-0.0.8-py3-none-any.whl
  • Upload date:
  • Size: 18.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.6.4 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.8.10

File hashes

Hashes for sheatless-0.0.8-py3-none-any.whl
Algorithm Hash digest
SHA256 3f73a34030077fb768e8ed6d65fc1c507259e22b628d4738156de064faa1327c
MD5 a1fc3cc759c005d477d011376c59e643
BLAKE2b-256 5809d0bfd39cba5ca4718621d2d7bc5a4363586b99630a7de0c1b83f9e50166c

