
A Python library for extracting information from sheet music PDFs

Project description

How to install sheetmusicengine

1. Clone the repo:

git clone https://github.com/sigurdo/sheetmusicengine.git

2. Install the required programs
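The README does not name the required programs. Because the section on improving Tesseract's accuracy shows that the project relies on Tesseract, a quick check that a `tesseract` executable is on your PATH can catch the most likely missing dependency early. This is a minimal sketch that assumes Tesseract is one of the required programs; any other required programs are not checked, and the helper name `tesseract_version` is hypothetical.

```python
import shutil
import subprocess
from typing import Optional


def tesseract_version(executable: str = "tesseract") -> Optional[str]:
    """Return the first line of `<executable> --version`, or None if not on PATH.

    Assumption: Tesseract is one of the "required programs"; the README
    does not list them explicitly.
    """
    path = shutil.which(executable)
    if path is None:
        return None  # executable not found on PATH
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    # Some Tesseract builds print the version to stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else ""


if __name__ == "__main__":
    version = tesseract_version()
    print(version if version is not None else "tesseract was not found on PATH")
```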

3. Create a virtual environment (optional, but it keeps things clean and simple)

Create:

python -m venv venv

Activate:

source venv/bin/activate

Deactivate:

deactivate

4. Install the Python packages

This must be done while the virtual environment is activated:
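Since both installing the packages and running the script assume an activated virtual environment, a script can check this itself. A minimal sketch for environments created with `python -m venv` (as in step 3): inside such an environment `sys.prefix` points at the environment, while `sys.base_prefix` points at the base Python installation. The function name `in_virtualenv` is hypothetical and not part of the project.

```python
import sys


def in_virtualenv() -> bool:
    """Return True if this interpreter runs inside a venv-style environment.

    For environments created with `python -m venv`, sys.prefix is the
    environment directory and sys.base_prefix is the base installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print("Virtual environment active:", sys.prefix)
    else:
        print("No virtual environment active; run: source venv/bin/activate")
```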

pip install -r requirements.txt

5. Run the code

Run the script with the following command (the virtual environment must still be activated):

python splitter.py

This creates some subfolders in the folder you run it from. Put the PDFs you want to analyze in the folder named input_pdfs, then run the script again. The script spends a few seconds on each page it analyzes, so processing takes a relatively long time. Once a PDF has been fully analyzed, the split result is placed in output_pdfs.
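The folder workflow described above can be sketched as follows. This is not splitter.py's actual code: only the folder names input_pdfs and output_pdfs come from the README, and the helpers `prepare_folders` and `pending_pdfs` are hypothetical. The sketch creates both folders if they are missing and lists the PDFs currently waiting in input_pdfs.

```python
from pathlib import Path
from typing import List

# Folder names taken from the README; everything else here is illustrative.
INPUT_DIR = "input_pdfs"
OUTPUT_DIR = "output_pdfs"


def prepare_folders(base: Path) -> None:
    """Create input_pdfs/ and output_pdfs/ under `base` if they don't exist."""
    for name in (INPUT_DIR, OUTPUT_DIR):
        (base / name).mkdir(parents=True, exist_ok=True)


def pending_pdfs(base: Path) -> List[Path]:
    """Return the .pdf files in input_pdfs/ (case-insensitive suffix), sorted by path."""
    return sorted(
        p
        for p in (base / INPUT_DIR).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
```

Calling `prepare_folders(Path.cwd())` and then `pending_pdfs(Path.cwd())` mirrors the two runs the README describes: the first run only sets up the folders, and the second picks up whatever PDFs were placed in input_pdfs.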

Improving Tesseract's accuracy

After some research, it turns out that Tesseract's accuracy can be improved fairly easily by switching the OCR Engine Mode from legacy to LSTM and downloading a trained dataset called tessdata_best.

  1. Download and unzip https://github.com/tesseract-ocr/tessdata_best/archive/refs/tags/4.1.0.zip.
  2. When running splitter.py, add the following arguments:
    --use-lstm --tessdata-dir "path/to/tessdata_best"
    
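The two arguments above correspond to Tesseract's own command-line options: `--oem` selects the OCR Engine Mode (0 is legacy, 1 is LSTM only) and `--tessdata-dir` points Tesseract at a folder of trained models. The sketch below shows one way such flags could be parsed and turned into a Tesseract config string. How splitter.py actually defines and uses these flags is an assumption, and `build_parser` and `tesseract_config` are hypothetical names.

```python
import argparse
import shlex
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # The flag names come from the README; their definitions here are assumed.
    parser = argparse.ArgumentParser(description="Split sheet music PDFs (sketch)")
    parser.add_argument("--use-lstm", action="store_true",
                        help="use Tesseract's LSTM engine instead of legacy")
    parser.add_argument("--tessdata-dir", default=None,
                        help="folder containing the tessdata_best models")
    return parser


def tesseract_config(use_lstm: bool, tessdata_dir: Optional[str]) -> str:
    """Build a Tesseract config string: --oem 0 = legacy, --oem 1 = LSTM only."""
    parts: List[str] = ["--oem", "1" if use_lstm else "0"]
    if tessdata_dir:
        parts += ["--tessdata-dir", tessdata_dir]
    # Quote each part so paths containing spaces survive shell-style splitting.
    return " ".join(shlex.quote(p) for p in parts)
```

If the project calls Tesseract through pytesseract (also an assumption), a string like this is the kind of value its `config` parameter accepts, for example `pytesseract.image_to_string(image, config=tesseract_config(True, "path/to/tessdata_best"))`. Note that tessdata_best contains LSTM models, so it is meant to be combined with `--use-lstm`.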


Download files


Source Distribution

sheatless-0.0.10.tar.gz (19.3 kB)

Uploaded Source

Built Distribution

sheatless-0.0.10-py3-none-any.whl (18.5 kB)

Uploaded Python 3

File details

Details for the file sheatless-0.0.10.tar.gz.

File metadata

  • Download URL: sheatless-0.0.10.tar.gz
  • Upload date:
  • Size: 19.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.6.4 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.8.10

File hashes

Hashes for sheatless-0.0.10.tar.gz
Algorithm Hash digest
SHA256 6d412b28224ddce5d2656835c9abf3dde9cb43752d41eff04127a3ebbd16b9ff
MD5 4a6c350c9814855a37d69cd80767d096
BLAKE2b-256 2fe412e5c220fc2286e13aeb90009608c02c2b3c850995a2269cd7514010696c
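A downloaded file can be checked against the SHA256 digest listed above using Python's standard `hashlib` module. A minimal sketch; the helper names `sha256_of` and `verify` are illustrative:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    """Return True if the file's SHA256 digest matches the published one."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For example, `verify(Path("sheatless-0.0.10.tar.gz"), "6d412b28224ddce5d2656835c9abf3dde9cb43752d41eff04127a3ebbd16b9ff")` should return True for an intact download. pip can perform the same check itself in hash-checking mode, using `--hash=sha256:...` entries in a requirements file.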


File details

Details for the file sheatless-0.0.10-py3-none-any.whl.

File metadata

  • Download URL: sheatless-0.0.10-py3-none-any.whl
  • Upload date:
  • Size: 18.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.6.4 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.8.10

File hashes

Hashes for sheatless-0.0.10-py3-none-any.whl
Algorithm Hash digest
SHA256 e4a0f0371e76e497c0c87e63829eda5d8528cc097c12ab70225a43da20925105
MD5 58b3cabcb70e23288657d392376533e7
BLAKE2b-256 fcc3bb8667ac462edf8db7fdc44681e39e94c246c6c6d0ec2fcfd6bc519703de

