A Python library for extracting information from sheet music PDFs

How to install sheetmusicengine

1. Clone the repo:

git clone https://github.com/sigurdo/sheetmusicengine.git

2. Install the required programs

3. Create a virtual environment (optional, but it keeps things tidy and simple)

Create:

python -m venv venv

Activate:

source venv/bin/activate

Deactivate:

deactivate

4. Install the Python packages

This must be done while the virtual environment is activated:

pip install -r requirements.txt
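
A quick way to confirm that the virtual environment really is active before installing is to ask the interpreter itself. This is a minimal sketch, not part of the project; the function name `in_virtual_environment` is made up for illustration. It relies on the fact that environments created with `python -m venv` make `sys.prefix` differ from `sys.base_prefix`.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the environment folder,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtual_environment():
        print("Virtual environment is active:", sys.prefix)
    else:
        print("No virtual environment is active; packages would be installed globally.")
```

Run it with the same `python` you are about to use for `pip install`.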

5. Run the code

Run the script with the following command (the virtual environment must still be activated):

python splitter.py

This creates a few subfolders in the folder you run it from. Put the PDFs you want to analyze in the folder named input_pdfs and run the script again. The script spends a few seconds on each page it analyzes, so a run takes a relatively long time. Once each PDF has been analyzed, the split results are in output_pdfs.
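
Before starting a long run it can help to check which PDFs are actually waiting in input_pdfs. The sketch below is an illustration only and is not part of splitter.py; the helper name `find_input_pdfs` is hypothetical, and the only thing taken from the description above is the folder name input_pdfs.

```python
from pathlib import Path


def find_input_pdfs(base_dir="."):
    """Return the PDF files in <base_dir>/input_pdfs, sorted by path."""
    input_dir = Path(base_dir) / "input_pdfs"
    if not input_dir.is_dir():
        # The folder only exists after splitter.py has been run once.
        return []
    # Match the extension case-insensitively so "score.PDF" is found too.
    return sorted(p for p in input_dir.iterdir() if p.suffix.lower() == ".pdf")


if __name__ == "__main__":
    pdfs = find_input_pdfs()
    print(f"{len(pdfs)} PDF(s) ready for analysis")
    for pdf in pdfs:
        print(" ", pdf.name)
```

An empty result means either that splitter.py has not yet created the folders or that no PDFs have been added.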

Improving the precision of tesseract

After some research, it turns out that the precision of tesseract can be improved fairly easily by switching the OCR engine mode from legacy to LSTM and downloading a trained data set called tessdata_best.

  1. Download and unzip https://github.com/tesseract-ocr/tessdata_best/archive/refs/tags/4.1.0.zip.
  2. When you run splitter.py, add the following arguments:
    --use-lstm --tessdata-dir "path/to/tessdata_best"
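
The two steps above can be sketched in Python as follows. This is an illustration under stated assumptions rather than something the project ships: the helper names `extract_tessdata` and `build_splitter_command` are hypothetical, the zip file is assumed to have been downloaded already, and only the flags `--use-lstm` and `--tessdata-dir` come from the instructions above.

```python
import sys
import zipfile
from pathlib import Path


def extract_tessdata(zip_path, dest_dir):
    """Unzip the downloaded archive and return the folder holding its contents."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        # Collect the first path component of every entry in the archive.
        top_level = {name.split("/", 1)[0] for name in archive.namelist() if name.strip("/")}
    if len(top_level) == 1:
        # Everything sits in a single wrapper folder; point at that folder.
        return Path(dest_dir) / top_level.pop()
    return Path(dest_dir)


def build_splitter_command(tessdata_dir=None, script="splitter.py"):
    """Build the argument list for running splitter.py, optionally in LSTM mode."""
    command = [sys.executable, script]
    if tessdata_dir is not None:
        command += ["--use-lstm", "--tessdata-dir", str(tessdata_dir)]
    return command
```

The returned list can be passed to `subprocess.run(command, check=True)` from the folder that contains splitter.py, with the virtual environment active.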
    

Source Distribution

sheatless-0.0.9.tar.gz (19.3 kB)

Uploaded Source

Built Distribution

sheatless-0.0.9-py3-none-any.whl (18.5 kB)

Uploaded Python 3

File details

Details for the file sheatless-0.0.9.tar.gz.

File metadata

  • Download URL: sheatless-0.0.9.tar.gz
  • Size: 19.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.6.4 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.8.10

File hashes

Hashes for sheatless-0.0.9.tar.gz
Algorithm Hash digest
SHA256 f452f87ab422ecc1344a1cfcdbf7bf14fff0c731a73933eaa4c210db33aa96a1
MD5 030ff7f3bde124825e0534c31a5eeaae
BLAKE2b-256 20e9d03d9ba662caf48665ee734c1ea6f03d05da9b8f01a9d8ecb70976fce8b0

File details

Details for the file sheatless-0.0.9-py3-none-any.whl.

File metadata

  • Download URL: sheatless-0.0.9-py3-none-any.whl
  • Size: 18.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.6.4 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.8.10

File hashes

Hashes for sheatless-0.0.9-py3-none-any.whl
Algorithm Hash digest
SHA256 f5ee8df8b9e676d4b01f86a4f8143f9fdaa9e3dbe9489ee21b0e4cc08de21e63
MD5 d425e65712625d8287b291a486a87504
BLAKE2b-256 ef5e285e574178900b4765c341c5a78c5b80cde6fd66307e656026e84eaab6fa
