A Python library for extracting information from sheet music PDFs
How to install sheetmusicengine
1. Clone the repo:
git clone https://github.com/sigurdo/sheetmusicengine.git
2. Install the required programs
- Python
- Poppler
  - Linux:
    sudo apt install poppler-utils
  - Windows: http://blog.alivate.com.au/poppler-windows/
- Tesseract
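Before continuing, it can help to confirm that the external programs are actually reachable. The following is a minimal sketch, not part of sheetmusicengine: it assumes that Poppler is detectable through its pdftoppm executable (shipped with poppler-utils) and Tesseract through the tesseract executable. The function name find_missing_tools is hypothetical.

```python
import shutil

# Assumption: pdftoppm (from poppler-utils) and tesseract are the executables
# that indicate a working Poppler and Tesseract installation.
REQUIRED_TOOLS = {"Poppler": "pdftoppm", "Tesseract": "tesseract"}


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of tools whose executable is not found on PATH."""
    return [name for name, exe in tools.items() if shutil.which(exe) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("Poppler and Tesseract were found on PATH.")
```

An empty result only shows that the executables are on PATH; it does not check their versions.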
3. Create a virtual environment (optional, but it is tidy and simple)
Create:
python -m venv venv
Activate:
source venv/bin/activate
Deactivate:
deactivate
4. Install the Python packages
Do this while the virtual environment is activated:
pip install -r requirements.txt
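If you are unsure whether the virtual environment is active, Python itself can tell you. This is a small sketch using only the standard library; the helper name in_virtual_environment is hypothetical, and the check applies to environments created with python -m venv.

```python
import sys


def in_virtual_environment():
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtual_environment():
        print("Virtual environment active:", sys.prefix)
    else:
        print("No virtual environment active.")
```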
5. Run the code
Run the script with the following command (the virtual environment must still be activated):
python splitter.py
The first run creates a few subfolders in the folder you run the script from. Put the PDFs you want to analyze in the folder named input_pdfs and run the script again. The script spends a few seconds on each page it analyzes, so a full run takes a relatively long time. Once a PDF has been analyzed, the split result is placed in output_pdfs.
Improving the accuracy of Tesseract
After some research, it turns out that the accuracy of Tesseract can be improved fairly easily by switching the OCR engine mode from legacy to lstm and downloading a trained data set called tessdata_best.
- Download and unzip https://github.com/tesseract-ocr/tessdata_best/archive/refs/tags/4.1.0.zip.
- When you run splitter.py, add the following arguments: --use-lstm --tessdata-dir "path/to/tessdata_best"
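If you start the script from another Python program, the arguments above can be assembled as a list so that a tessdata path containing spaces needs no manual quoting. This is a sketch under assumptions: build_splitter_command is a hypothetical helper, and only the script name and the two flags come from the instructions above.

```python
import shlex
import sys


def build_splitter_command(tessdata_dir, script="splitter.py"):
    """Build the argument list for running splitter.py with the LSTM engine."""
    # sys.executable is the current interpreter, so an activated virtual
    # environment is reused automatically.
    return [sys.executable, script, "--use-lstm", "--tessdata-dir", str(tessdata_dir)]


if __name__ == "__main__":
    command = build_splitter_command("path/to/tessdata_best")
    # Print the command instead of running it; shlex.join adds shell quoting.
    print(shlex.join(command))
```

To actually run it, pass the list to subprocess.run(command, check=True) from the folder that contains splitter.py.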
File details
Details for the file sheatless-0.0.2.tar.gz.
File metadata
- Download URL: sheatless-0.0.2.tar.gz
- Upload date:
- Size: 19.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.6.4 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.8.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 8b8e35b12268fc61280b63e36e96bec12985c21ba84a1a0a8fc29e99cee230c8
MD5 | 6e798a449b0270cc6983fc433e249692
BLAKE2b-256 | 5943f3ebbec40a0f062eb2594a7ff42149e8310a024c24a9cf856854902c7775
File details
Details for the file sheatless-0.0.2-py3-none-any.whl.
File metadata
- Download URL: sheatless-0.0.2-py3-none-any.whl
- Upload date:
- Size: 18.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.6.4 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.8.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 8404286eaf8c11c1d4caf9b0853e01cbb1dd38a815c7e81b3143444710240846
MD5 | 1c76e7f0ee8c6f66b354ead0f1b064cd
BLAKE2b-256 | 38ed51358c1cc6cb6e216f44a8897ce8e172052924743e5a9994c069e5f275ef