OCR-D wrapper for DoxaPy image binarization via locally adaptive thresholding
ocrd_doxa
Introduction
This module provides OCR-D compliant workspace processors for binarization via Doxa, using its native Python bindings (DoxaPy).
It is written in Python and relies heavily on the OCR-D core API, which is responsible for handling METS/PAGE and for providing the OCR-D CLI.
Installation
Create and activate a virtual environment as usual.
To install the Python dependencies:
make deps
This is equivalent to:
pip install -r requirements.txt
Then, to install the module itself:
make install
This is equivalent to:
pip install .
Usage
OCR-D processor interface ocrd-doxa-binarize
To be used with PAGE-XML documents in an OCR-D annotation workflow.
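As a rough illustration of how a call might be put together from a script, the following sketch assembles the argument list from the options and parameters documented in the help text. The file group names `OCR-D-IMG` and `OCR-D-BIN`, the helper `build_command`, and the chosen `window`/`k` values are assumptions made for this example only, not names defined by this module.

```python
import json
import shlex


def build_command(input_grp, output_grp, algorithm="ISauvola",
                  level="page", algo_params=None):
    """Assemble an ocrd-doxa-binarize call as an argument list.

    Hypothetical helper for illustration; only the option and parameter
    names are taken from the processor's help text.
    """
    params = {
        "level-of-operation": level,
        "algorithm": algorithm,
        # Algorithm-specific overrides; an empty dict keeps the defaults.
        "parameters": dict(algo_params or {}),
    }
    return [
        "ocrd-doxa-binarize",
        "-I", input_grp,
        "-O", output_grp,
        "-p", json.dumps(params),  # parameters as a verbatim JSON string
    ]


# File group names below are assumed examples.
cmd = build_command("OCR-D-IMG", "OCR-D-BIN", algorithm="Sauvola",
                    algo_params={"window": 51, "k": 0.3})
print(shlex.join(cmd))

# To actually run it inside a workspace directory (needs the installed
# processor and a METS file), something like this should work:
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the parameters as one JSON string keeps the nested `parameters` object intact; single top-level keys could instead be set with `-P KEY VAL`.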
ocrd-doxa-binarize -h
Usage: ocrd-doxa-binarize [OPTIONS]

  binarize via locally adaptive thresholding

Options:
  -I, --input-file-grp USE        File group(s) used as input
  -O, --output-file-grp USE       File group(s) used as output
  -g, --page-id ID                Physical page ID(s) to process
  --overwrite                     Remove existing output pages/images
                                  (with --page-id, remove only those)
  -p, --parameter JSON-PATH       Parameters, either verbatim JSON string
                                  or JSON file path
  -P, --param-override KEY VAL    Override a single JSON object key-value pair,
                                  taking precedence over --parameter
  -m, --mets URL-PATH             URL or file path of METS to process
  -w, --working-dir PATH          Working directory of local workspace
  -l, --log-level [OFF|ERROR|WARN|INFO|DEBUG|TRACE]
                                  Log level
  -J, --dump-json                 Dump tool description as JSON and exit
  -h, --help                      This help message
  -V, --version                   Show version

Parameters:
  "level-of-operation" [string - "page"]
    PAGE XML hierarchy level to operate on
    Possible values: ["page", "region", "line"]
  "algorithm" [string - "ISauvola"]
    Thresholding algorithm to use.
    Possible values: ["Otsu", "Bernsen", "Niblack", "Sauvola", "Wolf",
    "Gatos", "NICK", "Su", "Singh", "Bataineh", "ISauvola", "WAN"]
  "parameters" [object - {}]
    Dictionary of algorithm-specific parameters. Unless overridden here,
    the following defaults are used:
    Bernsen: {'window': 75, 'threshold': 100, 'contrast-limit': 25}
    NICK:    {'window': 75, 'k': -0.2}
    Niblack: {'window': 75, 'k': 0.2}
    Singh:   {'window': 75, 'k': 0.2}
    Gatos:   {'glyph': 60}
    Sauvola: {'window': 75, 'k': 0.2}
    Wolf:    {'window': 75, 'k': 0.2}
    WAN:     {'window': 75, 'k': 0.2}
    Su:      {'window': 0 (based on stroke size),
              'minN': windowSize (roughly based on size of window)}
    (window/glyph sizes are in px, threshold/limits in uint8 [0,255])
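To give a feeling for what `window` and `k` control, here is a minimal pure-Python sketch of the textbook Sauvola rule, where the threshold at each pixel is `mean * (1 + k * (std / R - 1))` over the surrounding window. This is not Doxa's implementation (which is far faster and handles more cases); the function name `sauvola_binarize`, the dynamic range `R = 128`, and the clipping of the window at the image border are assumptions made for the illustration.

```python
from statistics import fmean, pstdev


def sauvola_binarize(gray, window=75, k=0.2, R=128.0):
    """Binarize a grayscale image given as a list of rows of ints in [0, 255].

    Illustrative sketch only: brute-force window statistics, with the
    window clipped at the image border (an assumption of this example).
    """
    height, width = len(gray), len(gray[0])
    half = window // 2
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Gather the local neighbourhood around (x, y).
            patch = [gray[j][i]
                     for j in range(max(0, y - half), min(height, y + half + 1))
                     for i in range(max(0, x - half), min(width, x + half + 1))]
            mean = fmean(patch)
            std = pstdev(patch)
            # Sauvola: lower the threshold where local contrast is low.
            threshold = mean * (1.0 + k * (std / R - 1.0))
            row.append(0 if gray[y][x] <= threshold else 255)
        result.append(row)
    return result


# Tiny demo: a bright 5x5 page with one dark "ink" pixel in the centre.
page = [[200] * 5 for _ in range(5)]
page[2][2] = 40
for line in sauvola_binarize(page, window=5, k=0.2):
    print(line)
```

In this demo only the centre pixel falls below its local threshold and becomes 0; all other pixels become 255. A larger `window` averages over more context, while a larger `k` pushes the threshold further below the local mean in low-contrast areas, so fewer pixels are classified as foreground.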
Testing
None yet.