OCR-D wrapper for DoxaPy image binarization via locally adaptive thresholding
ocrd_doxa
Introduction
This package offers OCR-D-compliant workspace processors for binarization via Doxa (using its native Python bindings, DoxaPy).
It is itself written in Python and relies heavily on the OCR-D core API, which handles METS/PAGE and provides the OCR-D CLI.
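Locally adaptive thresholding means that each pixel is compared against its own threshold, computed from statistics of a window around it, rather than against a single global value (as in Otsu). As a rough illustration, here is a minimal pure-Python sketch of Sauvola's method, one of the algorithms Doxa provides. It is not Doxa's code and is far slower. The function names, the list-of-lists image type and the dynamic range `r = 128` (the usual choice for 8-bit images) are assumptions of this sketch, while the defaults `window=75`, `k=0.2` mirror the Sauvola defaults in the help text below.

```python
import math
from typing import List

# Illustrative sketch only, not Doxa's implementation; r=128 assumed for 8-bit input.
Image = List[List[int]]  # grayscale rows, values in [0, 255]


def integral(img: Image, square: bool = False) -> List[List[float]]:
    """Summed-area table (of values or squared values) with a zero top row and left column."""
    h, w = len(img), len(img[0])
    table = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            v = img[y][x]
            row_sum += v * v if square else v
            table[y + 1][x + 1] = table[y][x + 1] + row_sum
    return table


def box_sum(table: List[List[float]], y0: int, x0: int, y1: int, x1: int) -> float:
    """Sum over rows y0..y1-1 and columns x0..x1-1 in constant time."""
    return table[y1][x1] - table[y0][x1] - table[y1][x0] + table[y0][x0]


def sauvola(img: Image, window: int = 75, k: float = 0.2, r: float = 128.0) -> Image:
    """Per-pixel threshold T = m * (1 + k * (s / r - 1)), where m and s are the mean
    and standard deviation of a window x window neighbourhood (clipped at the border).
    Returns 0 for ink (pixel <= T) and 255 for background."""
    h, w = len(img), len(img[0])
    sums, squares = integral(img), integral(img, square=True)
    half = window // 2
    out = [[255] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - half), min(h, y + half + 1)
        for x in range(w):
            x0, x1 = max(0, x - half), min(w, x + half + 1)
            n = (y1 - y0) * (x1 - x0)
            mean = box_sum(sums, y0, x0, y1, x1) / n
            var = box_sum(squares, y0, x0, y1, x1) / n - mean * mean
            std = math.sqrt(max(0.0, var))  # guard against tiny negative rounding errors
            threshold = mean * (1 + k * (std / r - 1))
            if img[y][x] <= threshold:
                out[y][x] = 0
    return out
```

For instance, on a 9x9 image of value 200 containing a 3x3 blob of value 20, `sauvola(img, window=5)` marks exactly the blob as ink. Computing window sums from integral images keeps the cost per pixel constant regardless of `window`, which matters for a 75 px window.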
Installation
Create and activate a virtual environment as usual.
To install Python dependencies:
make deps
This is equivalent to:
pip install -r requirements.txt
Then, to install this module:
make install
This is equivalent to:
pip install .
Usage
OCR-D processor interface ocrd-doxa-binarize
To be used with PAGE-XML documents in an OCR-D annotation workflow.
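For scripted use, the processor can be driven from Python by assembling its command line from the options documented in the help output that follows. This is a minimal sketch: the helper name `doxa_binarize_cmd` and the file group names `OCR-D-IMG` and `OCR-D-BIN` are hypothetical examples, not part of this package; only the flags themselves (`-m`, `-I`, `-O`, `-g`, `--overwrite`, `-p`) come from the help text.

```python
import json
import shlex
from typing import Any, Dict, List, Optional


def doxa_binarize_cmd(input_grp: str, output_grp: str,
                      mets: str = "mets.xml",
                      params: Optional[Dict[str, Any]] = None,
                      page_id: Optional[str] = None,
                      overwrite: bool = False) -> List[str]:
    """Hypothetical helper: build an argv list for ocrd-doxa-binarize
    using only the options listed in `ocrd-doxa-binarize -h`."""
    cmd = ["ocrd-doxa-binarize", "-m", mets, "-I", input_grp, "-O", output_grp]
    if page_id:
        cmd += ["-g", page_id]  # restrict processing to these physical page IDs
    if overwrite:
        cmd.append("--overwrite")  # remove existing output pages/images
    if params:
        # -p accepts either a verbatim JSON string or a JSON file path
        cmd += ["-p", json.dumps(params)]
    return cmd


cmd = doxa_binarize_cmd("OCR-D-IMG", "OCR-D-BIN",
                        params={"algorithm": "Sauvola", "level-of-operation": "page"})
print(shlex.join(cmd))  # shell-quoted form, for copying into a terminal
# To actually run it (requires this package to be installed):
# import subprocess; subprocess.run(cmd, check=True)
```

Passing an argument list rather than a single string to `subprocess.run` avoids having to quote the JSON parameter by hand.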
ocrd-doxa-binarize -h
Usage: ocrd-doxa-binarize [OPTIONS]
binarize via locally adaptive thresholding
Options:
-I, --input-file-grp USE File group(s) used as input
-O, --output-file-grp USE File group(s) used as output
-g, --page-id ID Physical page ID(s) to process
--overwrite Remove existing output pages/images
(with --page-id, remove only those)
-p, --parameter JSON-PATH Parameters, either verbatim JSON string
or JSON file path
-P, --param-override KEY VAL Override a single JSON object key-value pair,
taking precedence over --parameter
-m, --mets URL-PATH URL or file path of METS to process
-w, --working-dir PATH Working directory of local workspace
-l, --log-level [OFF|ERROR|WARN|INFO|DEBUG|TRACE]
Log level
-C, --show-resource RESNAME Dump the content of processor resource RESNAME
-L, --list-resources List names of processor resources
-J, --dump-json Dump tool description as JSON and exit
-h, --help This help message
-V, --version Show version
Parameters:
"dpi" [number - 0]
pixel density in dots per inch (overrides any meta-data in the
images); disabled when zero
"level-of-operation" [string - "page"]
PAGE XML hierarchy level to operate on
Possible values: ["page", "region", "line"]
"algorithm" [string - "ISauvola"]
Thresholding algorithm to use.
Possible values: ["Otsu", "Bernsen", "Niblack", "Sauvola", "Wolf",
"Gatos", "NICK", "Su", "Singh", "Bataineh", "ISauvola", "WAN"]
"parameters" [object - {}]
Dictionary of algorithm-specific parameters. Unless overridden here,
the following defaults are used:
Bernsen: {'window': 75, 'threshold': 100, 'contrast-limit': 25}
NICK: {'window': 75, 'k': -0.2}
Niblack: {'window': 75, 'k': 0.2}
Singh: {'window': 75, 'k': 0.2}
Gatos: {'glyph': 60}
Sauvola: {'window': 75, 'k': 0.2}
Wolf: {'window': 75, 'k': 0.2}
WAN: {'window': 75, 'k': 0.2}
Su: {'window': 0 (based on stroke size),
'minN': windowSize (roughly based on size of window)}
(window/glyph sizes are in px, threshold/limits in uint8 [0,255])
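The rule stated for "parameters" (algorithm-specific defaults apply unless overridden) amounts to a dictionary merge. The sketch below models it under that assumption: `DEFAULTS` and `effective_parameters` are hypothetical names, the table is transcribed from the help text above (Su is left out because its defaults are data-dependent, and no defaults are listed for Otsu, ISauvola or Bataineh), and this is not the processor's actual code.

```python
from typing import Any, Dict

# Hypothetical table transcribed from the `ocrd-doxa-binarize -h` output.
DEFAULTS: Dict[str, Dict[str, Any]] = {
    "Bernsen": {"window": 75, "threshold": 100, "contrast-limit": 25},
    "NICK": {"window": 75, "k": -0.2},
    "Niblack": {"window": 75, "k": 0.2},
    "Singh": {"window": 75, "k": 0.2},
    "Gatos": {"glyph": 60},
    "Sauvola": {"window": 75, "k": 0.2},
    "Wolf": {"window": 75, "k": 0.2},
    "WAN": {"window": 75, "k": 0.2},
}


def effective_parameters(algorithm: str, overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return the listed defaults for `algorithm`, with user overrides taking precedence."""
    merged = dict(DEFAULTS.get(algorithm, {}))  # copy so DEFAULTS stays untouched
    merged.update(overrides)
    return merged


print(effective_parameters("Sauvola", {"window": 101}))  # {'window': 101, 'k': 0.2}
```

Under this reading, overriding only `window` keeps Sauvola's default `k`. On the command line, the same override would be passed as `-p '{"algorithm": "Sauvola", "parameters": {"window": 101}}'`.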
Testing
none yet
File details
Details for the file ocrd_doxa-0.0.2.tar.gz
File metadata
- Download URL: ocrd_doxa-0.0.2.tar.gz
- Upload date:
- Size: 5.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.0 setuptools/57.0.0 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.6.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 20af5a77811f23a59785e786dfad2451b4dd84fed79f2cad48440174eb90dc4c
MD5 | 53462d9fe882e17355dc48563929f152
BLAKE2b-256 | 2aeddd4abb4f66c974e5164918f1a03e9c6d20eb37229e60717fa4dcba24dace
File details
Details for the file ocrd_doxa-0.0.2-py3-none-any.whl
File metadata
- Download URL: ocrd_doxa-0.0.2-py3-none-any.whl
- Upload date:
- Size: 8.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.0 setuptools/57.0.0 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.6.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7d4cb4260cf95ddbdcb1aefdfbd8483bd5b0c33850774e6f43a98e8d3fd670e9
MD5 | 9e8d38b988cbb68d7866906b00a4d4ae
BLAKE2b-256 | 45024886b607d5f663098de59da45522b3ce603a3a102cca7c065464f6becb56