Skip to main content

OCR-D wrapper for DoxaPy image binarization via locally adaptive thresholding

Project description

PyPI version

ocrd_wrap

OCR-D wrapper for DoxaPy image binarization via locally adaptive thresholding

Introduction

This offers OCR-D compliant workspace processors for binarization via Doxa (using its native Python bindings).

It is itself written in Python, and relies heavily on the OCR-D core API. This is responsible for handling METS/PAGE, and providing the OCR-D CLI.

Installation

Create and activate a virtual environment as usual.

To install Python dependencies:

make deps

Which is the equivalent of:

pip install -r requirements.txt

To install this module, then do:

make install

Which is the equivalent of:

pip install .

Usage

OCR-D processor interface ocrd-doxa-binarize

To be used with PAGE-XML documents in an OCR-D annotation workflow.

ocrd-doxa-binarize -h

Usage: ocrd-doxa-binarize [OPTIONS]

  binarize via locally adaptive thresholding

Options:
  -I, --input-file-grp USE        File group(s) used as input
  -O, --output-file-grp USE       File group(s) used as output
  -g, --page-id ID                Physical page ID(s) to process
  --overwrite                     Remove existing output pages/images
                                  (with --page-id, remove only those)
  -p, --parameter JSON-PATH       Parameters, either verbatim JSON string
                                  or JSON file path
  -P, --param-override KEY VAL    Override a single JSON object key-value pair,
                                  taking precedence over --parameter
  -m, --mets URL-PATH             URL or file path of METS to process
  -w, --working-dir PATH          Working directory of local workspace
  -l, --log-level [OFF|ERROR|WARN|INFO|DEBUG|TRACE]
                                  Log level
  -J, --dump-json                 Dump tool description as JSON and exit
  -h, --help                      This help message
  -V, --version                   Show version

Parameters:
   "level-of-operation" [string - "page"]
    PAGE XML hierarchy level to operate on
    Possible values: ["page", "region", "line"]
   "algorithm" [string - "ISauvola"]
    Thresholding algorithm to use.
    Possible values: ["Otsu", "Bernsen", "Niblack", "Sauvola", "Wolf",
    "Gatos", "NICK", "Su", "Singh", "Bataineh", "ISauvola", "WAN"]
   "parameters" [object - {}]
    Dictionary of algorithm-specific parameters. Unless overridden here,
    the following defaults are used:
	Bernsen:        {'window': 75, 'threshold': 100, 'contrast-limit': 25}
	NICK:           {'window': 75, 'k': -0.2}
	Niblack:        {'window': 75, 'k': 0.2}
	Singh:          {'window': 75, 'k', 0.2}
	Gatos:          {'glyph': 60}
	Sauvola:        {'window': 75, 'k': 0.2}
	Wolf:           {'window': 75, 'k': 0.2}
	WAN:            {'window': 75, 'k': 0.2}
	Su:             {'window': 0 (based on stroke size), 
                     'minN':  windowSize (roughly based on size of window)}

   (window/glyph sizes are in px, threshold/limits in uint8 [0,255])

Testing

none yet

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ocrd_doxa-0.0.1.tar.gz (5.7 kB view details)

Uploaded Source

Built Distribution

ocrd_doxa-0.0.1-py3-none-any.whl (7.8 kB view details)

Uploaded Python 3

File details

Details for the file ocrd_doxa-0.0.1.tar.gz.

File metadata

  • Download URL: ocrd_doxa-0.0.1.tar.gz
  • Upload date:
  • Size: 5.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.24.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.8.2

File hashes

Hashes for ocrd_doxa-0.0.1.tar.gz
Algorithm Hash digest
SHA256 c38ebd153c633a4d0f5cef2decf2ccd969aa46b0150ef6ce4a1546173ab987de
MD5 5e8b11fb8ec6592e3fdb90548669d98f
BLAKE2b-256 a3b095205f198700d7bda3fa995a37bb832ad9cbb9d4e8905ab5a93a59d015bd

See more details on using hashes here.

File details

Details for the file ocrd_doxa-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: ocrd_doxa-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 7.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.24.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.8.2

File hashes

Hashes for ocrd_doxa-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 838809d134f6d5493c8137f09d7e3b4b3fe8d3524a23014c3bae09a31bd061c9
MD5 9eea63de58c34f0fe55adb5989b48285
BLAKE2b-256 0f4f29e26a254aed3bf696b77422ef0e9fd3c44615f8cfd6bd22e58d28b00916

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page