ocrd_doxa

OCR-D wrapper for DoxaPy image binarization via locally adaptive thresholding

Introduction

This offers OCR-D compliant workspace processors for binarization via Doxa (using its native Python bindings).

The module is itself written in Python and relies heavily on the OCR-D core API, which handles METS/PAGE and provides the OCR-D CLI.
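To illustrate what "locally adaptive thresholding" means, here is a minimal pure-Python sketch of Sauvola's rule, T = m * (1 + k * (s / R - 1)), where m and s are the mean and standard deviation of the window around each pixel. It is not Doxa's implementation and not how this module works internally; the function name `sauvola_binarize`, the toy 3 px window and the dynamic range `r=128` are assumptions chosen for the example (the processor's documented default window is 75 px).

```python
from math import sqrt


def sauvola_binarize(gray, window=3, k=0.2, r=128.0):
    """Binarize a grayscale image given as a list of rows of ints in [0, 255].

    Illustrative sketch only: each pixel is compared against a threshold
    computed from its own neighbourhood instead of one global value.
    """
    height, width = len(gray), len(gray[0])
    half = window // 2
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Collect the neighbourhood, clipped at the image border.
            patch = [
                gray[j][i]
                for j in range(max(0, y - half), min(height, y + half + 1))
                for i in range(max(0, x - half), min(width, x + half + 1))
            ]
            mean = sum(patch) / len(patch)
            std = sqrt(sum((p - mean) ** 2 for p in patch) / len(patch))
            # Sauvola's local threshold.
            threshold = mean * (1 + k * (std / r - 1))
            row.append(255 if gray[y][x] > threshold else 0)
        result.append(row)
    return result


# A bright background with a dark 2x2 "glyph" in the middle.
image = [
    [200, 200, 200, 200],
    [200, 30, 30, 200],
    [200, 30, 30, 200],
    [200, 200, 200, 200],
]
binary = sauvola_binarize(image)
for line in binary:
    print(line)
```

In this toy image the dark centre pixels fall below their local threshold and become 0, while the bright border pixels become 255.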

Installation

Create and activate a virtual environment as usual.

To install Python dependencies:

make deps

This is equivalent to:

pip install -r requirements.txt

Then, to install this module:

make install

This is equivalent to:

pip install .

Usage

OCR-D processor interface ocrd-doxa-binarize

To be used with PAGE-XML documents in an OCR-D annotation workflow.

ocrd-doxa-binarize -h

Usage: ocrd-doxa-binarize [OPTIONS]

  binarize via locally adaptive thresholding

Options:
  -I, --input-file-grp USE        File group(s) used as input
  -O, --output-file-grp USE       File group(s) used as output
  -g, --page-id ID                Physical page ID(s) to process
  --overwrite                     Remove existing output pages/images
                                  (with --page-id, remove only those)
  -p, --parameter JSON-PATH       Parameters, either verbatim JSON string
                                  or JSON file path
  -P, --param-override KEY VAL    Override a single JSON object key-value pair,
                                  taking precedence over --parameter
  -m, --mets URL-PATH             URL or file path of METS to process
  -w, --working-dir PATH          Working directory of local workspace
  -l, --log-level [OFF|ERROR|WARN|INFO|DEBUG|TRACE]
                                  Log level
  -C, --show-resource RESNAME     Dump the content of processor resource RESNAME
  -L, --list-resources            List names of processor resources
  -J, --dump-json                 Dump tool description as JSON and exit
  -h, --help                      This help message
  -V, --version                   Show version

Parameters:
   "dpi" [number - 0]
    pixel density in dots per inch (overrides any meta-data in the
    images); disabled when zero
   "level-of-operation" [string - "page"]
    PAGE XML hierarchy level to operate on
    Possible values: ["page", "region", "line"]
   "algorithm" [string - "ISauvola"]
    Thresholding algorithm to use.
    Possible values: ["Otsu", "Bernsen", "Niblack", "Sauvola", "Wolf",
    "Gatos", "NICK", "Su", "Singh", "Bataineh", "ISauvola", "WAN"]
   "parameters" [object - {}]
    Dictionary of algorithm-specific parameters. Unless overridden here,
    the following defaults are used:
	Bernsen:        {'window': 75, 'threshold': 100, 'contrast-limit': 25}
	NICK:           {'window': 75, 'k': -0.2}
	Niblack:        {'window': 75, 'k': 0.2}
	Singh:          {'window': 75, 'k': 0.2}
	Gatos:          {'glyph': 60}
	Sauvola:        {'window': 75, 'k': 0.2}
	Wolf:           {'window': 75, 'k': 0.2}
	WAN:            {'window': 75, 'k': 0.2}
	Su:             {'window': 0 (based on stroke size), 
                     'minN':  windowSize (roughly based on size of window)}

   (window/glyph sizes are in px, threshold/limits in uint8 [0,255])
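As a rough sketch of how the options and parameters above fit together, the following Python snippet assembles a command line that passes a verbatim JSON string via `-p`. The helper `build_command` and the file group names `OCR-D-IMG` and `OCR-D-BIN` are hypothetical and only serve as an illustration; the option and parameter names are taken from the help text above.

```python
import json
import shlex

# Values copied from the "Possible values" lists in the help output.
ALGORITHMS = ["Otsu", "Bernsen", "Niblack", "Sauvola", "Wolf", "Gatos",
              "NICK", "Su", "Singh", "Bataineh", "ISauvola", "WAN"]
LEVELS = ["page", "region", "line"]


def build_command(input_grp, output_grp, algorithm="ISauvola",
                  parameters=None, level="page"):
    """Return an argument list for ocrd-doxa-binarize (hypothetical helper)."""
    if algorithm not in ALGORITHMS:
        raise ValueError(f"unknown algorithm: {algorithm}")
    if level not in LEVELS:
        raise ValueError(f"unknown level-of-operation: {level}")
    params = {
        "algorithm": algorithm,
        "level-of-operation": level,
        # Algorithm-specific overrides; anything omitted keeps its default.
        "parameters": parameters or {},
    }
    return ["ocrd-doxa-binarize",
            "-I", input_grp,
            "-O", output_grp,
            "-p", json.dumps(params)]


# Assumed file group names; adapt them to your own workspace.
command = build_command("OCR-D-IMG", "OCR-D-BIN",
                        algorithm="Sauvola",
                        parameters={"window": 51, "k": 0.3})
# Print a shell-quoted form that can be pasted into a terminal.
print(shlex.join(command))
```

The resulting list could also be handed to `subprocess.run(command, check=True)` from within an OCR-D workspace, assuming the module is installed in the active environment.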

Testing

none yet
