OCR-D framework

OCR-D/core

Python modules implementing OCR-D specs and related tools

Introduction

This repository contains the Python packages that form the base for tools within the OCR-D ecosystem.

All packages are also published to PyPI.

Installation

NOTE: Unless you want to contribute to OCR-D/core, we recommend installing it as part of ocrd_all, which installs a complete stack of OCR-D-related software.

The easiest way to install is via pip:

pip install ocrd

# or just the functionality you need, e.g.

pip install ocrd_modelfactory

All Python software released by OCR-D requires Python 3.8 or higher.
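
The version requirement above can be checked before installing. The following is a minimal sketch; the helper name python_is_supported is hypothetical and not part of any OCR-D package.

```python
import sys

# Minimum version stated in this README.
MIN_PYTHON = (3, 8)


def python_is_supported(version_info=sys.version_info):
    """Return True if the given (major, minor, ...) version is 3.8 or higher."""
    return tuple(version_info[:2]) >= MIN_PYTHON


if python_is_supported():
    print("Python version is sufficient for OCR-D packages")
else:
    print("Python 3.8 or higher is required")
```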

NOTE: Some OCR-D tools (or even test cases) might behave unexpectedly if your environment has specific modifications, such as:

  • a custom build of ImageMagick whose format delegates differ from what OCR-D expects
  • custom Python logging configurations in your personal account

Command line tools

NOTE: All OCR-D CLI tools support a --help flag which shows usage and supported flags, options and arguments.

ocrd CLI

ocrd-dummy CLI

A minimal OCR-D processor that copies from -I/--input-file-grp to -O/--output-file-grp.
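
As an illustration, the sketch below only assembles the command line for such a call and does not execute it. The helper dummy_command and the file group names OCR-D-IMG and OCR-D-DUMMY are assumptions chosen for the example; an actual run would presumably also need a workspace to operate on.

```python
import shlex


def dummy_command(input_grp, output_grp):
    """Build the argument list for an ocrd-dummy call (nothing is executed)."""
    return ["ocrd-dummy", "-I", input_grp, "-O", output_grp]


# Hypothetical file group names, for illustration only.
cmd = dummy_command("OCR-D-IMG", "OCR-D-DUMMY")
print(shlex.join(cmd))
```

The resulting list could be passed to subprocess.run once ocrd is installed.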

Configuration

Almost all behaviour of the OCR-D/core software is configured via CLI options and flags, which can be listed with the --help flag that all CLIs support.

Some parts of the software are configured via environment variables:

  • OCRD_METS_CACHING: If set to true, access to the METS file is cached, speeding up in-memory search and modification.

  • OCRD_PROFILE: This variable configures the built-in CPU and memory profiling. If empty, no profiling is done. Otherwise expected to contain any of the following tokens:

    • CPU: Enable CPU profiling of processor runs
    • RSS: Enable RSS memory profiling
    • PSS: Enable proportionate memory profiling
  • OCRD_PROFILE_FILE: If set, the CPU profile is written to this file for later inspection with analysis tools like snakeviz.

  • PATH: Search path for processor executables (affects ocrd process and ocrd resmgr).

  • HOME: Directory to look for ocrd_logging.conf, fallback for unset XDG variables (see below).

  • XDG_CONFIG_HOME: Directory to look for ./ocrd/resources.yml (i.e. ocrd resmgr user database) – defaults to $HOME/.config.

  • XDG_DATA_HOME: Directory to look for ./ocrd-resources/* (i.e. ocrd resmgr data location) – defaults to $HOME/.local/share.

  • OCRD_DOWNLOAD_RETRIES: Number of times to retry failed attempts for downloads of workspace files.

  • OCRD_DOWNLOAD_TIMEOUT: Timeouts in seconds for connecting and for reading (comma-separated) when downloading.

  • OCRD_METS_CACHING: Whether to enable in-memory storage of OcrdMets data structures for speedup during processing or workspace operations.

  • OCRD_MAX_PROCESSOR_CACHE: Maximum number of processor instances (for each set of parameters) to be kept in memory (including loaded models) for processing workers or processor servers.

  • OCRD_NETWORK_SERVER_ADDR_PROCESSING: Default address of Processing Server to connect to (for ocrd network client processing).

  • OCRD_NETWORK_SERVER_ADDR_WORKFLOW: Default address of Workflow Server to connect to (for ocrd network client workflow).

  • OCRD_NETWORK_SERVER_ADDR_WORKSPACE: Default address of Workspace Server to connect to (for ocrd network client workspace).

  • OCRD_NETWORK_RABBITMQ_CLIENT_CONNECT_ATTEMPTS: Number of attempts for a worker to create its queue. Helpful if the rabbitmq-server needs time to be fully started.
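
To make the variable descriptions above more concrete, here is a minimal sketch of how a script might read a few of them. It is not the implementation used by OCR-D/core: the function names are hypothetical, and the substring match for OCRD_PROFILE tokens and the float parsing of OCRD_DOWNLOAD_TIMEOUT are assumptions based only on the descriptions in this list.

```python
import os
from pathlib import Path

PROFILE_TOKENS = ("CPU", "RSS", "PSS")


def profile_tokens(env):
    """Return the profiling tokens present in OCRD_PROFILE (empty set: no profiling)."""
    value = env.get("OCRD_PROFILE", "")
    # Assumption: a token counts as enabled if it occurs anywhere in the value.
    return {token for token in PROFILE_TOKENS if token in value}


def download_timeout(env):
    """Parse the comma-separated OCRD_DOWNLOAD_TIMEOUT into floats, or None if unset."""
    value = env.get("OCRD_DOWNLOAD_TIMEOUT")
    if not value:
        return None
    return tuple(float(part) for part in value.split(","))


def resource_locations(env):
    """Resolve the resmgr user database and data directory with the $HOME fallbacks."""
    home = Path(env.get("HOME") or Path.home())
    config_home = Path(env.get("XDG_CONFIG_HOME") or home / ".config")
    data_home = Path(env.get("XDG_DATA_HOME") or home / ".local" / "share")
    return config_home / "ocrd" / "resources.yml", data_home / "ocrd-resources"


if __name__ == "__main__":
    print(profile_tokens(os.environ))
    print(download_timeout(os.environ))
    print(resource_locations(os.environ))
```

Passing the environment as a plain mapping keeps the helpers easy to try with sample values instead of the real os.environ.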

Packages

ocrd_utils

Contains utilities and constants, e.g. for logging, path normalization, coordinate calculation etc.

See README for ocrd_utils for further information.
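
To give an idea of what coordinate calculation involves, the sketch below parses a PAGE-XML style points string ("x1,y1 x2,y2 ...") and computes its bounding box. It uses only the standard library; the function names are hypothetical and are not claimed to match the ocrd_utils API.

```python
def polygon_from_points_string(points):
    """Parse 'x1,y1 x2,y2 ...' into a list of [x, y] integer pairs."""
    polygon = []
    for pair in points.split():
        x, y = pair.split(",")
        polygon.append([int(x), int(y)])
    return polygon


def bounding_box(polygon):
    """Return (min_x, min_y, max_x, max_y) for a list of [x, y] pairs."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


polygon = polygon_from_points_string("100,100 300,100 300,200 100,200")
print(bounding_box(polygon))
```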

ocrd_models

Contains file format wrappers for PAGE-XML, METS, EXIF metadata etc.

See README for ocrd_models for further information.

ocrd_modelfactory

Code to instantiate models from existing data.

See README for ocrd_modelfactory for further information.

ocrd_validators

Schemas and routines for validating BagIt, ocrd-tool.json, workspaces, METS, PAGE, CLI parameters etc.

See README for ocrd_validators for further information.

ocrd_network

Components related to the OCR-D Web API.

See README for ocrd_network for further information.

ocrd

Depends on all of the above; also contains decorators and classes for creating OCR-D processors and CLIs.

Also contains the command line tool ocrd.

See README for ocrd for further information.

bash library

Builds a bash script that can be sourced by other bash scripts to create OCR-D-compliant CLIs.

See README for bashlib for further information.

Testing

  • Download assets: make assets
  • Test with local files: make test
  • Test with remote assets: make test OCRD_BASEURL='https://github.com/OCR-D/assets/raw/master/data/'

Download files

Source Distribution

ocrd_utils-2.64.0.tar.gz (327.4 kB)

Built Distribution

ocrd_utils-2.64.0-py3-none-any.whl (337.6 kB, Python 3)
