OCR-D framework
OCR-D/core
Python modules implementing OCR-D specs and related tools
Introduction
This repository contains the Python packages that form the base for tools within the OCR-D ecosphere.
All packages are also published to PyPI.
Installation
NOTE Unless you want to contribute to OCR-D/core, we recommend installation as part of ocrd_all which installs a complete stack of OCR-D-related software.
The easiest way to install is via pip:
pip install ocrd
# or just the functionality you need, e.g.
pip install ocrd_modelfactory
All Python software released by OCR-D requires Python 3.7 or higher.
NOTE Some OCR-D tools (or even test cases) might behave unexpectedly if you have modified your environment in specific ways, such as:
- using a custom build of ImageMagick, whose format delegates are different from what OCR-D supposes
- custom Python logging configurations in your personal account
Command line tools
NOTE: All OCR-D CLI tools support a --help flag which shows usage and supported flags, options and arguments.
ocrd CLI
ocrd-dummy CLI
A minimal OCR-D processor that copies from -I/--input-file-grp to -O/--output-file-grp.
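The following is a minimal sketch of how such a call could be assembled and launched from Python. The helper names `build_dummy_command` and `run_dummy` and the file group names `OCR-D-IMG` and `OCR-D-DUMMY` are hypothetical placeholders, not part of OCR-D/core. The sketch also assumes that the working directory is an OCR-D workspace containing the METS file.

```python
import shutil
import subprocess
from typing import List, Optional


def build_dummy_command(input_grp: str, output_grp: str) -> List[str]:
    """Build the argument list that copies input_grp to output_grp."""
    return ["ocrd-dummy", "-I", input_grp, "-O", output_grp]


def run_dummy(input_grp: str, output_grp: str,
              workspace_dir: str = ".") -> Optional[int]:
    """Run ocrd-dummy inside workspace_dir.

    Returns the exit code, or None if the executable is not on PATH.
    Assumption: workspace_dir is an OCR-D workspace with a METS file.
    """
    if shutil.which("ocrd-dummy") is None:
        return None
    command = build_dummy_command(input_grp, output_grp)
    return subprocess.run(command, cwd=workspace_dir).returncode


if __name__ == "__main__":
    # Hypothetical file group names; only print the command, do not run it.
    print(" ".join(build_dummy_command("OCR-D-IMG", "OCR-D-DUMMY")))
```

Keeping command construction separate from execution makes the call easy to inspect or log before anything touches the workspace.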
Configuration
Almost all behaviour of the OCR-D/core software is configured via CLI options and flags, which can be listed with the --help flag that all CLIs support.
Some parts of the software are configured via environment variables:
- OCRD_METS_CACHING: If set to true, access to the METS file is cached, i.e. OcrdMets data structures are kept in memory, which speeds up search and modification during processing or workspace operations.
- OCRD_PROFILE: This variable configures the built-in CPU and memory profiling. If empty, no profiling is done. Otherwise it is expected to contain any of the following tokens:
  - CPU: Enable CPU profiling of processor runs
  - RSS: Enable RSS memory profiling
  - PSS: Enable proportionate memory profiling
- OCRD_PROFILE_FILE: If set, the CPU profile is written to this file for later perusal with an analysis tool like snakeviz.
- PATH: Search path for processor executables (affects ocrd process and ocrd resmgr).
- HOME: Directory to look for ocrd_logging.conf, fallback for unset XDG variables (see below).
- XDG_CONFIG_HOME: Directory to look for ./ocrd/resources.yml (i.e. ocrd resmgr user database); defaults to $HOME/.config.
- XDG_DATA_HOME: Directory to look for ./ocrd-resources/* (i.e. ocrd resmgr data location); defaults to $HOME/.local/share.
- OCRD_DOWNLOAD_RETRIES: Number of times to retry failed attempts for downloads of workspace files.
- OCRD_DOWNLOAD_TIMEOUT: Timeout in seconds for connecting or reading (comma-separated) when downloading.
- OCRD_MAX_PROCESSOR_CACHE: Maximum number of processor instances (for each set of parameters) to be kept in memory (including loaded models) for processing workers or processor servers.
- OCRD_NETWORK_SERVER_ADDR_PROCESSING: Default address of Processing Server to connect to (for ocrd network client processing).
- OCRD_NETWORK_SERVER_ADDR_WORKFLOW: Default address of Workflow Server to connect to (for ocrd network client workflow).
- OCRD_NETWORK_SERVER_ADDR_WORKSPACE: Default address of Workspace Server to connect to (for ocrd network client workspace).
- OCRD_NETWORK_WORKER_QUEUE_CONNECT_ATTEMPTS: Number of attempts for a worker to create its queue. Helpful if the rabbitmq-server needs time to be fully started.
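As an illustration of the fallback rules above, here is a small sketch that resolves the resmgr locations and parses two of the variables from an environment mapping. It is not the code used by OCR-D/core: all function names are hypothetical. Treating OCRD_PROFILE as a substring match and applying a single OCRD_DOWNLOAD_TIMEOUT value to both connecting and reading are assumptions.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Set, Tuple


def resmgr_user_database(env: Mapping[str, str]) -> Path:
    """Path of resources.yml: XDG_CONFIG_HOME, else $HOME/.config."""
    config_home = env.get("XDG_CONFIG_HOME") or os.path.join(env["HOME"], ".config")
    return Path(config_home) / "ocrd" / "resources.yml"


def resmgr_data_location(env: Mapping[str, str]) -> Path:
    """Directory of ocrd-resources: XDG_DATA_HOME, else $HOME/.local/share."""
    data_home = env.get("XDG_DATA_HOME") or os.path.join(env["HOME"], ".local", "share")
    return Path(data_home) / "ocrd-resources"


def profile_tokens(env: Mapping[str, str]) -> Set[str]:
    """Profiling tokens found in OCRD_PROFILE; an empty set means no profiling.

    Assumption: a token counts as present if it occurs anywhere in the value.
    """
    value = env.get("OCRD_PROFILE", "")
    return {token for token in ("CPU", "RSS", "PSS") if token in value}


def download_timeout(env: Mapping[str, str]) -> Optional[Tuple[float, float]]:
    """Parse OCRD_DOWNLOAD_TIMEOUT into (connect, read) seconds, or None if unset.

    Assumption: a single value is used for both connecting and reading.
    """
    raw = env.get("OCRD_DOWNLOAD_TIMEOUT", "")
    if not raw:
        return None
    parts = [float(part) for part in raw.split(",")]
    if len(parts) == 1:
        return (parts[0], parts[0])
    return (parts[0], parts[1])


if __name__ == "__main__":
    env = dict(os.environ)
    env.setdefault("HOME", str(Path.home()))
    print(resmgr_user_database(env))
    print(resmgr_data_location(env))
    print(sorted(profile_tokens(env)), download_timeout(env))
```

Passing the environment as a mapping rather than reading os.environ directly keeps the functions easy to try out with different settings.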
Packages
ocrd_utils
Contains utilities and constants, e.g. for logging, path normalization, coordinate calculation etc.
See README for ocrd_utils for further information.
ocrd_models
Contains file format wrappers for PAGE-XML, METS, EXIF metadata etc.
See README for ocrd_models for further information.
ocrd_modelfactory
Code to instantiate models from existing data.
See README for ocrd_modelfactory for further information.
ocrd_validators
Schemas and routines for validating BagIt, ocrd-tool.json, workspaces, METS, page, CLI parameters etc.
See README for ocrd_validators for further information.
ocrd_network
Components related to the OCR-D Web API.
See README for ocrd_network for further information.
ocrd
Depends on all of the above and also contains decorators and classes for creating OCR-D processors and CLIs.
It also provides the command line tool ocrd.
See README for ocrd for further information.
bash library
Builds a bash script that can be sourced by other bash scripts to create OCR-D-compliant CLIs.
See README for bashlib for further information.
Testing
- Download assets: make assets
- Test with local files: make test
- Test with remote assets: make test OCRD_BASEURL='https://github.com/OCR-D/assets/raw/master/data/'
See Also
- OCR-D Specifications (Repo)
- OCR-D core API documentation (built here via make docs)
- OCR-D Website (Repo)