An extensible viewer for OCR-D workspaces

OCR-D Browser

An extensible viewer for OCR-D mets.xml files

Screenshot

OCR-D Browser with two image views and one XML view

Installation on Ubuntu 18.04

sudo make deps-ubuntu
pip install browse-ocrd

Usage

browse-ocrd ./path/to/mets.xml # or open interactively

Features

  • Browse fileGrps and pages, arranging views next to each other for comparison
  • Show original or derived images (AlternativeImage on any level of the structural hierarchy)
  • Show multiple images at once for different pages (horizontally) or different segments (vertically), zooming freely
  • Show raw PAGE-XML with syntax highlighting, open with PageViewer
  • Show concatenated PAGE-XML text annotation
  • Show rendered HTML comparison from dinglehopper evaluations

Configuration

Configuration file locations

At startup, the following directories are searched for a config file named ocrd-browser.conf:

# directories searched, with the resulting config file paths under Ubuntu 20.04
GLib.get_system_config_dirs()  # '/etc/xdg/xdg-ubuntu/ocrd-browser.conf', '/etc/xdg/ocrd-browser.conf'
GLib.get_user_config_dir()     # '/home/jk/.config/ocrd-browser.conf'  
os.getcwd()                    # './ocrd-browser.conf'
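
The search order above can be approximated without GLib. The following is a minimal sketch, assuming that the two GLib functions follow the XDG Base Directory conventions (XDG_CONFIG_DIRS with a default of /etc/xdg, and XDG_CONFIG_HOME with a default of ~/.config). The function name candidate_config_files is hypothetical and not part of browse-ocrd, and the sketch makes no claim about which file takes precedence when several exist.

```python
import os
from pathlib import Path

CONFIG_NAME = "ocrd-browser.conf"


def candidate_config_files(environ=None, cwd=None):
    """List the config file paths in the order the directories are named above.

    Hypothetical helper: approximates GLib's lookup with XDG environment
    variables instead of calling GLib itself.
    """
    environ = os.environ if environ is None else environ
    # System-wide directories: colon-separated, falling back to /etc/xdg.
    system_dirs = environ.get("XDG_CONFIG_DIRS") or "/etc/xdg"
    # Per-user directory, falling back to ~/.config.
    user_dir = environ.get("XDG_CONFIG_HOME") or os.path.join(
        os.path.expanduser("~"), ".config"
    )
    cwd = os.getcwd() if cwd is None else cwd
    dirs = [d for d in system_dirs.split(":") if d] + [user_dir, cwd]
    return [Path(d) / CONFIG_NAME for d in dirs]


if __name__ == "__main__":
    for path in candidate_config_files():
        marker = "found" if path.is_file() else "missing"
        print(f"{path} ({marker})")
```

Running the script prints each candidate path and whether a file exists there, which can help when a setting does not seem to be picked up.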

Configuration file syntax

The ocrd-browser.conf file is an ini-file with the following keys:

[FileGroups]
# Preferred fileGrp names for thumbnail display in the Page Browser 
# Comma-separated list of regular expressions
preferredImages = OCR-D-IMG, OCR-D-IMG.*, ORIGINAL

# Each Tool has a section header [Tool XYZ]
# At the moment the only defined tool is "PageViewer"  
[Tool PageViewer]
# (ba)sh commandline to execute with placeholders  
commandline = /usr/bin/java -jar /home/jk/bin/JPageViewer/JPageViewer.jar --resolve-dir {workspace.directory} {file.path.absolute}
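
To illustrate the syntax above, here is a minimal sketch that reads such a file with Python's standard configparser module. It is not the browser's own loading code. The helper pick_preferred is hypothetical, and both the use of full-pattern matching and the rule that earlier patterns win are assumptions.

```python
import configparser
import re

CONFIG_TEXT = """
[FileGroups]
preferredImages = OCR-D-IMG, OCR-D-IMG.*, ORIGINAL

[Tool PageViewer]
commandline = /usr/bin/java -jar /home/jk/bin/JPageViewer/JPageViewer.jar --resolve-dir {workspace.directory} {file.path.absolute}
"""

# Interpolation is disabled so the commandline value is returned unchanged.
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(CONFIG_TEXT)

# Split the comma-separated value into compiled regular expressions.
patterns = [
    re.compile(item.strip())
    for item in parser["FileGroups"]["preferredImages"].split(",")
    if item.strip()
]

# Collect every "[Tool XYZ]" section as a mapping of tool name to commandline.
tools = {
    section[len("Tool "):]: parser[section]["commandline"]
    for section in parser.sections()
    if section.startswith("Tool ")
}


def pick_preferred(file_groups, patterns):
    """Return the first fileGrp matching the earliest pattern, or None.

    Assumption: patterns are tried in listed order and must match the
    whole fileGrp name.
    """
    for pattern in patterns:
        for group in file_groups:
            if pattern.fullmatch(group):
                return group
    return None


print(pick_preferred(["OCR-D-GT", "OCR-D-IMG-BIN", "ORIGINAL"], patterns))
print(sorted(tools))
```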

The commandline string is used as a Python format string with the following keyword arguments:

  • workspace : The current ocrd.Workspace. All of its properties are shell-escaped automatically with shlex.quote.
  • file : The current ocrd_models.OcrdFile. All of its properties are shell-escaped automatically with shlex.quote. An additional property path offers the properties absolute and relative, so {file.path.absolute} is replaced by the shell-quoted absolute path of the file.
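
The placeholder expansion described above can be sketched as follows. This is an illustration under stated assumptions, not the browser's actual implementation: QuotedProxy and expand_commandline are hypothetical names, and SimpleNamespace objects stand in for the real ocrd.Workspace and ocrd_models.OcrdFile.

```python
import shlex
from types import SimpleNamespace


class QuotedProxy:
    """Wrap an object so attributes read from a format string are shell-quoted."""

    def __init__(self, target):
        self._target = target

    def __getattr__(self, name):
        value = getattr(self._target, name)
        # Nested namespaces (such as file.path) are wrapped again, so that
        # {file.path.absolute} is quoted at the leaf value.
        if isinstance(value, SimpleNamespace):
            return QuotedProxy(value)
        return shlex.quote(str(value))


def expand_commandline(template, workspace, file):
    """Fill the {workspace...} and {file...} placeholders with quoted values."""
    return template.format(
        workspace=QuotedProxy(workspace), file=QuotedProxy(file)
    )


# Stand-in objects with spaces in their paths to show the quoting.
workspace = SimpleNamespace(directory="/data/my workspace")
file = SimpleNamespace(
    path=SimpleNamespace(
        absolute="/data/my workspace/OCR-D-GT/page 1.xml",
        relative="OCR-D-GT/page 1.xml",
    )
)

command = expand_commandline(
    "java -jar JPageViewer.jar --resolve-dir {workspace.directory} {file.path.absolute}",
    workspace,
    file,
)
print(command)
```

Because every substituted value is quoted, paths containing spaces reach the shell as single arguments, and the placeholders in the config file should not be wrapped in additional quotes.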

Note: You can get PRImA's PageViewer on GitHub.
