Skip to main content

An extensible viewer for OCR-D workspaces

Project description

OCR-D Browser

An extensible viewer for OCR-D mets.xml files

Unit tests

Screenshot

OCRD Browser with Page and Xml view

Features

  • Browse fileGrps and pages, arranging views next to each other for comparison
  • PageView: Show original or derived page images with PAGE-XML annotations overlay, similar to PageViewer
  • ImageView: Show original or derived images (AlternativeImage on any level of the structural hierarchy)
  • ImageView: Show multiple images at once for different pages (horizontally) or different segments (vertically), zooming freely
  • XmlView: Show raw PAGE-XML with syntax highlighting, open with PageViewer
  • TextView: Show concatenated PAGE-XML text annotation
  • DiffView: Show a simple diff comparison between text annotations from different fileGrps
  • HtmlView: Show rendered HTML comparison from dinglehopper evaluations

Installation (tested on Ubuntu 18.04/20.04)

In any case you need a venv with a current pip version (>=20), preferably your existing ocrd-venv:

Create a current pip venv:
sudo apt install python3-pip python3-venv 
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip setuptools wheel

From source

git clone https://github.com/hnesk/browse-ocrd.git 
cd browse-ocrd
sudo make deps-ubuntu
make install

Via pip

sudo apt install libcairo2-dev libgirepository1.0-dev
pip install browse-ocrd

Usage

browse-ocrd ./path/to/mets.xml # or open interactively

Configuration

Configuration file locations

At startup the following directories a searched for a config file named ocrd-browser.conf

# directories and their default values under Ubuntu 20.04
GLib.get_system_config_dirs()  # '/etc/xdg/xdg-ubuntu/ocrd-browser.conf', '/etc/xdg/ocrd-browser.conf'
GLib.get_user_config_dir()     # '/home/jk/.config/ocrd-browser.conf'  
os.getcwd()                    # './ocrd-browser.conf'

Configuration file syntax

The ocrd-browser.conf file is an ini-file with the following keys:

[FileGroups]
# Preferred fileGrp names for thumbnail display in the Page Browser 
# Comma seperated list of regular expressions
preferredImages = OCR-D-IMG, OCR-D-IMG.*, ORIGINAL

# Each Tool has a section header [Tool XYZ]
# At the moment the only defined tool is "PageViewer"  
[Tool PageViewer]
# (ba)sh commandline to execute with placeholders  
commandline = /usr/bin/java -jar /home/jk/bin/JPageViewer/JPageViewer.jar --resolve-dir {workspace.directory} {file.path.absolute}

The commandline string will be used as a python format string with the keyword arguments:

  • workspace : The current ocrd.Workspace, all properties get shell escaped (by shlex.quote) automatically.
  • file : The current ocrd_models.OcrdFile, all properties get shell escaped (by shlex.quote) automatically, also there is an additional property path with the properties absolute and relative, so {file.path.absolute} will be replaced by the shell quoted absolute path of the file.

Note: You can get PRImA's PageViewer at Github.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

browse-ocrd-0.5.1.tar.gz (85.6 kB view details)

Uploaded Source

Built Distribution

browse_ocrd-0.5.1-py3-none-any.whl (107.9 kB view details)

Uploaded Python 3

File details

Details for the file browse-ocrd-0.5.1.tar.gz.

File metadata

  • Download URL: browse-ocrd-0.5.1.tar.gz
  • Upload date:
  • Size: 85.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.4.2 requests/2.26.0 setuptools/45.2.0 requests-toolbelt/0.8.0 tqdm/4.30.0 CPython/3.8.10

File hashes

Hashes for browse-ocrd-0.5.1.tar.gz
Algorithm Hash digest
SHA256 ebfb99b437ec251ad293063d0993f3e10dc4962376263edaece9b3db774a0206
MD5 5e8280a9a6b3eb2d0bae725a1618cd7f
BLAKE2b-256 fbc430792ea6f76e3698f4a96ad6de3cda8dc29db9a018b8ed434a3f0a1d46f9

See more details on using hashes here.

File details

Details for the file browse_ocrd-0.5.1-py3-none-any.whl.

File metadata

  • Download URL: browse_ocrd-0.5.1-py3-none-any.whl
  • Upload date:
  • Size: 107.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.4.2 requests/2.26.0 setuptools/45.2.0 requests-toolbelt/0.8.0 tqdm/4.30.0 CPython/3.8.10

File hashes

Hashes for browse_ocrd-0.5.1-py3-none-any.whl
Algorithm Hash digest
SHA256 4939bb7e03f07fd5aa201b02a48ba2a04ee2e9d837d93894161f2ece0939f666
MD5 4625cea96064723a89421366188a9c2f
BLAKE2b-256 e9ddbddc5717ad6297c26a95f2618931459bce99010b042fe22c582a9794ff5e

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page