An extensible viewer for OCR-D workspaces
OCR-D Browser
An extensible viewer for OCR-D mets.xml files
Features
- Browse fileGrps and pages, arranging views next to each other for comparison
- PageView: Show original or derived page images with PAGE-XML annotations overlay, similar to PageViewer
- ImageView: Show original or derived images (AlternativeImage on any level of the structural hierarchy)
- ImageView: Show multiple images at once for different pages (horizontally) or different segments (vertically), zooming freely
- XmlView: Show raw PAGE-XML with syntax highlighting, open with PageViewer
- TextView: Show concatenated PAGE-XML text annotation
- DiffView: Show a simple diff comparison between text annotations from different fileGrps
- HtmlView: Show rendered HTML comparison from dinglehopper evaluations
Installation
OCR-D Browser requires Python 3.7 or higher.
Native (tested on Ubuntu 18.04/20.04)
The native installation requires GTK 3.
In any case, you need a virtual environment with a current pip version (>=20), preferably your existing OCR-D venv.
To create a new venv with a current pip:
sudo apt install python3-pip python3-venv
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip setuptools wheel
From source
git clone https://github.com/hnesk/browse-ocrd.git
cd browse-ocrd
sudo make deps-ubuntu
make install
Via pip
sudo apt install libcairo2-dev libgirepository1.0-dev
pip install browse-ocrd
Docker
If you have Docker installed, you can build OCR-D Browser as a web service:
docker build -t ocrd_browser .
Or use a prebuilt image from Dockerhub:
docker pull hnesk/ocrd_browser
Usage
Native GUI
Start the app with the filesystem path to the METS file of your OCR-D workspace:
browse-ocrd ./path/to/mets.xml
You can also open another METS file from within the UI.
Docker service
When running the web service, you need to pass a directory DATADIR which (recursively) contains all the workspaces you want to serve.
The top entrypoint http://localhost/ will show an index page with a link http://localhost/browse/... for each workspace path.
Each link will run browse-ocrd at that workspace in the background and then redirect your browser to the internal Broadway server, which renders the app in the web browser.
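Before it can list links, the index page has to discover the workspaces under DATADIR. The following is a minimal sketch of that discovery step, not the service's actual code. It assumes a workspace is simply any directory holding a file named exactly mets.xml, and the helper name find_workspaces is hypothetical.

```python
from pathlib import Path


def find_workspaces(datadir):
    """Return workspace directories (relative to datadir, POSIX style), sorted.

    Hypothetical helper: a "workspace" is assumed to be any directory that
    contains a regular file named exactly 'mets.xml', searched recursively.
    """
    root = Path(datadir)
    return sorted(
        mets.parent.relative_to(root).as_posix()
        for mets in root.rglob("mets.xml")
        if mets.is_file()
    )
```

With the Docker command below, this would be called with /data, the mount point of DATADIR inside the container.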
To start up, just do:
docker run -it --rm -v DATADIR:/data -p 8085:8085 -p 8080:8080 ocrd_browser
Configuration
Configuration file locations
At startup, the following directories are searched for a config file named ocrd-browser.conf:
# directories searched, with the resulting config file paths under Ubuntu 20.04
GLib.get_system_config_dirs() # '/etc/xdg/xdg-ubuntu/ocrd-browser.conf', '/etc/xdg/ocrd-browser.conf'
GLib.get_user_config_dir() # '/home/jk/.config/ocrd-browser.conf'
os.getcwd() # './ocrd-browser.conf'
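The list above gives the lookup locations but not which file wins when several exist. A minimal sketch of loading them with Python's configparser, under the assumption (not stated in the source) that more specific locations override less specific ones: system dirs first (reversed, because XDG lists the most important one first), then the user dir, then the working directory. The helper names config_candidates and load_config are hypothetical.

```python
import configparser
import os

CONF_NAME = "ocrd-browser.conf"


def config_candidates(system_dirs, user_dir, cwd):
    """List candidate config file paths, least specific first.

    Assumption: system_dirs arrive in XDG preference order (most important
    first), so they are reversed; user_dir and cwd follow so that they are
    read last and therefore take precedence.
    """
    dirs = list(reversed(list(system_dirs))) + [user_dir, cwd]
    return [os.path.join(d, CONF_NAME) for d in dirs]


def load_config(paths):
    """Read every existing candidate; later files override earlier ones."""
    parser = configparser.ConfigParser()
    found = parser.read(paths)  # missing files are skipped silently
    return parser, found
```

In the application, the directory arguments would come from GLib.get_system_config_dirs(), GLib.get_user_config_dir() (via `from gi.repository import GLib`) and os.getcwd(), as listed above.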
Configuration file syntax
The ocrd-browser.conf file is an ini-file with the following keys:
[FileGroups]
# Preferred fileGrp names for thumbnail display in the Page Browser
# Comma separated list of regular expressions
preferredImages = OCR-D-IMG, OCR-D-IMG.*, ORIGINAL
# Each Tool has a section header [Tool XYZ]
# At the moment the only defined tool is "PageViewer"
[Tool PageViewer]
# shell commandline to execute with placeholders
commandline = /usr/bin/java -jar /home/jk/bin/JPageViewer/JPageViewer.jar --resolve-dir {workspace.directory} {file.path.absolute}
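The preferredImages key is a comma-separated list of regular expressions. One plausible way to turn it into a thumbnail fileGrp choice is sketched below; the helper names are hypothetical, and trying patterns in listed order with whole-name matching (re.fullmatch) is an assumption, since browse-ocrd may match differently.

```python
import configparser
import re


def preferred_patterns(parser):
    """Compile the comma-separated regexes from [FileGroups] preferredImages."""
    raw = parser.get("FileGroups", "preferredImages", fallback="")
    return [re.compile(part.strip()) for part in raw.split(",") if part.strip()]


def pick_image_filegrp(file_grps, patterns):
    """Return the fileGrp matched by the earliest pattern, or None.

    Assumption: patterns are tried in the listed order, and a pattern must
    match the whole fileGrp name.
    """
    for pattern in patterns:
        for grp in file_grps:
            if pattern.fullmatch(grp):
                return grp
    return None


if __name__ == "__main__":
    parser = configparser.ConfigParser()
    parser.read_string("[FileGroups]\npreferredImages = OCR-D-IMG, OCR-D-IMG.*, ORIGINAL\n")
    grps = ["OCR-D-GT-SEG-PAGE", "OCR-D-IMG-BIN", "ORIGINAL"]
    # "OCR-D-IMG" matches nothing, so "OCR-D-IMG.*" decides: prints OCR-D-IMG-BIN
    print(pick_image_filegrp(grps, preferred_patterns(parser)))
```

Ordering the patterns from most to least specific, as in the default value, lets an exact OCR-D-IMG fileGrp win over derived ones such as OCR-D-IMG-BIN.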
The commandline string will be used as a Python format string with the keyword arguments:
- workspace: The current ocrd.Workspace; all properties get shell-escaped (by shlex.quote) automatically.
- file: The current ocrd_models.OcrdFile; all properties get shell-escaped (by shlex.quote) automatically. There is also an additional property path with the properties absolute and relative, so {file.path.absolute} will be replaced by the shell-quoted absolute path of the file.
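To illustrate how the placeholders expand, here is a minimal sketch of the quoting behavior described above, not browse-ocrd's own code. It uses SimpleNamespace stand-ins instead of ocrd.Workspace and ocrd_models.OcrdFile; the ShellQuoted proxy and build_commandline helper are hypothetical, and it assumes the file's local_filename is relative to the workspace directory.

```python
import os
import shlex
from types import SimpleNamespace


class ShellQuoted:
    """Proxy whose attributes come back shell-quoted for use in str.format.

    Attributes passed via `extra` are returned unchanged, which lets a nested
    proxy (such as file.path) be attached.
    """

    def __init__(self, obj, **extra):
        self._obj = obj
        self._extra = extra

    def __getattr__(self, name):
        if name in self._extra:
            return self._extra[name]
        return shlex.quote(str(getattr(self._obj, name)))


def build_commandline(template, workspace, file):
    # Assumption: file.local_filename is relative to workspace.directory.
    relative = file.local_filename
    absolute = os.path.normpath(os.path.join(workspace.directory, relative))
    path = ShellQuoted(SimpleNamespace(absolute=absolute, relative=relative))
    return template.format(
        workspace=ShellQuoted(workspace),
        file=ShellQuoted(file, path=path),
    )
```

Because every substituted value is quoted, a workspace path containing spaces still reaches the tool as a single argument when the line is split by a shell or by shlex.split.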
Note: You can get PRImA's PageViewer at GitHub.