
Lacebuilder generates packages for the Lace OCR editing environment.


Lacebuilder


Lacebuilder is a friendly command-line application that generates packages for Lace, the in-browser OCR-to-TEI editing application. Point it at an image directory, the corresponding hOCR output directory, and a simple XML metadata file, and it produces .xar packages that can be installed in Lace through eXist-db’s drag-and-drop package manager.
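A .xar file is an EXPath package, which is an ordinary ZIP archive with an expath-pkg.xml descriptor at its root. If you want to check what lacebuilder produced before dragging it into eXist-db, a minimal Python sketch like the following can list the archive's contents. The function names are illustrative only and are not part of lacebuilder.

```python
import zipfile


def list_xar(path):
    """Return the member names of a .xar package (a plain ZIP archive)."""
    with zipfile.ZipFile(path) as xar:
        return xar.namelist()


def has_expath_descriptor(path):
    """True if the package carries an expath-pkg.xml descriptor at its root."""
    return "expath-pkg.xml" in list_xar(path)
```

For example, `list_xar("/home/brucerob/552464779_images.xar")` would list the files in the image package built in the example further down.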

Features

  • Generates a base image package for all derived OCR runs, binarizing all images

  • Generates OCR output packages with the enhanced data used to make editing OCR easy in Lace, including word spellcheck status and dehyphenation

  • Automatically corrects the word bounding boxes of kraken hOCR output
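This page does not describe how lacebuilder dehyphenates, so the following is only a naive Python sketch of the general idea, not lacebuilder's implementation. It assumes well-formed XHTML hOCR using the standard ocr_line and ocrx_word classes, with word geometry in the title attribute as bbox x0 y0 x1 y1; real kraken output may differ in detail.

```python
import re
import xml.etree.ElementTree as ET

# hOCR keeps geometry in the title attribute, e.g. title="bbox 10 20 110 40; x_wconf 95"
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


def read_lines(hocr_text):
    """Return the page as a list of lines, each a list of (word, bbox) pairs."""
    root = ET.fromstring(hocr_text)
    lines = []
    for el in root.iter():
        if el.get("class") != "ocr_line":
            continue
        words = []
        for w in el.iter():
            if w.get("class") == "ocrx_word":
                m = BBOX_RE.search(w.get("title", ""))
                bbox = tuple(int(v) for v in m.groups()) if m else None
                words.append(("".join(w.itertext()).strip(), bbox))
        lines.append(words)
    return lines


def dehyphenate(lines):
    """Naively join a line-final word ending in '-' with the next line's first word."""
    out = [[word for word, _bbox in line] for line in lines]
    for i in range(len(out) - 1):
        if out[i] and out[i + 1] and out[i][-1].endswith("-"):
            out[i][-1] = out[i][-1][:-1] + out[i + 1].pop(0)
    return out
```

The joined word in this sketch loses the second half's bounding box; an editor that still displays each line separately would need to keep both boxes.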

Examples

lacebuilder offers two subcommands, packimages and packtexts, each with its own parameters. The parameters --outputdir and --metadatafile are common to both subcommands, so they are given before the subcommand name. At present, you cannot chain the subcommands. To see the --help for a subcommand, you must still supply these common parameters, thus:

lacebuilder --outputdir /tmp/ --metadatafile /tmp/myfile_meta.xml packtexts --help
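When scripting lacebuilder, the ordering rule above is the one to respect: common options first, then the subcommand, then the subcommand's own options. A minimal Python sketch using the standard subprocess module; lacebuilder_argv and run_lacebuilder are hypothetical helpers, and only options shown on this page are used.

```python
import subprocess


def lacebuilder_argv(outputdir, metadatafile, subcommand, *sub_args):
    """Build an argument list with the common options before the subcommand."""
    return [
        "lacebuilder",
        "--outputdir", outputdir,
        "--metadatafile", metadatafile,
        subcommand,  # packimages or packtexts
        *sub_args,   # options specific to that subcommand
    ]


def run_lacebuilder(*args):
    """Run lacebuilder; raises CalledProcessError if it exits non-zero."""
    return subprocess.run(lacebuilder_argv(*args), check=True)
```

For example, `run_lacebuilder("/tmp/", "/tmp/myfile_meta.xml", "packtexts", "--help")` runs the command above. Since the subcommands cannot be chained, a script would call run_lacebuilder once for packimages and again for packtexts.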

Building an image package:

lacebuilder --outputdir /home/brucerob/ --metadatafile ~/Test_Lacebuilder/552464779_meta.xml packimages --imagedir ~/Test_Tarantella/test
outputdir: /home/brucerob/
generating image xar archive
Binarizing and compressing images
image archive of 111 images saved to /home/brucerob/552464779_images.xar

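More information is required to build an hOCR text package, because Lace uses it to store multiple OCR ‘runs’ of a given image set and, eventually, to search and compile runs that were completed with the same classifier. In the example below, packtexts is also given the classifier used for the run (--classifier) and the image package from the previous step (--imagexarfile):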

lacebuilder --outputdir /home/brucerob/ --metadatafile ~/Test_Tarantella/552464779_meta.xml packtexts  --hocrdir ~/Test_Tarantella/test_hocr_out/ --classifier ~/Downloads/Kraken-Greek-Classifiers-and-Samples/porson_2020-10-10-11-54-25_best.mlmodel --imagexarfile ~/552464779_images.xar
dehyphenating
spellchecking
generating hocr xar
accuracy 91%, Greek acc. 91%; completed 00%, Greek completed 00%
total:  20669 ; total correct: 11369
writing this data to  /tmp/tmpo0_6nin6total.xml
text archive from date 2021-01-30-16-05-42 saved to /home/brucerob/552464779-2021-01-30-16-05-42-porson_2020-10-10-11-54-25_best-texts.xar
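lacebuilder prints where it saved each archive, but when scripting both steps it can help to know the file names in advance. The Python sketch below reproduces the naming pattern visible in the two sample runs above. The pattern is inferred from this one example, not from documentation, and the identifier (552464779 here) is passed in explicitly because this page does not say where lacebuilder takes it from. All function names are hypothetical.

```python
from datetime import datetime
from pathlib import Path


def image_xar_name(identifier):
    # Pattern seen in the sample run: <identifier>_images.xar
    return f"{identifier}_images.xar"


def text_xar_name(identifier, run_date, classifier_path):
    # Pattern seen in the sample run (inferred, not documented):
    # <identifier>-<YYYY-MM-DD-HH-MM-SS>-<classifier file stem>-texts.xar
    stamp = run_date.strftime("%Y-%m-%d-%H-%M-%S")
    return f"{identifier}-{stamp}-{Path(classifier_path).stem}-texts.xar"


def latest_text_xar(outputdir, identifier):
    # The timestamp is fixed-width and follows the identifier directly,
    # so sorting the names lexically also sorts them chronologically.
    matches = sorted(Path(outputdir).glob(f"{identifier}-*-texts.xar"))
    return matches[-1] if matches else None
```

With the values from the run above, text_xar_name("552464779", datetime(2021, 1, 30, 16, 5, 42), "porson_2020-10-10-11-54-25_best.mlmodel") gives the file name shown in the last line of its output.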

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

History

0.1.0 (2021-01-25)

  • First release on PyPI.



Download files


Source Distribution

lacebuilder-0.1.2.tar.gz (63.1 MB)


Built Distribution

lacebuilder-0.1.2-py2.py3-none-any.whl (63.9 MB)


File details

Details for the file lacebuilder-0.1.2.tar.gz.

File metadata

  • Download URL: lacebuilder-0.1.2.tar.gz
  • Size: 63.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.14.0 pkginfo/1.7.0 requests/2.25.1 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.8.5

File hashes

Hashes for lacebuilder-0.1.2.tar.gz
Algorithm Hash digest
SHA256 b95bc7a01733b4d2c6518b543956277a04ae9a8b26e840caddf9c83999328a3a
MD5 b33bb51650e89f896623829fc21868b5
BLAKE2b-256 6b2b74f13dd8ea3e38250b29aa7ecd11733ca84c1ef86e48e8e8a8cbff73c609


File details

Details for the file lacebuilder-0.1.2-py2.py3-none-any.whl.

File metadata

  • Download URL: lacebuilder-0.1.2-py2.py3-none-any.whl
  • Size: 63.9 MB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.14.0 pkginfo/1.7.0 requests/2.25.1 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.8.5

File hashes

Hashes for lacebuilder-0.1.2-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 3e60a79e9f22ed6cde283ea172f2d642bb5deb0e795236bb088d43abb21d1fb2
MD5 4e4c45a52e74dc1530bcaeaeff4d83c3
BLAKE2b-256 7d7cb6c4db137cf83794eedd194165e35f7fef589942d96ec633234bc997b55a
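To check a downloaded file against the SHA256 digests listed above, a short Python sketch with the standard hashlib module is enough. The digests are copied from this page; the helper names are illustrative only.

```python
import hashlib

# SHA256 digests as listed on this page for the 0.1.2 release files.
EXPECTED_SHA256 = {
    "lacebuilder-0.1.2.tar.gz":
        "b95bc7a01733b4d2c6518b543956277a04ae9a8b26e840caddf9c83999328a3a",
    "lacebuilder-0.1.2-py2.py3-none-any.whl":
        "3e60a79e9f22ed6cde283ea172f2d642bb5deb0e795236bb088d43abb21d1fb2",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so a 60+ MB download is not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected):
    """True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

For example, `verify("lacebuilder-0.1.2.tar.gz", EXPECTED_SHA256["lacebuilder-0.1.2.tar.gz"])` should return True for an intact download.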

