Lacebuilder generates packages for the Lace OCR editing environment.
Lacebuilder
Lacebuilder is a friendly command-line application that generates packages for Lace, the in-browser OCR-to-TEI web editing application. Point it to an image directory, the corresponding hOCR output directory, and a simple XML metadata file, and it produces the .xar packages that can be installed in Lace through eXist-db’s drag-and-drop package manager.
Free software: BSD license
Documentation: https://lacebuilder.readthedocs.io.
Features
Generates a base image package for all derived OCR runs, binarizing all images
Generates OCR output packages with the enhanced data used to make editing OCR easy in Lace, including word spellcheck status and dehyphenation
Automatically corrects the word bounding boxes of kraken hOCR output
Examples
lacebuilder offers two subcommands, packimages and packtexts, each with its own parameters. The parameters --outputdir and --metadatafile are common to both subcommands, so they are set before the subcommand name. At present, the subcommands cannot be chained. To view the --help for a subcommand, you must still supply these two common parameters, thus:
lacebuilder --outputdir /tmp/ --metadatafile /tmp/myfile_meta.xml packtexts --help
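The argument order described above can be sketched in Python for anyone scripting lacebuilder. This is only an illustration: `build_command` is a hypothetical helper, not part of lacebuilder, and it uses only the options and subcommand names shown in this README.

```python
import shlex


def build_command(outputdir, metadatafile, subcommand, *subcommand_args):
    """Return the argv list for one lacebuilder call (hypothetical helper).

    The common options come first, then the subcommand, then the
    subcommand's own parameters.
    """
    if subcommand not in ("packimages", "packtexts"):
        raise ValueError(f"unknown subcommand: {subcommand}")
    return [
        "lacebuilder",
        "--outputdir", outputdir,
        "--metadatafile", metadatafile,
        subcommand,
        *subcommand_args,
    ]


# Rebuild the help invocation shown above.
help_cmd = build_command("/tmp/", "/tmp/myfile_meta.xml", "packtexts", "--help")
print(shlex.join(help_cmd))
```

The resulting list can be handed to `subprocess.run` as it stands. Because the subcommands cannot be chained, a script would build and run one such command per subcommand.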
Building an image package:
lacebuilder --outputdir /home/brucerob/ --metadatafile ~/Test_Lacebuilder/552464779_meta.xml packimages --imagedir ~/Test_Tarantella/test
outputdir: /home/brucerob/
generating image xar archive
Binarizing and compressing images
image archive of 111 images saved to /home/brucerob/552464779_images.xar
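The packtexts step needs the path of the image archive, which the output above reports on its last line. As a rough sketch, a script could recover that path from captured output as follows. `saved_archive_path` is a hypothetical helper, and the "saved to" wording is assumed from this single sample run, so it may differ between versions.

```python
import re
from typing import Optional

# Assumption: the archive path follows the words "saved to", contains no
# whitespace, and ends in ".xar", as in the sample output above.
_SAVED_TO = re.compile(r"saved to (\S+\.xar)")


def saved_archive_path(output: str) -> Optional[str]:
    """Return the .xar path reported in lacebuilder's output, or None."""
    match = _SAVED_TO.search(output)
    return match.group(1) if match else None


sample_output = (
    "outputdir: /home/brucerob/\n"
    "generating image xar archive\n"
    "Binarizing and compressing images\n"
    "image archive of 111 images saved to /home/brucerob/552464779_images.xar\n"
)
print(saved_archive_path(sample_output))
```

A path containing spaces would not be matched by this pattern. A stricter script might check the output directory for the new .xar file instead.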
Building an hOCR output text package requires more information, because Lace stores multiple OCR ‘runs’ of a given image set and eventually searches and compiles runs that have been completed using the same classifier:
lacebuilder --outputdir /home/brucerob/ --metadatafile ~/Test_Tarantella/552464779_meta.xml packtexts --hocrdir ~/Test_Tarantella/test_hocr_out/ --classifier ~/Downloads/Kraken-Greek-Classifiers-and-Samples/porson_2020-10-10-11-54-25_best.mlmodel --imagexarfile ~/552464779_images.xar
dehyphenating
spellchecking
generating hocr xar
accuracy 91%, Greek acc. 91%; completed 00%, Greek completed 00%
total: 20669 ; total correct: 11369
writing this data to /tmp/tmpo0_6nin6total.xml
text archive from date 2021-01-30-16-05-42 saved to /home/brucerob/552464779-2021-01-30-16-05-42-porson_2020-10-10-11-54-25_best-texts.xar
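The text archive name in this run appears to combine an identifier, the run date, and the classifier name. The sketch below splits such a name into those three parts, which may help when grouping archives by classifier. The pattern is inferred from this one example and is an assumption, not a documented naming scheme. `parse_text_archive_name` and `TextArchiveName` are hypothetical names.

```python
import re
from typing import NamedTuple


class TextArchiveName(NamedTuple):
    identifier: str
    run_date: str
    classifier: str


# Assumed layout: <identifier>-<YYYY-MM-DD-hh-mm-ss>-<classifier>-texts.xar
_NAME = re.compile(
    r"^(?P<identifier>.+?)-"
    r"(?P<run_date>\d{4}(?:-\d{2}){5})-"
    r"(?P<classifier>.+)-texts\.xar$"
)


def parse_text_archive_name(filename: str) -> TextArchiveName:
    """Split a text archive file name into its assumed components."""
    match = _NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected archive name: {filename}")
    return TextArchiveName(**match.groupdict())


parsed = parse_text_archive_name(
    "552464779-2021-01-30-16-05-42-porson_2020-10-10-11-54-25_best-texts.xar"
)
print(parsed.classifier)
```

The function expects the bare file name, so pass a full path through `os.path.basename` first.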
Credits
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
History
0.1.0 (2021-01-25)
First release on PyPI.
File details
Details for the file lacebuilder-0.1.2.tar.gz.
File metadata
- Download URL: lacebuilder-0.1.2.tar.gz
- Size: 63.1 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.14.0 pkginfo/1.7.0 requests/2.25.1 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.8.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | b95bc7a01733b4d2c6518b543956277a04ae9a8b26e840caddf9c83999328a3a
MD5 | b33bb51650e89f896623829fc21868b5
BLAKE2b-256 | 6b2b74f13dd8ea3e38250b29aa7ecd11733ca84c1ef86e48e8e8a8cbff73c609
File details
Details for the file lacebuilder-0.1.2-py2.py3-none-any.whl.
File metadata
- Download URL: lacebuilder-0.1.2-py2.py3-none-any.whl
- Size: 63.9 MB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.14.0 pkginfo/1.7.0 requests/2.25.1 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.8.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3e60a79e9f22ed6cde283ea172f2d642bb5deb0e795236bb088d43abb21d1fb2
MD5 | 4e4c45a52e74dc1530bcaeaeff4d83c3
BLAKE2b-256 | 7d7cb6c4db137cf83794eedd194165e35f7fef589942d96ec633234bc997b55a