OCR the IIIF images in a manifest and generate annotations

iiif2annos

Read a manifest, OCR the images, create AnnotationLists and add them to a copy of the manifest

This tool uses the Tesseract OCR engine. Make sure it is installed and on your $PATH before running the commands below.
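A quick way to confirm the prerequisite is to look the binary up on the PATH from Python. This is a minimal sketch, not part of iiif2annos; the helper name `tesseract_available` is made up for illustration, and it assumes the executable is called `tesseract`.

```python
import shutil


def tesseract_available(binary="tesseract"):
    """Return True if an executable named `binary` can be found on PATH.

    Hypothetical helper for illustration; iiif2annos may perform its own check.
    """
    return shutil.which(binary) is not None


if __name__ == "__main__":
    if tesseract_available():
        print("tesseract found on PATH")
    else:
        print("tesseract not found; install it before running ocr.py")
```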

usage: ocr.py [-h] [--base-output-uri OUTPUTURI] [--lang LANG] [-c] manifest output

Read a manifest, OCR all the pages then adds the results as annotation lists

positional arguments:
  manifest              URL to Manifest file
  output                Output directory for annotation lists

options:
  -h, --help            show this help message and exit
  --base-output-uri OUTPUTURI
                        Output URI for annotations and annotation list
  --lang LANG           Language to pass to the OCR engine see: https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html
  -c, --confidence      Include OCR confidence value in text of the annotation?
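
For readers who want to see how the options above fit together, here is a sketch of an argparse parser that accepts the same command line. It is reconstructed from the help text only and is not the tool's actual source; the function name `build_parser` and the attribute name `base_output_uri` are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the usage text shown above (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="ocr.py",
        description="Read a manifest, OCR all the pages then adds the "
                    "results as annotation lists",
    )
    # Positional arguments, in the order given by the usage line.
    parser.add_argument("manifest", help="URL to Manifest file")
    parser.add_argument("output", help="Output directory for annotation lists")
    # Optional arguments; the dest name is an assumption.
    parser.add_argument(
        "--base-output-uri",
        metavar="OUTPUTURI",
        dest="base_output_uri",
        help="Output URI for annotations and annotation list",
    )
    parser.add_argument("--lang", metavar="LANG",
                        help="Language to pass to the OCR engine")
    parser.add_argument(
        "-c", "--confidence",
        action="store_true",
        help="Include OCR confidence value in text of the annotation?",
    )
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Because `-c` is a plain switch, it takes no value: pass it to turn confidence output on and omit it to leave it off.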

This should work with both v2 and v3 manifests. For v2 manifests, AnnotationLists are created; for v3 manifests, AnnotationPages are created.
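
One way to tell the two versions apart is to inspect the manifest's `@context`, which names the IIIF Presentation API version. The sketch below assumes that approach; the function names are hypothetical and iiif2annos may detect the version differently.

```python
V2_CONTEXT = "http://iiif.io/api/presentation/2/context.json"
V3_CONTEXT = "http://iiif.io/api/presentation/3/context.json"


def presentation_version(manifest):
    """Return 2 or 3 based on the manifest's @context (illustrative helper)."""
    context = manifest.get("@context", [])
    # @context may be a single string or a list of context URIs.
    contexts = [context] if isinstance(context, str) else list(context)
    if V3_CONTEXT in contexts:
        return 3
    if V2_CONTEXT in contexts:
        return 2
    raise ValueError("Unrecognised IIIF Presentation @context: %r" % (context,))


def annotation_container_type(manifest):
    """Return the container type to create for this manifest version."""
    if presentation_version(manifest) == 3:
        return "AnnotationPage"
    return "sc:AnnotationList"
```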

Example

python iiif2annos/ocr.py --lang frk --base-output-uri http://localhost:5500/newspaper https://preview.iiif.io/cookbook/update_newspaper/recipe/0068-newspaper/newspaper_issue_1-manifest.json  newspaper
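
For orientation, the sketch below builds a v3 AnnotationPage holding a single OCR word, with the word's bounding box expressed as an `xywh` fragment on the canvas. The shape follows the IIIF Presentation 3 data model, but the identifiers, the helper names, and the `supplementing` motivation are assumptions; the files that iiif2annos writes may be laid out differently.

```python
import json


def word_annotation(anno_id, canvas_id, text, x, y, w, h):
    """Build one v3 annotation that places `text` on a region of a canvas."""
    return {
        "id": anno_id,
        "type": "Annotation",
        "motivation": "supplementing",  # assumed motivation for OCR text
        "body": {"type": "TextualBody", "value": text, "format": "text/plain"},
        # The region is addressed with a media fragment: x, y, width, height.
        "target": "%s#xywh=%d,%d,%d,%d" % (canvas_id, x, y, w, h),
    }


def annotation_page(page_id, annotations):
    """Wrap annotations in a v3 AnnotationPage."""
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": page_id,
        "type": "AnnotationPage",
        "items": list(annotations),
    }


if __name__ == "__main__":
    # Hypothetical identifiers, chosen only to make the example readable.
    base = "http://localhost:5500/newspaper"
    anno = word_annotation(base + "/anno/1", "https://example.org/canvas/p1",
                           "Zeitung", 100, 200, 340, 60)
    print(json.dumps(annotation_page(base + "/page/p1.json", [anno]), indent=2))
```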

Using these blogs as a guide:

Testing

Unit tests are in the tests folder and can be run with:

python -m unittest discover -s tests

Run a single test:

python -m unittest tests.testImages.TestImage.testCanvasImageMissmatch
