OCR the IIIF images in a manifest and generate annotations

iiif2annos

Read a manifest, OCR the images, create AnnotationLists and add them to a copy of the manifest
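
As a rough illustration of the kind of annotation such a pipeline can produce, the sketch below builds a IIIF Presentation 3 style annotation for a single OCR'd word. It is not taken from this project's source: the function name `word_annotation`, the example identifiers and the choice of the `supplementing` motivation are assumptions made for the example.

```python
def word_annotation(canvas_id, annotation_id, text, x, y, w, h):
    """Build a v3-style annotation dict for one recognised word.

    Hypothetical helper for illustration only; the real tool may
    structure its annotations differently.
    """
    return {
        "id": annotation_id,
        "type": "Annotation",
        # "supplementing" is the motivation commonly used for OCR text
        "motivation": "supplementing",
        "body": {
            "type": "TextualBody",
            "value": text,
            "format": "text/plain",
        },
        # The xywh media fragment selects the word's bounding box on the canvas
        "target": f"{canvas_id}#xywh={x},{y},{w},{h}",
    }


if __name__ == "__main__":
    import json

    # Example identifiers below are placeholders, not real resources
    anno = word_annotation(
        "http://localhost:5500/newspaper/canvas/p1",
        "http://localhost:5500/newspaper/annotation/p1-w1",
        "Berliner",
        100, 200, 300, 40,
    )
    print(json.dumps(anno, indent=2))
```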

This tool uses the Tesseract OCR engine. Make sure it is installed and on your $PATH before running the commands below.
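
One way to confirm that the engine is reachable before starting a long OCR run is to look it up on the PATH from Python. This is a minimal sketch using only the standard library; the helper name `find_tesseract` is made up for the example and is not part of iiif2annos.

```python
import shutil


def find_tesseract(executable="tesseract"):
    """Return the full path to the executable if it is on the PATH, else None."""
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract was not found on your PATH; install it before running ocr.py")
    else:
        print(f"Using tesseract at {path}")
```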

usage: ocr.py [-h] [--base-output-uri OUTPUTURI] [--lang LANG] [-c] manifest output

Read a manifest, OCR all the pages then adds the results as annotation lists

positional arguments:
  manifest              URL to Manifest file
  output                Output directory for annotation lists

options:
  -h, --help            show this help message and exit
  --base-output-uri OUTPUTURI
                        Output URI for annotations and annotation list
  --lang LANG           Language to pass to the OCR engine see: https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html
  -c, --confidence      Include OCR confidence value in text of the annotation?
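
For readers who want to see how the options above map onto parsed values, here is a hedged reconstruction of the command-line interface with `argparse`. It mirrors the help text only; the actual parser in `ocr.py` may differ, and the attribute name `base_output_uri` and the default of `None` for `--lang` are assumptions.

```python
import argparse


def build_parser():
    """Rebuild the documented CLI. A sketch, not the project's real source."""
    parser = argparse.ArgumentParser(
        prog="ocr.py",
        description="Read a manifest, OCR all the pages then adds the results as annotation lists",
    )
    parser.add_argument("manifest", help="URL to Manifest file")
    parser.add_argument("output", help="Output directory for annotation lists")
    parser.add_argument(
        "--base-output-uri",
        dest="base_output_uri",  # assumed attribute name
        metavar="OUTPUTURI",
        help="Output URI for annotations and annotation list",
    )
    parser.add_argument(
        "--lang",
        metavar="LANG",
        default=None,  # assumed default; the real default is not documented here
        help="Language to pass to the OCR engine",
    )
    parser.add_argument(
        "-c",
        "--confidence",
        action="store_true",
        help="Include OCR confidence value in text of the annotation?",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Parsing `["--lang", "frk", "-c", "manifest.json", "newspaper"]` with this parser gives `lang="frk"`, `confidence=True`, `manifest="manifest.json"` and `output="newspaper"`.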

This should work with both v2 and v3 manifests. For v2 manifests, AnnotationLists are created; for v3 manifests, AnnotationPages are created.
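
The v2/v3 distinction can be sketched as a check on the manifest's `@context`. This is an illustrative approach under the assumption that the manifest declares one of the standard IIIF Presentation contexts; it is not necessarily how this tool detects the version, and `annotation_container_type` is a hypothetical name.

```python
V2_CONTEXT = "http://iiif.io/api/presentation/2/context.json"
V3_CONTEXT = "http://iiif.io/api/presentation/3/context.json"


def annotation_container_type(manifest):
    """Return the annotation container type matching the manifest's version.

    v2 manifests get "sc:AnnotationList", v3 manifests get "AnnotationPage".
    """
    context = manifest.get("@context", [])
    # @context may be a single string or a list of strings
    contexts = [context] if isinstance(context, str) else list(context)
    if V3_CONTEXT in contexts:
        return "AnnotationPage"
    if V2_CONTEXT in contexts:
        return "sc:AnnotationList"
    raise ValueError("Manifest does not declare a IIIF Presentation 2 or 3 context")


if __name__ == "__main__":
    print(annotation_container_type({"@context": V2_CONTEXT}))
    print(annotation_container_type({"@context": [V3_CONTEXT]}))
```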

Example

python iiif2annos/ocr.py --lang frk --base-output-uri http://localhost:5500/newspaper https://preview.iiif.io/cookbook/update_newspaper/recipe/0068-newspaper/newspaper_issue_1-manifest.json  newspaper

Using these blogs as a guide:

Testing

Unit tests are in the tests folder and can be run with:

python -m unittest discover -s tests

To run a single test:

python -m unittest tests.testImages.TestImage.testCanvasImageMissmatch
