OCR the IIIF images in a manifest and generate annotations

iiif2annos

Read a manifest, OCR the images, create AnnotationLists and add them to a copy of the manifest

This tool uses the Tesseract OCR engine. Ensure it is installed and on your $PATH before running the tool.
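Before running ocr.py, it can help to confirm that the tesseract binary is actually reachable. The following is a minimal sketch using only the Python standard library; the helper name find_tesseract is hypothetical and not part of iiif2annos.

```python
import shutil


def find_tesseract(executable="tesseract"):
    """Return the full path to the tesseract binary, or None if it is not on PATH."""
    # shutil.which performs the same lookup the shell would do via $PATH.
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract not found on PATH; install it before running ocr.py")
    else:
        print(f"tesseract found at {path}")
```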

usage: ocr.py [-h] [--base-output-uri OUTPUTURI] [--lang LANG] [-c] manifest output

Read a manifest, OCR all the pages then adds the results as annotation lists

positional arguments:
  manifest              URL to Manifest file
  output                Output directory for annotation lists

options:
  -h, --help            show this help message and exit
  --base-output-uri OUTPUTURI
                        Output URI for annotations and annotation list
  --lang LANG           Language to pass to the OCR engine see: https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html
  -c, --confidence      Include OCR confidence value in text of the annotation?
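
To illustrate where a per-word confidence value such as the one enabled by --confidence can come from, here is a rough sketch that parses Tesseract's TSV output format, which includes left, top, width, height, conf and text columns. This is an assumption about one possible approach, not a description of how ocr.py is implemented. The function name words_from_tsv, the sample data and the "(96%)" text format are all hypothetical.

```python
import csv
import io

# Hypothetical sample in the column layout of Tesseract's TSV output.
SAMPLE_TSV = (
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\t"
    "left\ttop\twidth\theight\tconf\ttext\n"
    "1\t1\t0\t0\t0\t0\t0\t0\t1000\t800\t-1\t\n"
    "5\t1\t1\t1\t1\t1\t10\t20\t90\t30\t96\tBerliner\n"
    "5\t1\t1\t1\t1\t2\t110\t20\t80\t30\t88\tTageblatt\n"
)


def words_from_tsv(tsv_text, include_confidence=False):
    """Parse Tesseract TSV text into (xywh, text) pairs, one per recognised word."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    words = []
    for row in reader:
        text = (row.get("text") or "").strip()
        conf = float(row.get("conf") or -1)
        # Rows with conf == -1 describe pages, blocks or lines rather than words.
        if conf < 0 or not text:
            continue
        # "x,y,w,h" is the form used in a IIIF "#xywh=" region fragment.
        xywh = f"{row['left']},{row['top']},{row['width']},{row['height']}"
        if include_confidence:
            # Hypothetical formatting of the confidence inside the text.
            text = f"{text} ({conf:.0f}%)"
        words.append((xywh, text))
    return words


if __name__ == "__main__":
    for xywh, text in words_from_tsv(SAMPLE_TSV, include_confidence=True):
        print(xywh, text)
```

Each xywh string could then be appended to a canvas URI as "#xywh=..." to target the region of the page that contains the word.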

This should work with both v2 and v3 manifests. For v2 manifests, AnnotationLists are created; for v3 manifests, AnnotationPages are created.
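
The difference between the two versions can be sketched as follows. This is a minimal illustration based on the IIIF Presentation API 2 and 3 data models, not the tool's actual code; the function names manifest_version and empty_annotation_container are hypothetical, and the version check assumes that a manifest without the v3 context is a v2 manifest.

```python
PRESENTATION_2_CONTEXT = "http://iiif.io/api/presentation/2/context.json"
PRESENTATION_3_CONTEXT = "http://iiif.io/api/presentation/3/context.json"


def manifest_version(manifest):
    """Guess the IIIF Presentation API version from the manifest's @context."""
    context = manifest.get("@context", "")
    # @context may be a single string or a list of context URIs.
    contexts = context if isinstance(context, list) else [context]
    return 3 if PRESENTATION_3_CONTEXT in contexts else 2


def empty_annotation_container(manifest, uri):
    """Build an empty AnnotationPage (v3) or AnnotationList (v2) for one canvas."""
    if manifest_version(manifest) == 3:
        return {
            "@context": PRESENTATION_3_CONTEXT,
            "id": uri,
            "type": "AnnotationPage",
            "items": [],  # v3 annotations go here
        }
    return {
        "@context": PRESENTATION_2_CONTEXT,
        "@id": uri,
        "@type": "sc:AnnotationList",
        "resources": [],  # v2 annotations go here
    }


if __name__ == "__main__":
    v3_manifest = {"@context": PRESENTATION_3_CONTEXT, "type": "Manifest"}
    # The URI below is a made-up example.
    print(empty_annotation_container(v3_manifest, "http://localhost:5500/newspaper/p1.json"))
```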

Example

python iiif2annos/ocr.py --lang frk --base-output-uri http://localhost:5500/newspaper https://preview.iiif.io/cookbook/update_newspaper/recipe/0068-newspaper/newspaper_issue_1-manifest.json  newspaper

Using these blogs as a guide:
