OCR the IIIF images in a manifest and generate annotations
Project description
iiif2annos
Reads a manifest, OCRs the images, creates AnnotationLists (or AnnotationPages for v3 manifests) and adds them to a copy of the manifest.
This tool uses the tesseract OCR engine. Ensure you have this installed and on your $PATH before running the code below.
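A quick way to confirm the prerequisite is to look the binary up on the `$PATH` from Python. The following is only a minimal sketch using the standard library; the helper names `find_tesseract` and `tesseract_version` are illustrative and are not part of iiif2annos.

```python
import shutil
import subprocess


def find_tesseract(binary="tesseract"):
    """Return the full path of the binary if it is on PATH, otherwise None."""
    return shutil.which(binary)


def tesseract_version(path):
    """Return the first line printed by `tesseract --version`."""
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    # Some builds print the version banner to stderr rather than stdout.
    output = result.stdout or result.stderr
    return output.splitlines()[0] if output else ""


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract not found on PATH; install it before running ocr.py")
    else:
        print(f"found {path}: {tesseract_version(path)}")
```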
    usage: ocr.py [-h] [--base-output-uri OUTPUTURI] [--lang LANG] [-c] manifest output

    Read a manifest, OCR all the pages then adds the results as annotation lists

    positional arguments:
      manifest              URL to Manifest file
      output                Output directory for annotation lists

    options:
      -h, --help            show this help message and exit
      --base-output-uri OUTPUTURI
                            Output URI for annotations and annotation list
      --lang LANG           Language to pass to the OCR engine see: https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html
      -c, --confidence      Include OCR confidence value in text of the annotation?
This should work with both v2 and v3 manifests. For v2 manifests AnnotationLists are created; for v3 manifests AnnotationPages are created.
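To illustrate the v2/v3 distinction, the sketch below guesses the IIIF Presentation API version of a manifest from its `@context` (falling back to `type` / `@type`) and picks the matching container type. It is an assumption about how such a check could look, not the code used in `ocr.py`; the function names are hypothetical.

```python
V2_CONTEXT = "http://iiif.io/api/presentation/2/context.json"
V3_CONTEXT = "http://iiif.io/api/presentation/3/context.json"


def presentation_version(manifest):
    """Return 2 or 3 for a manifest dict, or raise ValueError if unknown."""
    context = manifest.get("@context", [])
    # @context may be a single string or a list of context URIs.
    contexts = [context] if isinstance(context, str) else list(context)
    if V3_CONTEXT in contexts:
        return 3
    if V2_CONTEXT in contexts:
        return 2
    # Fall back to the type property: v3 uses "type", v2 uses "@type".
    if manifest.get("type") == "Manifest":
        return 3
    if manifest.get("@type") == "sc:Manifest":
        return 2
    raise ValueError("Could not determine the IIIF Presentation API version")


def annotation_container_type(manifest):
    """Return the container type that matches the manifest version."""
    if presentation_version(manifest) == 3:
        return "AnnotationPage"
    return "sc:AnnotationList"


if __name__ == "__main__":
    v2 = {"@context": V2_CONTEXT, "@type": "sc:Manifest"}
    v3 = {"@context": ["http://www.w3.org/ns/anno.jsonld", V3_CONTEXT], "type": "Manifest"}
    print(annotation_container_type(v2))
    print(annotation_container_type(v3))
```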
Example
python iiif2annos/ocr.py --lang frk --base-output-uri http://localhost:5500/newspaper https://preview.iiif.io/cookbook/update_newspaper/recipe/0068-newspaper/newspaper_issue_1-manifest.json newspaper
The following guides were used as references:
- https://nanonets.com/blog/ocr-with-tesseract/#ocr-with-pytesseract-and-opencv
- https://pypi.org/project/pytesseract/
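As a rough illustration of the OCR-to-annotation step, the sketch below converts word-level results into v3-style annotations. It assumes the input has the shape returned by `pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)`, that the OCR'd image has the same pixel dimensions as the canvas, and that annotation ids are built from a base URI. The function name, id pattern and confidence formatting are hypothetical and may differ from what iiif2annos actually produces.

```python
def words_to_annotations(data, canvas_id, base_uri, include_confidence=False):
    """Turn a pytesseract-style word dict into a list of annotation dicts."""
    annotations = []
    for i, text in enumerate(data["text"]):
        conf = float(data["conf"][i])
        # Tesseract reports -1 for rows that are not recognised words
        # (pages, blocks, lines); skip those and empty strings.
        if conf < 0 or not text.strip():
            continue
        value = text.strip()
        if include_confidence:
            value = f"{value} ({conf:.0f})"
        # Target the word's bounding box on the canvas with an xywh fragment.
        box = "{},{},{},{}".format(
            data["left"][i], data["top"][i], data["width"][i], data["height"][i]
        )
        annotations.append(
            {
                "id": f"{base_uri}/anno/{len(annotations) + 1}",  # hypothetical id pattern
                "type": "Annotation",
                "motivation": "supplementing",
                "body": {"type": "TextualBody", "value": value, "format": "text/plain"},
                "target": f"{canvas_id}#xywh={box}",
            }
        )
    return annotations


if __name__ == "__main__":
    sample = {
        "text": ["", "Hello"],
        "conf": ["-1", "96"],
        "left": [0, 10],
        "top": [0, 20],
        "width": [100, 30],
        "height": [100, 12],
    }
    for anno in words_to_annotations(sample, "https://example.org/canvas/1", "https://example.org/annos"):
        print(anno["target"], anno["body"]["value"])
```

If the image served to Tesseract is a scaled version of the canvas, the box coordinates would need to be scaled to canvas dimensions before building the target.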
Testing
Unit tests are in the tests folder and can be run with:
python -m unittest discover -s tests
To run a single test:
python -m unittest tests.testImages.TestImage.testCanvasImageMissmatch
File details
Details for the file iiif2annos-0.0.4.tar.gz.
File metadata
- Download URL: iiif2annos-0.0.4.tar.gz
- Upload date:
- Size: 6.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2d14ad4ee533166286ff89eadbc1377a055057c76f4203064e0a0ca78e758963 |
| MD5 | 9c4cac922963981895dc8d4a44e41329 |
| BLAKE2b-256 | 71362ece17fa74ea05573aef0cee612223c31e9c82909878435afe810bae9fd0 |
File details
Details for the file iiif2annos-0.0.4-py3-none-any.whl.
File metadata
- Download URL: iiif2annos-0.0.4-py3-none-any.whl
- Upload date:
- Size: 5.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f0742be5e0af681cca2844b918fda7f812775677a051c0bbe17149cb81d81e79 |
| MD5 | 8cf0ec80d4616d5cfc5cff0e590e34c0 |
| BLAKE2b-256 | ad2a863059c28779cd0fe43e2f30a8d4a14eeb57da67fc03439f53b8d1901e47 |
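The SHA256 digests listed for the two files can be checked against a downloaded copy. The sketch below is one way to do that with the standard library; the helper names are illustrative, and it assumes the files were saved locally under their original names.

```python
import hashlib

# SHA256 digests as listed in the file details for release 0.0.4.
EXPECTED_SHA256 = {
    "iiif2annos-0.0.4.tar.gz": "2d14ad4ee533166286ff89eadbc1377a055057c76f4203064e0a0ca78e758963",
    "iiif2annos-0.0.4-py3-none-any.whl": "f0742be5e0af681cca2844b918fda7f812775677a051c0bbe17149cb81d81e79",
}


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Return True if the file's SHA256 digest equals the expected value."""
    return sha256_of(path) == expected_hex.lower()
```

For example, `matches_expected("iiif2annos-0.0.4.tar.gz", EXPECTED_SHA256["iiif2annos-0.0.4.tar.gz"])` should return `True` for an intact download of the source distribution.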