A Python CLI app that pulls data from TEI-XML files and transforms them to conformant IIIF Annotation manifests.

Project description

TEI-IIIF Converter

Introduction

TEI-IIIF turns angle brackets into curly brackets.

To put it in more descriptive terms, it is a Python CLI application that pulls data from TEI XML and transforms it into conformant IIIF Annotation manifests, as described in the IIIF Presentation API 3.0.

TEI-IIIF generates a .json manifest for each <div> in a given XML file. Within each <div> it targets <p> elements with a facs attribute, whose values are used as the target values in the output manifests. It uses lxml's etree.tostring function to serialise the children of each targeted <p> element and saves the result as the value for that target.
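The extraction step described above can be sketched roughly as follows. This is an illustration, not TEI-IIIF's own code: it uses the standard library's xml.etree.ElementTree as a stand-in for lxml (both expose a tostring function), the TEI namespace is omitted for brevity, and collect_annotations and SAMPLE are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input; the TEI namespace is left out to keep the sketch short.
SAMPLE = """<body>
<div n="1">
<p facs="https://facsimile-server.com/iiif/foo-bar/p1/571,152,1951,1076"><hi>First</hi> line</p>
<p>No facs attribute, so this paragraph is skipped.</p>
</div>
<div n="2">
<p facs="https://facsimile-server.com/iiif/foo-bar/p2/675,728,1949,1320"><hi>Second</hi> line</p>
</div>
</body>"""


def collect_annotations(xml_text):
    """Return one list of (facs, serialised children) pairs per <div>."""
    root = ET.fromstring(xml_text)
    per_div = []
    for div in root.iter("div"):
        pairs = []
        for p in div.iter("p"):
            facs = p.get("facs")
            if facs is None:
                continue  # only <p> elements with a facs attribute are targeted
            # Serialise each child element, tags included, and join the results.
            body = "".join(ET.tostring(child, encoding="unicode") for child in p)
            pairs.append((facs, body))
        per_div.append(pairs)
    return per_div


for index, pairs in enumerate(collect_annotations(SAMPLE), start=1):
    for facs, body in pairs:
        print(index, facs, body)
```

Note that the serialised body keeps the markup of the child elements, which is why the output may need sanitising before it is used in production.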

Installation

pip install TEI-IIIF

Basic use

  • Settings can be found in settings.yaml. If you have installed tei_iiif in a virtual environment using venv, you can find settings.yaml in your_directory/venv/lib/python[version]/site-packages/tei-iiif.
  • Specify the base_url where you are hosting your XML. This can be either a URI (e.g. https://foo.bar/baz/transcriptions/) or a local source (e.g. ./projects/foo-bar/transcriptions/). In both cases, remember the trailing /. Once this has been set, TEI-IIIF can be run from the command line, with the file you wish to convert passed as an argument:

python3 tei_iiif -m transcription.xml
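The trailing / on base_url matters because a base and a file name are typically combined either by plain concatenation or by URL joining, and both give the wrong location when the slash is missing. How TEI-IIIF joins the two internally is not stated here, so the sketch below is only an illustration; resolve_by_concatenation is a hypothetical helper, while urljoin is the standard library function.

```python
from urllib.parse import urljoin


def resolve_by_concatenation(base_url, filename):
    """Hypothetical helper: join base_url and filename by simple concatenation."""
    return base_url + filename


good_base = "https://foo.bar/baz/transcriptions/"
bad_base = "https://foo.bar/baz/transcriptions"  # trailing slash forgotten

# With the trailing slash both strategies agree on the intended location.
print(resolve_by_concatenation(good_base, "transcription.xml"))
print(urljoin(good_base, "transcription.xml"))

# Without it, concatenation fuses the two names and urljoin drops the last segment.
print(resolve_by_concatenation(bad_base, "transcription.xml"))
print(urljoin(bad_base, "transcription.xml"))
```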

Considerations

  • Because TEI-IIIF uses etree.tostring to produce the text for annotations, it captures both tags and text and replicates them. Depending on the input XML, the manifests it outputs may need to be sanitised in order to be used in production, or you may need to sanitise or simplify the XML prior to processing. As use cases can differ dramatically from project to project, TEI-IIIF does not attempt to sanitise output body text.
  • However, TEI-IIIF does include a regex to sanitise facs attributes so that they use #xywh= formatting for image selectors. This can be changed according to your use case in settings.py.
  • By default TEI-IIIF assumes you have XML structured in a format roughly equivalent to the following:
<div n="1">
	<p facs="https://facsimile-server.com/iiif/foo-bar/p1/571,152,1951,1076">
		<children>...</children>
	</p>
	<p facs="https://facsimile-server.com/iiif/foo-bar/p1/675,728,1949,1320">
		<children>...</children>
	</p>
</div>
<div n="2">
	<p facs="https://facsimile-server.com/iiif/foo-bar/p2/571,152,1951,1076">
		<children>...</children>
	</p>
	<p facs="https://facsimile-server.com/iiif/foo-bar/p2/675,728,1949,1320">
		<children>...</children>
	</p>
</div>
  • If your XML differs dramatically from the above then you can change the XPath in xmlparser.py and divjson.py.
  • TEI-IIIF defaults to the base TEI namespace URI. This can be changed in settings.yaml.
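As a rough illustration of the first two considerations, the sketch below rewrites a facs value like those in the sample XML into #xywh= form and places the unsanitised body text in an annotation. The pattern, the function names, the example annotation id, and the choice of "supplementing" as motivation are all assumptions made for this example; the regex actually shipped with TEI-IIIF and the exact shape of its output may differ.

```python
import json
import re

# Assumed pattern: a trailing "/x,y,w,h" region on the facs URL.
FACS_REGION = re.compile(r"/(\d+,\d+,\d+,\d+)$")


def facs_to_target(facs):
    """Rewrite '.../p1/571,152,1951,1076' as '.../p1#xywh=571,152,1951,1076'."""
    return FACS_REGION.sub(r"#xywh=\1", facs)


def make_annotation(annotation_id, facs, body_xml):
    """Build a minimal annotation dict (hypothetical shape, for illustration only)."""
    return {
        "id": annotation_id,
        "type": "Annotation",
        "motivation": "supplementing",  # assumption, not confirmed by the project
        "body": {"type": "TextualBody", "value": body_xml},
        "target": facs_to_target(facs),
    }


annotation = make_annotation(
    "https://foo.bar/baz/annotations/1",  # hypothetical id
    "https://facsimile-server.com/iiif/foo-bar/p1/571,152,1951,1076",
    "<children>...</children>",
)
print(json.dumps(annotation, indent=2))
```

The body value still contains the original tags, so any sanitising of the text has to happen either before processing or on the generated manifests.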
