A Python CLI app that pulls data from TEI-XML files and transforms them to conformant IIIF Annotation manifests.
TEI-IIIF Converter
Introduction
TEI-IIIF turns angle brackets into curly brackets.
To put it in more descriptive terms, it is a Python CLI application that pulls data from TEI XML and transforms it into conformant IIIF Annotation manifests, as described in the IIIF Presentation API 3.0.
TEI-IIIF generates a .json manifest for each <div> in a given XML file. Within each <div> it targets <p> elements that have a facs attribute; the attribute values are used as target values in the output manifests. It uses lxml's etree.tostring method to serialise the children of each targeted <p> element, saving the result as the value for that target.
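The extraction step described above can be sketched as follows. This is a minimal illustration, not TEI-IIIF's own code: it uses the standard library's xml.etree.ElementTree as a stand-in for lxml so that it runs without extra dependencies, and the function name collect_targets and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"  # base TEI namespace URI

# Hypothetical sample input, shaped like the structure TEI-IIIF expects.
SAMPLE = f"""<TEI xmlns="{TEI_NS}"><text><body>
<div n="1">
  <p facs="https://facsimile-server.com/iiif/foo-bar/p1/571,152,1951,1076"><hi>Dear</hi> Sir,</p>
  <p>No facs attribute, so this paragraph is skipped.</p>
</div>
<div n="2">
  <p facs="https://facsimile-server.com/iiif/foo-bar/p2/675,728,1949,1320"><name>Jane</name></p>
</div>
</body></text></TEI>"""


def collect_targets(xml_text):
    """Return {div n: [(facs, serialised children), ...]} for one document."""
    ns = {"tei": TEI_NS}
    root = ET.fromstring(xml_text)
    result = {}
    for div in root.iterfind(".//tei:div", ns):
        pairs = []
        # Only direct <p> children that carry a facs attribute are targeted.
        for p in div.iterfind("tei:p[@facs]", ns):
            # Serialise each child element, tags included. Text that sits
            # directly inside <p> before its first child is not captured here.
            body = "".join(ET.tostring(child, encoding="unicode") for child in p)
            pairs.append((p.get("facs"), body))
        result[div.get("n")] = pairs
    return result


if __name__ == "__main__":
    for n, pairs in collect_targets(SAMPLE).items():
        for facs, body in pairs:
            print(n, facs, body)
```

Because the children are serialised rather than flattened to plain text, the markup (and, with ElementTree, a namespace declaration) travels into the annotation body, which is why the output may need sanitising later.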
Installation
pip install TEI-IIIF
Basic use
- Settings can be found in settings.yaml. If you have installed tei_iiif in a virtual environment using venv, you can find settings.yaml in your_directory/venv/lib/python[version]/site-packages/tei-iiif.
- Specify the base_url where you are hosting your XML. This can be either a URI (e.g. https://foo.bar/baz/transcriptions/) or a local source (e.g. ./projects/foo-bar/transcriptions/). In both cases, remember the trailing /.

Once this has been set, TEI-IIIF can be run from the command line, with the file you wish to convert passed as an argument:
python3 tei_iiif -m transcription.xml
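To see why the trailing / on base_url matters, here is a small sketch. How TEI-IIIF actually combines base_url with the file name is not stated here, so both common approaches are shown as assumptions: plain string concatenation and urllib.parse.urljoin. Either one produces the wrong location when the slash is missing.

```python
from urllib.parse import urljoin

FILENAME = "transcription.xml"

with_slash = "https://foo.bar/baz/transcriptions/"
without_slash = "https://foo.bar/baz/transcriptions"

# Plain concatenation fuses the directory name and the file name
# when the slash is missing.
concat_ok = with_slash + FILENAME
concat_bad = without_slash + FILENAME

# urljoin treats the last path segment as a file and replaces it
# when the base has no trailing slash.
joined_ok = urljoin(with_slash, FILENAME)
joined_bad = urljoin(without_slash, FILENAME)

if __name__ == "__main__":
    for label, value in [
        ("concat, with slash", concat_ok),
        ("concat, without slash", concat_bad),
        ("urljoin, with slash", joined_ok),
        ("urljoin, without slash", joined_bad),
    ]:
        print(f"{label}: {value}")
```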
Considerations
- Because TEI-IIIF uses etree.tostring to produce the text for annotations, it captures both tags and text and replicates them. Depending on the input XML, the manifests it outputs may need to be sanitised in order to be used in production, or you may need to sanitise or simplify the XML prior to processing. As use cases can differ dramatically from project to project, TEI-IIIF does not attempt to sanitise output body text.
- However, TEI-IIIF does include regex to sanitise facs attributes such that they use #xywh= formatting for image selectors. This can be changed according to your use case in settings.py.
- By default TEI-IIIF assumes you have XML structured in a format roughly equivalent to the following:
<div n="1">
<p facs="https://facsimile-server.com/iiif/foo-bar/p1/571,152,1951,1076">
<children>...</children>
</p>
<p facs="https://facsimile-server.com/iiif/foo-bar/p1/675,728,1949,1320">
<children>...</children>
</p>
</div>
<div n="2">
<p facs="https://facsimile-server.com/iiif/foo-bar/p2/571,152,1951,1076">
<children>...</children>
</p>
<p facs="https://facsimile-server.com/iiif/foo-bar/p2/675,728,1949,1320">
<children>...</children>
</p>
</div>
- If your XML differs dramatically from the above then you can change the XPath in xmlparser.py and divjson.py.
- TEI-IIIF defaults to the base TEI namespace URI. This can be changed in settings.yaml.
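As an illustration of the facs sanitising mentioned above, the sketch below rewrites a trailing x,y,w,h region into a #xywh= fragment and wraps the result in an annotation. The regular expression, the function names, and the choice of motivation and body format are all assumptions made for the example; the pattern TEI-IIIF ships with may differ.

```python
import json
import re

# Hypothetical pattern: a trailing "/x,y,w,h" region on the facs URL.
REGION = re.compile(r"/(\d+,\d+,\d+,\d+)/?$")


def facs_to_target(facs):
    """Rewrite '.../p1/571,152,1951,1076' as '.../p1#xywh=571,152,1951,1076'."""
    return REGION.sub(r"#xywh=\1", facs)


def make_annotation(annotation_id, facs, body_markup):
    """Build one annotation dict in the general shape used by IIIF Presentation 3.0."""
    return {
        "id": annotation_id,
        "type": "Annotation",
        "motivation": "commenting",  # assumed; pick what suits your project
        "body": {
            "type": "TextualBody",
            "value": body_markup,
            "format": "text/html",  # assumed; the body carries markup as-is
        },
        "target": facs_to_target(facs),
    }


if __name__ == "__main__":
    annotation = make_annotation(
        "https://foo.bar/baz/annotations/1",  # hypothetical identifier
        "https://facsimile-server.com/iiif/foo-bar/p1/571,152,1951,1076",
        "<children>...</children>",
    )
    print(json.dumps(annotation, indent=2))
```

A facs value that does not end in four comma-separated integers passes through unchanged, so irregular attributes are easy to spot in the output.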
File details
Details for the file TEI-IIIF-0.9.2.tar.gz.
File metadata
- Download URL: TEI-IIIF-0.9.2.tar.gz
- Upload date:
- Size: 11.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ffcc6eb31ee4611161ce38bb035a166c4f48b0003220885337434def0cc450df |
| MD5 | 6cda35d2c75b612d6a5311f00a3df65c |
| BLAKE2b-256 | 326f278f64ef1f964105a08c728e60e25e3d9223365aa55005c32c368ae7c551 |
File details
Details for the file TEI_IIIF-0.9.2-py3-none-any.whl.
File metadata
- Download URL: TEI_IIIF-0.9.2-py3-none-any.whl
- Upload date:
- Size: 13.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b5c3f099663a3a5cb445c20b4f0597bf0508b64e99c9199e567f1a5ed2acdbb8 |
| MD5 | 1d94b7e13bb31f00ab5df12f77788320 |
| BLAKE2b-256 | c8f629684dd1ff84ebb12b838221927f58239295be4fcc50af687d02df5afe8e |