
Utility functions to work with TEI/XML documents

Project description

acdh-tei-pyutils


Utility functions to work with TEI documents

install

run

pip install acdh-tei-pyutils

or, if you use uv:

uv add acdh-tei-pyutils

usage

some examples of how to use this package:

parse an XML/TEI document from a URL, a string, or a file

from acdh_tei_pyutils.tei import TeiReader

# from a URL
doc = TeiReader("https://raw.githubusercontent.com/acdh-oeaw/acdh-tei-pyutils/main/acdh_tei_pyutils/files/tei.xml")
print(doc.tree)
# <Element {http://www.tei-c.org/ns/1.0}TEI at 0x7ffb926f9c40>  (the memory address will differ)

# from a local file
doc = TeiReader("./acdh_tei_pyutils/files/tei.xml")
print(doc.tree)
# <Element {http://www.tei-c.org/ns/1.0}TEI at 0x7ffb926f9c40>
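As the output above shows, the root element's tag carries the TEI namespace in Clark notation ({namespace}localname), so every query against the tree has to be namespace-aware. The following is a minimal stdlib-only sketch of that idea with xml.etree.ElementTree; it does not use acdh_tei_pyutils, and the sample document and the helper name tei_titles are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI_NS}  # prefix -> namespace map used by the queries below

# hypothetical sample document, for illustration only
SAMPLE = f"""<TEI xmlns="{TEI_NS}">
  <teiHeader><fileDesc><titleStmt>
    <title level="a">Ein Beispiel</title>
  </titleStmt></fileDesc></teiHeader>
  <text><body><p>Hallo</p></body></text>
</TEI>"""


def tei_titles(xml: str) -> list[str]:
    """Return the text of every tei:title element in an XML string."""
    root = ET.fromstring(xml)
    # root.tag is in Clark notation: "{http://www.tei-c.org/ns/1.0}TEI";
    # the "tei:" prefix below only resolves because NS maps it
    return [t.text or "" for t in root.iterfind(".//tei:title", NS)]


print(ET.fromstring(SAMPLE).tag)  # {http://www.tei-c.org/ns/1.0}TEI
print(tei_titles(SAMPLE))  # ['Ein Beispiel']
```

ElementTree also understands simple predicates such as .//tei:title[@level='a'], but not the /text() or /@key steps used in the command line examples.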

write the current XML/TEI tree object to file

doc.tree_to_file("out.xml")
# returns 'out.xml'
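If you serialize a TEI tree yourself with the standard library instead of tree_to_file, note that ElementTree writes unregistered namespaces with generated prefixes such as ns0:. A minimal sketch, assuming you want the TEI namespace as the default namespace; the function name tei_to_file is hypothetical and not part of acdh_tei_pyutils.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# register TEI as the default (empty) prefix so the output reads
# <TEI xmlns="..."> instead of <ns0:TEI xmlns:ns0="...">;
# note this registration is global for the whole process
ET.register_namespace("", TEI_NS)


def tei_to_file(root: ET.Element, path: str) -> str:
    """Write root to path as UTF-8 with an XML declaration; return the path."""
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)
    return path


root = ET.fromstring(f'<TEI xmlns="{TEI_NS}"><text><body><p>Hallo</p></body></text></TEI>')
print(tei_to_file(root, "out.xml"))  # out.xml
```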

see acdh_tei_pyutils/cli.py for further examples

command line scripts

To batch-process a collection of XML/TEI documents by adding xml:id, xml:base, next, and prev attributes to each document's root element, run:

# using uv
uv run add-attributes -g "/path/to/your/xmls/*.xml" -b "https://value/of-your/base.com"

# pip installed
add-attributes -g "/path/to/your/xmls/*.xml" -b "https://value/of-your/base.com"
add-attributes -g "../../xml/grundbuecher/gb-data/data/editions/*.xml" -b "https://id.acdh.oeaw.ac.at/grundbuecher"
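The idea behind add-attributes can be sketched with the standard library. This is not the package's implementation: the helper name add_sequence_attributes, the alphabetical ordering, and the attribute values (file name as xml:id, base URL plus the neighbouring file name for prev and next) are assumptions for illustration; acdh_tei_pyutils/cli.py shows what the script really writes.

```python
import xml.etree.ElementTree as ET

# xml:id and xml:base live in the reserved XML namespace
XML_NS = "http://www.w3.org/XML/1998/namespace"


def add_sequence_attributes(docs: dict[str, ET.Element], base: str) -> None:
    """Hypothetical sketch: set xml:id, xml:base, prev and next on each root.

    docs maps a file name (e.g. "a.xml") to that file's root element.
    """
    names = sorted(docs)  # assumed ordering: alphabetical by file name
    base = base.rstrip("/")
    for i, name in enumerate(names):
        root = docs[name]
        root.set(f"{{{XML_NS}}}id", name)
        root.set(f"{{{XML_NS}}}base", base)
        if i > 0:  # the first document has no predecessor
            root.set("prev", f"{base}/{names[i - 1]}")
        if i < len(names) - 1:  # the last document has no successor
            root.set("next", f"{base}/{names[i + 1]}")


TEI = '<TEI xmlns="http://www.tei-c.org/ns/1.0"/>'
docs = {name: ET.fromstring(TEI) for name in ["b.xml", "a.xml", "c.xml"]}
add_sequence_attributes(docs, "https://value/of-your/base.com")
print(docs["b.xml"].get("prev"))  # https://value/of-your/base.com/a.xml
print(docs["b.xml"].get("next"))  # https://value/of-your/base.com/c.xml
```

With real files, docs could be built from glob.glob(pattern) and ET.parse(path).getroot(), and each tree written back afterwards.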

To write mentions as listEvent elements into index files, run:

mentions-to-indices -t "erwähnt in " -i "/path/to/your/xmls/indices/*.xml" -f "/path/to/your/xmls/editions/*.xml"
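Conceptually, this step inverts the references: it collects, for every entity, the documents that mention it and stores them in the index entry. The stdlib sketch below illustrates that under explicit assumptions that are not taken from the package: mentions carry ref="#id", a document's title is its first tei:title, the value given to -t is used as a label prefix, and the listEvent/event/p layout is invented for illustration. The helper names collect_mentions and add_list_events and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

NS = {"tei": "http://www.tei-c.org/ns/1.0"}
TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def collect_mentions(editions: dict[str, ET.Element], prefix: str) -> dict[str, list[str]]:
    """Map each referenced entity id to labels such as 'erwähnt in <title>'.

    Assumptions: mentions carry ref="#<id>", and a document's title is its
    first tei:title (falling back to the file name).
    """
    mentions: dict[str, list[str]] = defaultdict(list)
    for name in sorted(editions):
        root = editions[name]
        label = prefix + root.findtext(".//tei:title", default=name, namespaces=NS)
        for el in root.iterfind(".//*[@ref]"):
            ref = el.get("ref", "")
            if ref.startswith("#") and label not in mentions[ref[1:]]:
                mentions[ref[1:]].append(label)
    return dict(mentions)


def add_list_events(index: ET.Element, mentions: dict[str, list[str]]) -> None:
    """Append a listEvent with one event per mention to each matching entry.

    The listEvent/event/p layout is an assumption, not the script's format.
    """
    for entry in list(index.iter()):  # snapshot: children are added in the loop
        labels = mentions.get(entry.get(XML_ID, ""))
        if not labels:
            continue
        list_event = ET.SubElement(entry, TEI + "listEvent")
        for label in labels:
            event = ET.SubElement(list_event, TEI + "event")
            ET.SubElement(event, TEI + "p").text = label


# hypothetical sample data
edition = ET.fromstring(
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader><fileDesc><titleStmt>'
    "<title>Beispiel</title></titleStmt></fileDesc></teiHeader>"
    '<text><body><p><persName ref="#pmb2121">A.</persName></p></body></text></TEI>'
)
index = ET.fromstring(
    '<listPerson xmlns="http://www.tei-c.org/ns/1.0">'
    '<person xml:id="pmb2121"><persName>A.</persName></person></listPerson>'
)
mentions = collect_mentions({"e1.xml": edition}, "erwähnt in ")
add_list_events(index, mentions)
print(index.findtext(".//tei:listEvent/tei:event/tei:p", namespaces=NS))
# erwähnt in Beispiel
```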

To write mentions as listEvent elements into the index files and also copy the enriched index entries into the documents, use denormalize-indices:

# docs
uv run denormalize-indices --help 

# examples
uv run denormalize-indices -f "../../xml/schnitzler/schnitzler-tagebuch-data-public/editions/*.xml" -i "../../xml/schnitzler/schnitzler-tagebuch-data-public/indices/*.xml"
uv run denormalize-indices -f "./data/*/*.xml" -i "./data/indices/*.xml" -m ".//*[@key]/@key" -x ".//tei:title[@level='a']/text()"
uv run denormalize-indices -f "./data/*/*.xml" -i "./data/indices/*.xml" -m ".//*[@key]/@key" -x ".//tei:title[@level='a']/text()" -b pmb2121 -b pmb10815 -b pmb50
uv run denormalize-indices -f "./data/*/*.xml" -i "./data/indices/*.xml" --standoff # writes the entity lists into a tei:standOff element instead of a back element
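The copying half of denormalize-indices, together with the -b and --standoff options, can be illustrated with a small stdlib sketch. It is not the package's code: the helper names denormalize and _child, the sample data, and the target layout (a listPerson under tei:standOff or under text/back) are assumptions; the sketch only mirrors the intent of -m ".//*[@key]/@key" (mention ids in @key) and of the repeatable -b option (ids to skip).

```python
import copy
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def _child(parent: ET.Element, tag: str) -> ET.Element:
    """Return parent's first child with this tag, creating it if missing."""
    found = parent.find(tag)
    return found if found is not None else ET.SubElement(parent, tag)


def denormalize(doc: ET.Element, index: ET.Element, blacklist: set[str], standoff: bool = False) -> ET.Element:
    """Copy the index entries a document mentions into the document itself.

    Hypothetical sketch. Assumptions: mentions carry the bare id in @key,
    index entries carry it in xml:id, and the copies go into a listPerson
    under tei:standOff (standoff=True) or under text/back.
    """
    keys = {el.get("key") for el in doc.iter() if el.get("key")}
    keys -= blacklist  # ids to skip, in the spirit of -b
    entries = {e.get(XML_ID): e for e in index.iter() if e.get(XML_ID) in keys}
    if standoff:
        container = _child(doc, TEI + "standOff")
    else:
        container = _child(_child(doc, TEI + "text"), TEI + "back")
    target = ET.SubElement(container, TEI + "listPerson")
    for key in sorted(entries):
        target.append(copy.deepcopy(entries[key]))  # copy, leave the index intact
    return target


# hypothetical sample data
TEMPLATE = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader/><text><body><p>'
    '<rs key="pmb2121">A</rs> <rs key="pmb50">B</rs></p></body></text></TEI>'
)
index = ET.fromstring(
    '<listPerson xmlns="http://www.tei-c.org/ns/1.0"><person xml:id="pmb2121"/>'
    '<person xml:id="pmb50"/><person xml:id="pmb99"/></listPerson>'
)
doc = ET.fromstring(TEMPLATE)
target = denormalize(doc, index, blacklist={"pmb50"})
print([p.get(XML_ID) for p in target])  # ['pmb2121']
```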

develop

  • project uses uv
  • linting/formatting: uv run ruff check . and uv run ruff format .
  • before committing, run flake8 to check linting and uv run coverage run -m pytest -v to run the tests

bump version

uv version --bump minor



Download files


Source Distribution

acdh_tei_pyutils-2.3.1.tar.gz (12.5 kB)

Built Distribution


acdh_tei_pyutils-2.3.1-py3-none-any.whl (19.3 kB, Python 3)

File details

Details for the file acdh_tei_pyutils-2.3.1.tar.gz.

File metadata

  • Download URL: acdh_tei_pyutils-2.3.1.tar.gz
  • Size: 12.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.15 (publish from CI on Ubuntu 24.04)

File hashes

Hashes for acdh_tei_pyutils-2.3.1.tar.gz
Algorithm Hash digest
SHA256 37d81089cf69335c99019d85874dd59b0c31c928a360c9b5819c1c61d63fb709
MD5 35afa2464cc0f6f1b4b044e640a90da2
BLAKE2b-256 e1f3b3f840baf9f3dd1f86aa5b367d8bb0e8ecad1955a3d05cde6a20f3b20925


File details

Details for the file acdh_tei_pyutils-2.3.1-py3-none-any.whl.

File metadata

  • Download URL: acdh_tei_pyutils-2.3.1-py3-none-any.whl
  • Size: 19.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.15 (publish from CI on Ubuntu 24.04)

File hashes

Hashes for acdh_tei_pyutils-2.3.1-py3-none-any.whl
Algorithm Hash digest
SHA256 ae1a42a01794bf8234f771857073d606aea0be5d7b8510c90e57e6a79ba63b6b
MD5 dc7c3dcc05d0ded25832215588443e0b
BLAKE2b-256 a63106d2ced9f6109d6fdef5a64d6d3d5f93f7095513983896d4bde6ce37be06
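The published SHA256 digests can be checked after downloading a file. A minimal stdlib sketch; the helper names sha256_of and verify are hypothetical, and the expected value is the wheel's SHA256 digest listed above.

```python
import hashlib

# SHA256 of acdh_tei_pyutils-2.3.1-py3-none-any.whl as published above
EXPECTED_WHEEL_SHA256 = "ae1a42a01794bf8234f771857073d606aea0be5d7b8510c90e57e6a79ba63b6b"


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str) -> bool:
    """True if the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()


# assumes the wheel has been downloaded into the current directory:
# verify("acdh_tei_pyutils-2.3.1-py3-none-any.whl", EXPECTED_WHEEL_SHA256)
```

pip can enforce the same check at install time: pin the requirement with --hash=sha256:<digest> in a requirements file and install with pip install --require-hashes -r requirements.txt.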

