TEI Reader

Python 3 Library for Reading the Text Content and Metadata of TEI P5 (Lite) Files

The library focuses on extracting the main text content from a file and providing the available metadata about the text.

TL;DR

pip install tei-reader

from tei_reader import TeiReader

reader = TeiReader()
corpora = reader.read_file('example-tei.xml')  # or read_string
print(corpora.text)

# show each element's attributes before the actual element text
print(corpora.tostring(lambda x, text: str(list(a.key + '=' + a.text for a in x.attributes)) + text))

More Explanation

A reader can be opened using TeiReader(). It is then possible to either call read_file(file_name) or read_string(str). Both will return a Corpora object containing the following properties:

Property     Description
corpora[]    The sub-corpora contained in this corpora.
documents[]  The Document objects that belong directly to this corpora.
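Because a corpora can nest other corpora, collecting every document means descending through the corpora property. The following is a minimal sketch, not part of the library: iter_documents is a hypothetical helper, and the SimpleNamespace objects are stand-ins that only mimic the corpora and documents properties described above, so the example runs without a TEI file. Visiting direct documents before sub-corpora is an assumption made for illustration, not a documented ordering.

```python
from types import SimpleNamespace


def iter_documents(corpora):
    """Yield every document of a corpora, descending into sub-corpora.

    Hypothetical helper: relies only on the `documents` and `corpora`
    properties described in the table above.
    """
    # Documents that belong directly to this corpora come first (assumed order).
    for document in corpora.documents:
        yield document
    # Then recurse into each sub-corpora.
    for sub_corpora in corpora.corpora:
        yield from iter_documents(sub_corpora)


# Stand-in objects that imitate the documented properties; with the real
# library, the object returned by read_file or read_string would be used.
inner = SimpleNamespace(corpora=[], documents=[SimpleNamespace(text="second")])
outer = SimpleNamespace(corpora=[inner], documents=[SimpleNamespace(text="first")])

texts = [document.text for document in iter_documents(outer)]
print(texts)
```

With a real file, the same helper would presumably be called as iter_documents(reader.read_file('example-tei.xml')).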

Corpora and Document both inherit from Element. Every object deriving from Element provides the following properties:

Property      Description
attributes{}  The attributes applicable to this element. If an attribute itself contains attributes, these are also returned (e.g. encodingDesc::editorialDecl::normalization).
text          The entire text content as a str.
divisions[]   All text divisions, retrieved recursively in document order. If an element contains parts or untagged text, these are returned in order, each wrapped in a PlaceholderDivision.
parts[]       The parts that make up the entire text, retrieved recursively in document order, e.g. text that is emphasized, a footnote, or marked as foreign. Text without a container element is returned in order, wrapped in a PlaceholderPart.

Attribute, PlaceholderDivision and PlaceholderPart all support the same properties as Element.
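The properties above can be combined to list each part of a text together with its attributes. The sketch below is an illustration only: describe_parts is a hypothetical helper, the attributes are read through key and text in the same way as in the TL;DR snippet, and the stand-in objects (including the made-up rend=italic attribute) merely imitate the documented properties so the example runs without the library installed.

```python
from types import SimpleNamespace


def describe_parts(element):
    """Return (attribute strings, text) for each part of an element.

    Hypothetical helper: assumes each part exposes `attributes` and `text`,
    and each attribute exposes `key` and `text`, as in the TL;DR snippet.
    """
    described = []
    for part in element.parts:
        attribute_strings = [a.key + "=" + a.text for a in part.attributes]
        described.append((attribute_strings, part.text))
    return described


# Stand-ins for a document with one plain part and one emphasized part.
# The attribute key and value are invented for this illustration.
plain = SimpleNamespace(attributes=[], text="Hello ")
emphasized = SimpleNamespace(
    attributes=[SimpleNamespace(key="rend", text="italic")],
    text="world",
)
document = SimpleNamespace(parts=[plain, emphasized])

described = describe_parts(document)
for attribute_strings, text in described:
    print(attribute_strings, repr(text))
```

Since divisions is described with the same semantics, the same pattern should apply to it by iterating element.divisions instead of element.parts.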

Upload to PyPI

python setup.py sdist
twine upload dist/*
