TEI Reader
Project description
Python 3 Library for Reading the Text Content and Metadata of TEI P5 (Lite) Files
The library focuses on extracting the main text content from a file and providing the available metadata about the text.
TL;DR
pip install tei-reader
from tei_reader import TeiReader
reader = TeiReader()
corpora = reader.read_file('example-tei.xml') # or read_string
print(corpora.text)
# show element attributes before the actual element text
print(corpora.tostring(lambda x, text: str(list(a.key + '=' + a.text for a in x.attributes)) + text))
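The lambda passed to tostring above can also be written as a named function, which is easier to read and reuse. The sketch below assumes, as the TL;DR example suggests, that the callback receives an element and its text and returns the text to emit. The `with_attributes` name and the `SimpleNamespace` stand-in objects are hypothetical; they only mimic the `attributes`, `key` and `text` properties so the snippet runs without a TEI file.

```python
from types import SimpleNamespace


def with_attributes(element, text):
    """Prefix an element's text with its attributes as key=value pairs."""
    pairs = [f"{a.key}={a.text}" for a in element.attributes]
    # Elements without attributes are returned unchanged.
    return f"[{', '.join(pairs)}] {text}" if pairs else text


# Hypothetical stand-ins for library objects (assumption, not the real classes).
title = SimpleNamespace(attributes=[SimpleNamespace(key="lang", text="en")])
plain = SimpleNamespace(attributes=[])

print(with_attributes(title, "Hello"))
print(with_attributes(plain, "world"))
```

With a real Corpora object, the function would presumably be passed in place of the lambda: `corpora.tostring(with_attributes)`.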
More Explanation
Create a reader with TeiReader(), then call either read_file(file_name) or read_string(str). Both return a Corpora object with the following properties:
| Property | Description |
|---|---|
| corpora[] | A corpora can contain sub-corpora. |
| documents[] | The Document objects directly part of this corpora. |
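Because a corpora can nest sub-corpora, collecting every document means walking the tree. The following is a minimal sketch that relies only on the `corpora` and `documents` properties listed above. The helper name `iter_documents` is hypothetical, and the `SimpleNamespace` objects are stand-ins used so the example runs without the library.

```python
from types import SimpleNamespace


def iter_documents(corpora):
    """Yield a corpora's own documents, then those of its sub-corpora (depth first)."""
    yield from corpora.documents
    for sub in corpora.corpora:
        yield from iter_documents(sub)


# Hypothetical stand-ins mimicking the documented properties (assumption).
inner = SimpleNamespace(corpora=[], documents=[SimpleNamespace(text="second")])
outer = SimpleNamespace(corpora=[inner], documents=[SimpleNamespace(text="first")])

texts = [document.text for document in iter_documents(outer)]
print(texts)
```

The same function should work on the object returned by read_file or read_string, assuming both properties are iterable as the table indicates.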
Corpora and Document both inherit from Element. Every object deriving from Element provides:
| Property | Description |
|---|---|
| attributes{} | Contains the attributes applicable to this element. If an attribute itself contains attributes, these are also returned (e.g. encodingDesc::editorialDecl::normalization). |
| text | The entire text content as a str. |
| divisions[] | Recursively get all the text divisions in document order. If an element contains parts or untagged text, those are returned in order, wrapped in a PlaceholderDivision. |
| parts[] | Recursively get the parts that make up the entire text, in document order (e.g. emphasized text, a footnote, or text marked as foreign). Text without a container element is returned in order, wrapped in a PlaceholderPart. |
Attribute, PlaceholderDivision and PlaceholderPart all support the same properties as Element.
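As an illustration of how `divisions` and `text` combine, the sketch below builds a short outline with one truncated line per division. It assumes that `divisions` is iterable and that each division exposes `text`, as described above. The `outline` helper is hypothetical, and the stand-in objects replace a parsed document so the snippet is runnable on its own.

```python
import textwrap
from types import SimpleNamespace


def outline(element, width=20):
    """Return one whitespace-normalized, truncated line per division."""
    return [
        textwrap.shorten(division.text, width=width, placeholder="...")
        for division in element.divisions
    ]


# Hypothetical stand-in for a parsed Document (assumption, not the real class).
document = SimpleNamespace(divisions=[
    SimpleNamespace(text="  Short   division "),
    SimpleNamespace(text="alpha beta gamma delta epsilon zeta"),
])

lines = outline(document)
for line in lines:
    print(line)
```

The same pattern applies to `parts`, which is described as following the same recursive, document-order convention.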
Upload to PyPI
python setup.py sdist
twine upload dist/*
Download files
Source Distribution
tei_reader-0.0.17.tar.gz (7.1 kB)
File details
Details for the file tei_reader-0.0.17.tar.gz.
File metadata
- Download URL: tei_reader-0.0.17.tar.gz
- Upload date:
- Size: 7.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/40.0.0 requests-toolbelt/0.8.0 tqdm/4.24.0 CPython/3.6.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7db9809dc405518fda3f33e2024fec4c663c1cb6b27a8fbaa0cd0b5cf7f943ae |
| MD5 | f322e3e8ba8235d0a51811f4a96ab636 |
| BLAKE2b-256 | 6316e7127bec394bd1112379ab8ff458f7d0d10f7336e2b7b2396de9931da024 |