TEI Reader
Python 3 Library for Reading the Text Content and Metadata of TEI P5 (Lite) Files
The library focuses on extracting the main text content from a file and providing the available metadata about the text.
TL;DR
pip install tei-reader
from tei_reader import TeiReader
reader = TeiReader()
corpora = reader.read_file('example-tei.xml') # or read_string
print(corpora.text)
# show element attributes before the actual element text
print(corpora.tostring(lambda x, text: str(list(a.key + '=' + a.text for a in x.attributes)) + text))
More Explanation
A reader can be opened using TeiReader(). It is then possible to either call read_file(file_name) or read_string(str). Both will return a Corpora object containing the following properties:
Property | Description |
---|---|
corpora[] | A corpora can contain sub-corpora. |
documents[] | The Document objects directly part of this corpora. |
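Because a corpora can nest other corpora, collecting every document takes a small recursive walk. The following is a minimal sketch, not part of the library: `iter_documents` is a hypothetical helper, and it assumes only that `corpora` and `documents` are iterable, as the table above suggests. Stand-in objects replace a real `Corpora` so the sketch runs without a TEI file.

```python
from types import SimpleNamespace


def iter_documents(corpora):
    """Yield every document in `corpora`, including those in sub-corpora."""
    # Documents that belong directly to this corpora come first.
    yield from corpora.documents
    # Then descend into each sub-corpora in order.
    for sub in corpora.corpora:
        yield from iter_documents(sub)


# Stand-ins that mimic only the two documented properties (assumption:
# a real Corpora returned by read_file exposes the same two names).
inner = SimpleNamespace(corpora=[], documents=["doc-b", "doc-c"])
outer = SimpleNamespace(corpora=[inner], documents=["doc-a"])

print(list(iter_documents(outer)))  # ['doc-a', 'doc-b', 'doc-c']
```

With the library installed, the same helper could presumably be called on the object returned by read_file or read_string.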
Corpora and Document both inherit from Element. Every object deriving from Element offers the following properties:
Property | Description |
---|---|
attributes{} | The attributes that apply to this element. If an attribute itself contains attributes, these are returned as well (e.g. encodingDesc::editorialDecl::normalization). |
text | The entire text content as a str. |
divisions[] | Recursively get all text divisions in document order. If an element contains parts or untagged text, those are returned in order and wrapped in a PlaceholderDivision. |
parts[] | Recursively get the parts that make up the entire text, in document order, e.g. where something has emphasis, is a footnote, or is marked as foreign. Text without a container element is returned in order and wrapped in a PlaceholderPart. |
Attribute, PlaceholderDivision and PlaceholderPart all support the same properties as Element.
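For quick inspection it can help to flatten an element's metadata into a plain dictionary. The sketch below is an illustration only: `attributes_as_dict` is a hypothetical helper, and it assumes that iterating `attributes` yields objects with a `key` and a `text`, as the TL;DR example implies. If two attributes share a key, the later one wins.

```python
from types import SimpleNamespace


def attributes_as_dict(element):
    """Map each attribute key of `element` to that attribute's text."""
    # Later attributes overwrite earlier ones that share the same key.
    return {a.key: a.text for a in element.attributes}


# Stand-ins for Attribute and Document objects (assumption: the real
# objects expose `key`, `text` and `attributes` under these names).
title = SimpleNamespace(key="title", text="Example")
language = SimpleNamespace(key="language", text="en")
document = SimpleNamespace(attributes=[title, language])

print(attributes_as_dict(document))  # {'title': 'Example', 'language': 'en'}
```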
Upload to PyPI
python setup.py sdist
twine upload dist/*