Transform TEI XML to a simple standoff format
Flatten Tei
Reformat TEI XML files into raw text plus standoff annotations in JSON (flatdoc).
- `flatdoc` is not a standardized format.
- `flatdoc` is a JSON file containing the whole text of a document in the `text` field.
- All span annotations are stored in the `annotations` field as an object that maps each annotation type to a list of spans.
  - e.g. `{"Sentence": [{"begin": 0, "end": 13}, ...], ...}`
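To make the format concrete, here is a minimal sketch of how a small TEI fragment could be flattened into a flatdoc-like object. It is not the implementation used by flattentei: the `flatten` function, the `TAG_TO_TYPE` mapping, and the `Paragraph` type name are hypothetical, and the sketch assumes that `end` offsets are exclusive, as in Python slices.

```python
import json
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical mapping from TEI element names to annotation type names.
TAG_TO_TYPE = {"s": "Sentence", "p": "Paragraph"}


def flatten(root):
    """Collect the text of an element tree plus standoff spans for mapped tags."""
    parts = []        # text chunks in document order
    annotations = {}  # annotation type -> list of {"begin", "end"} spans
    offset = 0        # number of characters emitted so far

    def walk(elem):
        nonlocal offset
        begin = offset
        if elem.text:
            parts.append(elem.text)
            offset += len(elem.text)
        for child in elem:
            walk(child)
            # Tail text follows the child element but belongs to the parent.
            if child.tail:
                parts.append(child.tail)
                offset += len(child.tail)
        local_name = elem.tag.replace(TEI_NS, "")
        if local_name in TAG_TO_TYPE:
            # Assumption: "end" is exclusive, so text[begin:end] is the span.
            annotations.setdefault(TAG_TO_TYPE[local_name], []).append(
                {"begin": begin, "end": offset}
            )

    walk(root)
    return {"text": "".join(parts), "annotations": annotations}


tei = (
    '<p xmlns="http://www.tei-c.org/ns/1.0">'
    "<s>Hello, world.</s> <s>We used SPSS.</s>"
    "</p>"
)
flatdoc = flatten(ET.fromstring(tei))
print(json.dumps(flatdoc, indent=2))

# Each span can be resolved back to its text by slicing the "text" field.
for span in flatdoc["annotations"]["Sentence"]:
    print(flatdoc["text"][span["begin"]:span["end"]])
```

Because the markup is kept apart from the text, further annotation layers can be added to `annotations` without touching the `text` field.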
Access content of flatdoc files
Use Case: Get all Sentences of a document in flatdoc-format
- Assuming the document contains Sentence annotations.
import json

from flattentei import get_units

fn = "path/to/flatdoc.json"  # filename of the flatdoc JSON file
with open(fn) as f:
    flatdoc = json.load(f)

sentences = get_units("Sentence", flatdoc)  # all Sentence annotations
Use Case: Get all Entities of a document in flatdoc-format
- Assuming the entities are stored as `Entity` in the `annotations` field
  - (In the GSAP project: `ScholarlyEntitiy`)
- Enrich each entity with `Sentence` texts
  - They can be found in the `container` field for each entity
import json

from flattentei import get_units

fn = "path/to/flatdoc.json"  # filename of the flatdoc JSON file
with open(fn) as f:
    flatdoc = json.load(f)

entities = get_units("Entity", flatdoc, enrich_container="Sentence")
for ent in entities:
    print(f'The entity span: {ent["text"]}')
    sentence_text = ent['containers']['Sentence']['text']  # enclosing sentence
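The idea behind container enrichment can be sketched without the library: for each entity span, look up the sentence span that fully encloses it. The helper `enrich_with_container` below is hypothetical and only illustrates the concept. It is not how `get_units` is implemented, and it assumes exclusive `end` offsets and an output shape that mirrors the `text` and `containers` keys used in the example above.

```python
def enrich_with_container(units, containers, text, container_type):
    """Attach to each unit the first container span that fully encloses it."""
    enriched = []
    for unit in units:
        # Copy the span and add the text it covers.
        item = dict(unit, text=text[unit["begin"]:unit["end"]])
        for cont in containers:
            if cont["begin"] <= unit["begin"] and unit["end"] <= cont["end"]:
                item["containers"] = {
                    container_type: dict(
                        cont, text=text[cont["begin"]:cont["end"]]
                    )
                }
                break
        enriched.append(item)
    return enriched


# A tiny hand-written flatdoc, used only for illustration.
flatdoc = {
    "text": "Hello, world. We used SPSS.",
    "annotations": {
        "Sentence": [{"begin": 0, "end": 13}, {"begin": 14, "end": 27}],
        "Entity": [{"begin": 22, "end": 26}],
    },
}

entities = enrich_with_container(
    flatdoc["annotations"]["Entity"],
    flatdoc["annotations"]["Sentence"],
    flatdoc["text"],
    "Sentence",
)
for ent in entities:
    print(ent["text"], "|", ent["containers"]["Sentence"]["text"])
```

A linear scan over all sentences is fine for a sketch; for long documents, sorting the spans and using binary search would avoid the quadratic cost.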