A Python utility for extracting RDFa data from TEI-XML documents.
[!IMPORTANT] tei-rdfa is currently in beta and welcomes feedback from early adopters.
tei-rdfa
Overview
tei_rdfa() is a dedicated function that extracts RDFa (Resource Description Framework in Attributes) data embedded in TEI (Text Encoding Initiative) XML documents and converts it into a standard RDF graph.
The function resolves namespace prefixes declared the TEI-native way, in <prefixDef> elements (//tei:encodingDesc/tei:listPrefixDef/tei:prefixDef), rather than through the HTML5-style prefix attribute or XHTML/XML-style xmlns:prefix attributes.[^1]
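To make the <prefixDef> mechanism concrete, here is a minimal sketch using only the standard library. It is not the package's implementation: the helper names collect_prefix_defs and expand_curie and the sample document are hypothetical, and it assumes the standard TEI attributes ident, matchPattern, and replacementPattern.

```python
import re
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical sample document with a single prefix definition.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <encodingDesc>
      <listPrefixDef>
        <prefixDef ident="foaf" matchPattern="(.+)"
                   replacementPattern="http://xmlns.com/foaf/0.1/$1"/>
      </listPrefixDef>
    </encodingDesc>
  </teiHeader>
</TEI>"""


def collect_prefix_defs(root):
    """Map each prefix to its (matchPattern, replacementPattern) pair."""
    path = ".//tei:encodingDesc/tei:listPrefixDef/tei:prefixDef"
    return {
        el.get("ident"): (el.get("matchPattern"), el.get("replacementPattern"))
        for el in root.findall(path, TEI_NS)
    }


def expand_curie(curie, defs):
    """Expand 'prefix:local' to a full IRI; return the input unchanged if unknown."""
    prefix, _, local = curie.partition(":")
    if prefix not in defs:
        return curie
    match_pattern, replacement = defs[prefix]
    match = re.fullmatch(match_pattern, local)
    if match is None:
        return curie
    # TEI writes group references as $1; Python's re module expects \g<1>.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    return match.expand(template)


root = ET.fromstring(SAMPLE)
prefix_defs = collect_prefix_defs(root)
print(expand_curie("foaf:name", prefix_defs))  # http://xmlns.com/foaf/0.1/name
```

Prefixes without a matching <prefixDef> are returned as-is in this sketch; how tei_rdfa() treats them is not specified here.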
Features
- Loads TEI-XML files from local paths or URLs
- Processes RDFa attributes about, typeof, property, rel, rev, resource, content
- Extracts and resolves namespace prefixes from TEI-specific <prefixDef> elements
- Generates RDF triples from embedded RDFa information
- Provides targeted extraction via XPath expressions
- Returns an RDFlib Graph object for further processing or serialization
- Implements robust error handling and informative error messages
- Offers verbose mode with detailed logging
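As a rough illustration of how RDFa attributes turn into triples, the sketch below walks a small TEI fragment and emits plain (subject, predicate, object) tuples. It is a deliberately simplified reading of RDFa, not the package's algorithm: it handles only about, typeof, property, and content, uses full IRIs instead of prefixes, and the function name extract_triples and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical TEI fragment carrying RDFa attributes with full IRIs.
SAMPLE = """<person xmlns="http://www.tei-c.org/ns/1.0"
        about="http://example.org/person/1"
        typeof="http://xmlns.com/foaf/0.1/Person">
  <persName property="http://xmlns.com/foaf/0.1/name">Ada Lovelace</persName>
  <birth property="http://example.org/vocab/born"
         content="1815-12-10">10 December 1815</birth>
</person>"""


def extract_triples(element, subject=None):
    """Collect triples from an element and its descendants (simplified RDFa)."""
    triples = []
    # 'about' sets a new subject; otherwise the parent's subject is inherited.
    subject = element.get("about", subject)
    if element.get("about") and element.get("typeof"):
        triples.append((subject, RDF_TYPE, element.get("typeof")))
    if subject is not None and element.get("property"):
        # 'content' overrides the element text as the literal value.
        value = element.get("content")
        if value is None:
            value = "".join(element.itertext()).strip()
        triples.append((subject, element.get("property"), value))
    for child in element:
        triples.extend(extract_triples(child, subject))
    return triples


triples = extract_triples(ET.fromstring(SAMPLE))
for triple in triples:
    print(triple)
```

The rel, rev, and resource attributes, blank nodes, and datatypes are left out to keep the sketch short.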
Parameters
- xmlfile (str): File path or URL to a TEI-XML file (must have a .xml or .tei extension)
- xpath_expr (str, optional): XPath expression to target specific elements for RDFa extraction; will otherwise target the XML root element
- verbose (bool, default=True): Controls logging output
Dependencies
Implementation Details
The package includes several helper functions that handle specific aspects of RDFa extraction. It implements defensive programming practices with input validation and comprehensive error handling for common issues:
- Invalid file extensions
- Invalid XML syntax
- Invalid URLs or file paths
- Invalid XPath syntax
- Erroneous XPath queries
Error messages provide contextual information to facilitate debugging and resolution.
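The kind of validation listed above might look roughly like the following sketch, which covers only the extension and XML-syntax checks. The helper names check_extension and parse_xml, the exception types, and the message wording are assumptions for illustration; the package's actual helpers and errors may differ.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse
import xml.etree.ElementTree as ET

ALLOWED_EXTENSIONS = {".xml", ".tei"}


def check_extension(xmlfile):
    """Return the file extension, or raise ValueError if it is not allowed."""
    # For URLs, look only at the path component (ignores query strings).
    path = urlparse(xmlfile).path if "://" in xmlfile else xmlfile
    suffix = PurePosixPath(path).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        found = suffix or "no extension"
        raise ValueError(
            f"{xmlfile!r}: expected a .xml or .tei file, found {found}"
        )
    return suffix


def parse_xml(text):
    """Parse XML text, re-raising syntax errors with added context."""
    try:
        return ET.fromstring(text)
    except ET.ParseError as exc:
        raise ValueError(f"Invalid XML syntax: {exc}") from exc


print(check_extension("https://example.org/document.tei"))  # .tei
```

Checking the extension before fetching or parsing keeps the cheapest test first, so an obviously wrong input fails without any I/O.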
Installation
pip install tei-rdfa
Example Usage
from tei_rdfa import tei_rdfa
# Basic usage
graph = tei_rdfa('path/to/document.xml')
# With XPath to target specific elements
graph = tei_rdfa(
    xmlfile='https://example.org/document.tei',
    xpath_expr='//tei:person[2]',
    verbose=True
)
# Process resulting graph
print(graph.serialize(format='turtle'))
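To see what an expression like //tei:person[2] selects, the sketch below runs the equivalent query with the standard library's ElementTree, which supports only a subset of XPath (hence the leading ".//"). The sample document is hypothetical, and the XPath engine tei_rdfa() uses internally may behave differently.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
# ElementTree exposes xml:id under the XML namespace in Clark notation.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical sample: three persons in one list.
SAMPLE = """<listPerson xmlns="http://www.tei-c.org/ns/1.0">
  <person xml:id="p1"/>
  <person xml:id="p2"/>
  <person xml:id="p3"/>
</listPerson>"""

root = ET.fromstring(SAMPLE)

# [2] is evaluated per parent: it selects each tei:person that is the
# second tei:person child of its parent, not the second match overall.
hits = root.findall(".//tei:person[2]", TEI_NS)
print([el.get(XML_ID) for el in hits])  # ['p2']
```

In a document with several person lists, the same expression would match one element per list, so a more specific path may be needed to target a single person.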
Directory Structure
tei_rdfa/
├── LICENSE
├── README.md
├── pyproject.toml
└── tei_rdfa/
    ├── __init__.py
    ├── requirements.txt
    └── ipynb/
        └── tei_rdfa.ipynb
The repository is organized as follows:
- tei_rdfa/ contains project metadata and configuration
- tei_rdfa/tei_rdfa/ contains the package implementation
- tei_rdfa/tei_rdfa/ipynb/ contains a Jupyter notebook demonstrating usage examples and error scenarios
[^1]: See https://github.com/TEIC/TEI/issues/1860.