tei_rdfa

A Python utility for extracting RDFa data from TEI-XML documents.

Overview

tei_rdfa() extracts Resource Description Framework in Attributes (RDFa) data embedded in TEI (Text Encoding Initiative) XML documents and converts it into a standard RDF graph. Namespace prefixes are resolved from TEI's native <prefixDef> declarations, which sit inside the <encodingDesc> section of the <teiHeader>.
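
To make the prefix mechanism concrete, the sketch below shows one way TEI <prefixDef> declarations could be turned into full IRIs. It is an illustration only, not the package's actual code: the helper names read_prefix_defs and expand_curie and the sample document are hypothetical, and the standard-library xml.etree.ElementTree stands in for lxml so the snippet runs without extra installs.

```python
import re
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Invented sample header; a real document would declare its own prefixes.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <encodingDesc>
      <listPrefixDef>
        <prefixDef ident="foaf" matchPattern="(.+)"
                   replacementPattern="http://xmlns.com/foaf/0.1/$1"/>
      </listPrefixDef>
    </encodingDesc>
  </teiHeader>
</TEI>"""


def read_prefix_defs(root):
    """Map each prefix to its (matchPattern, replacementPattern) pair."""
    defs = {}
    for el in root.iter(f"{{{TEI_NS}}}prefixDef"):
        defs[el.get("ident")] = (el.get("matchPattern"), el.get("replacementPattern"))
    return defs


def expand_curie(curie, defs):
    """Expand 'prefix:local' to a full IRI; return the input unchanged if no rule applies."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in defs:
        return curie
    match_pattern, replacement = defs[prefix]
    m = re.fullmatch(match_pattern, local)
    if m is None:
        return curie
    # TEI writes back-references as $1; Python's template syntax is \g<1>.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    return m.expand(template)


root = ET.fromstring(SAMPLE)
prefix_defs = read_prefix_defs(root)
print(expand_curie("foaf:name", prefix_defs))
```

Values whose prefix has no matching <prefixDef> are returned unchanged here; how tei_rdfa() itself treats undeclared prefixes is not stated in this description.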

Features

  • Loads TEI-XML files from local paths or URLs
  • Processes the RDFa attributes about, typeof, property, rel, rev, resource, and content
  • Extracts and resolves namespace prefixes from TEI-specific <prefixDef> elements
  • Generates RDF triples from embedded RDFa information
  • Provides targeted extraction via XPath expressions
  • Returns an RDFLib Graph object for further processing or serialization
  • Implements robust error handling and informative error messages
  • Offers verbose mode with detailed logging
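
As a rough illustration of the attribute processing and XPath targeting listed above, the following sketch gathers RDFa attributes from either the whole document or a targeted subtree. It is a simplified stand-in, not the package's implementation: collect_rdfa and the sample markup are hypothetical, it uses the standard library's limited XPath support rather than lxml, and it only collects attributes without building RDF triples.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
RDFA_ATTRS = ("about", "typeof", "property", "rel", "rev", "resource", "content")

# Invented sample body with RDFa attributes on TEI elements.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <listPerson>
      <person about="#p1" typeof="foaf:Person">
        <persName property="foaf:name">Ada Lovelace</persName>
      </person>
      <person about="#p2" typeof="foaf:Person">
        <persName property="foaf:name">Charles Babbage</persName>
      </person>
    </listPerson>
  </body></text>
</TEI>"""


def collect_rdfa(root, path=None):
    """Return (tag, attrs) for each element under the targets that carries RDFa attributes."""
    # Without a path, fall back to the root element, mirroring the documented default.
    targets = root.findall(path, TEI_NS) if path else [root]
    found = []
    for target in targets:
        for el in target.iter():
            attrs = {name: el.get(name) for name in RDFA_ATTRS if el.get(name) is not None}
            if attrs:
                # Strip the '{namespace}' part of the tag for readability.
                found.append((el.tag.split("}")[-1], attrs))
    return found


root = ET.fromstring(SAMPLE)
everything = collect_rdfa(root)
second_only = collect_rdfa(root, ".//tei:person[2]")
print(len(everything), second_only)
```

Narrowing the target to the second <person> limits the result to that element and its descendants, which is the effect an XPath expression is meant to have on extraction.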

Parameters

  • xmlfile (str): File path or URL to a TEI-XML file (must have .xml or .tei extension)
  • xpath_expr (str, optional): XPath expression selecting the elements to extract RDFa from; if omitted, extraction starts at the XML root element
  • verbose (bool, default=True): Controls logging output and graph serialization display

Dependencies

  • RDFLib: Core RDF functionality
  • lxml: XML processing and XPath support

Implementation Details

The package includes several helper functions that handle specific aspects of RDFa extraction. It implements defensive programming practices with input validation and comprehensive error handling for common issues:

  • Invalid file extensions
  • Invalid XML syntax
  • Invalid URLs or file paths
  • Invalid XPath syntax
  • Erroneous XPath queries

Error messages provide contextual information to facilitate debugging and resolution.
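
The kind of input validation described above might look like the following sketch. The function names, the choice of ValueError, and the message wording are assumptions made for illustration; the package's real helpers and exception types are not documented here. Only the extension and XML-syntax checks are shown, using the standard library.

```python
import os
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

ALLOWED_EXTENSIONS = (".xml", ".tei")


def check_extension(xmlfile):
    """Raise ValueError unless the path or URL ends in an allowed extension."""
    parsed = urlparse(xmlfile)
    # For URLs, look at the path component only so query strings are ignored.
    path = parsed.path if parsed.scheme in ("http", "https") else xmlfile
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(
            f"Unsupported extension {ext!r} for {xmlfile!r}; "
            f"expected one of {ALLOWED_EXTENSIONS}"
        )


def parse_xml(text, source="<string>"):
    """Parse XML text, re-raising syntax errors with the source name attached."""
    try:
        return ET.fromstring(text)
    except ET.ParseError as exc:
        raise ValueError(f"Invalid XML in {source}: {exc}") from exc


check_extension("path/to/document.xml")
check_extension("https://example.org/document.tei")
```

Attaching the offending file name and the parser's own message to the raised error is one way to give the contextual information mentioned above.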

Example Usage

from tei_rdfa import tei_rdfa

# Basic usage
graph = tei_rdfa('path/to/document.xml')

# With XPath to target specific elements
graph = tei_rdfa(
    xmlfile='https://example.org/document.tei',
    xpath_expr='//tei:person[2]',
    verbose=True
)

# Process resulting graph
print(graph.serialize(format='turtle'))

Directory Structure

tei_rdfa/
├── LICENSE
├── README.md
├── pyproject.toml
└── tei_rdfa/
    ├── __init__.py
    ├── requirements.txt
    └── ipynb/
        └── tei_rdfa.ipynb

The repository is organized as follows:

  • tei_rdfa/ contains project metadata and configuration
  • tei_rdfa/tei_rdfa/ contains the package implementation
  • tei_rdfa/tei_rdfa/ipynb/ contains a Jupyter notebook demonstrating usage examples and error scenarios
