telicent-rdf-transform
A Python library for transforming RDF data from one form to another; in most cases, mapping data from one ontology to another. The transformation is configured as a series of SPARQL queries. The library is designed to be usable in multiple contexts: streaming, batch files, APIs, etc.
Within this repo we provide the library (see /rdf_transform) and examples (see /examples) demonstrating different use cases.
Features
- Generic & Configurable: Define 1-to-many SPARQL queries via YAML configuration
- Ontology Mapping Support: Optional resource-to-resource mapping files utilising owl:equivalentProperty/owl:equivalentClass
- Ordered Execution: Queries run in specified order with individual enable/disable controls
- Multiple RDF Formats: Supports Turtle, N-Triples, JSON-LD, RDF/XML, N-Quads, and TriG
- Performance Monitoring: Detailed timing metrics for all operations
Quick Start
Prerequisites
- Python 3.12 or higher
Installation
# Run the development setup script
./dev_setup.sh
# Activate the virtual environment
source .venv/bin/activate
Core Library Usage (Python API)
Use the transformation logic directly in any Python application:
from rdf_transform import MapperConfig, load_mapping_graph, transform_rdf
# Load configuration
config = MapperConfig.from_yaml("config.example.yaml")
# Optionally load ontology mappings
mapping = load_mapping_graph("mappings.example.ttl", "text/turtle", config)
# Read input RDF
with open("input.ttl", "rb") as f:
    input_data = f.read()
# Transform
output_data, metrics = transform_rdf(
    input_data=input_data,
    input_format="text/turtle",
    output_format="text/turtle",
    config=config,
    mapping_graph=mapping,
)
# Use the results
print(f"Transformed {metrics['input_triples']} → {metrics['output_triples']} triples")
with open("output.ttl", "wb") as f:
    f.write(output_data)
Configuration
Create a YAML config file to define your transformation.
Queries
The core of the configuration is a list of SPARQL queries to execute. Both CONSTRUCT and UPDATE (DELETE/INSERT) queries are supported:
queries:
  - name: "transform_properties"        # unique identifier (required)
    description: "Map old props to new" # optional, purely explanatory; not used in code
    enabled: true                       # set to false to skip (default: true)
    order: 1                            # execution order, lower runs first (default: 0)
    query: |
      CONSTRUCT {
        ?s ?targetProp ?o .
      }
      WHERE {
        ?s ?sourceProp ?o .
        ?sourceProp owl:equivalentProperty ?targetProp .
      }
Queries are sorted by order and only enabled queries are executed. Queries execute in order regardless of type - UPDATE queries modify the working graph in place, while CONSTRUCT query results are merged into the final output.
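The selection and ordering rule can be sketched in a few lines of plain Python. This is illustrative only, not the library's actual implementation: disabled queries are skipped, and the rest run sorted by their order value, lower first.

```python
# Illustrative sketch of the execution rule (not the library's code):
# skip disabled queries, then sort by `order`, lower values first.
queries = [
    {"name": "cleanup", "enabled": True, "order": 2},
    {"name": "transform_properties", "enabled": True, "order": 1},
    {"name": "legacy_rewrite", "enabled": False, "order": 0},
]

to_run = sorted(
    (q for q in queries if q["enabled"]),
    key=lambda q: q["order"],
)
run_order = [q["name"] for q in to_run]
print(run_order)  # ['transform_properties', 'cleanup']
```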
Mapping File (Optional)
For simple 1:1 mappings between RDF resources (e.g., mapping one class or property URI to another), you can define these maps using a separate RDF file which utilises owl:equivalentProperty or owl:equivalentClass assertions.
When provided, the mapping file is temporarily merged with the input data before the queries run, allowing your SPARQL queries to reference these relationships. The mapping triples are then automatically removed from the final output.
mapping_file: ontology_mappings.ttl
mapping_file_format: text/turtle # default if not specified
An example of a mapping file is provided below:
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix old: <http://example.org/old-ontology#> .
@prefix new: <http://example.org/new-ontology#> .
# Class mappings
old:Person owl:equivalentClass new:Human .
old:Company owl:equivalentClass new:Organization .
# Property mappings
old:hasName owl:equivalentProperty new:name .
old:worksFor owl:equivalentProperty new:employedBy .
This is useful when you want to keep your mapping definitions separate from your SPARQL queries, or when the same mappings are shared across multiple configurations.
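The effect of such a 1:1 mapping on individual triples can be sketched in plain Python. The real library expresses these maps as owl:equivalent* triples consumed by SPARQL; this standalone sketch just shows that, per triple, the operation amounts to a predicate lookup (all URIs here are the illustrative ones from the example above):

```python
# Pure-Python sketch of a 1:1 predicate mapping; unmapped predicates
# pass through unchanged. (Illustrative only; the library does this
# via owl:equivalentProperty triples and SPARQL.)
OLD = "http://example.org/old-ontology#"
NEW = "http://example.org/new-ontology#"

property_map = {
    OLD + "hasName": NEW + "name",
    OLD + "worksFor": NEW + "employedBy",
}

triples = [
    ("http://example.org/alice", OLD + "hasName", "Alice"),
    ("http://example.org/alice", OLD + "age", "42"),  # unmapped, passes through
]

mapped = [(s, property_map.get(p, p), o) for (s, p, o) in triples]
```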
Namespaces (Optional)
Namespace prefixes to be used in the SPARQL queries defined in the configuration. For RDF serializations that support prefixes (e.g. text/turtle), these will be used as PREFIX declarations. Note that you can still declare prefixes directly within the SPARQL queries.
namespaces:
  rdf: "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  rdfs: "http://www.w3.org/2000/01/rdf-schema#"
  xsd: "http://www.w3.org/2001/XMLSchema#"
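The presumed behaviour (an assumption based on the description above, not confirmed library code) is that each configured prefix becomes a PREFIX declaration prepended to the queries, which a few lines of Python can illustrate:

```python
# Sketch of the assumed behaviour: turn the namespaces mapping into
# SPARQL PREFIX declarations prepended to each query.
namespaces = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}

prefix_block = "\n".join(f"PREFIX {p}: <{uri}>" for p, uri in namespaces.items())
query = prefix_block + "\nSELECT * WHERE { ?s ?p ?o }"
```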
Error Handling
The library raises exceptions in the following cases:
- Invalid RDF syntax: rdflib.exceptions.ParserError with details about the parsing failure
- Empty input: ValueError if the input graph contains no triples
- Empty output: ValueError if all triples are filtered out after transformation
- Missing mapping file: FileNotFoundError if the configured mapping file path doesn't exist
- Invalid SPARQL: rdflib exceptions with query syntax error details
Queries that produce no results are not errors - they simply contribute no triples to the output.
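A call site can guard against these cases with ordinary exception handling. In this standalone sketch, transform_rdf is stubbed with the documented empty-input behaviour so the snippet runs without the library installed; the real function takes more arguments, as shown in the Quick Start:

```python
# Sketch of guarding a transform call. `transform_rdf` is stubbed here
# with the documented empty-input behaviour so the snippet is standalone.
def transform_rdf(input_data, **kwargs):
    if not input_data:
        raise ValueError("input graph contains no triples")
    return input_data, {"input_triples": 1, "output_triples": 1}

try:
    transform_rdf(b"")
    handled = None
except ValueError as exc:
    handled = str(exc)  # empty input is reported as a ValueError
```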
For detailed configuration examples and SPARQL patterns, see docs/MAPPING_GUIDE.md.
How it Works
When transform_rdf() is called, the following steps occur:
1. Parse input - The input RDF data is parsed into a working graph.
2. Merge mapping file - If a mapping file is configured, its owl:equivalentProperty and owl:equivalentClass triples are added to the working graph, allowing your SPARQL queries to reference these relationships.
3. Execute queries - Enabled queries run in their configured order:
   - UPDATE queries (DELETE/INSERT) modify the working graph in place
   - CONSTRUCT queries generate new triples that are collected into a separate output graph
4. Remove mapping triples - The mapping file triples are removed from the output, so they don't pollute your transformed data.
5. Serialize output - The final graph is serialized to the requested output format.
The working graph approach means UPDATE queries can prepare data for subsequent CONSTRUCT queries, enabling multi-stage transformations.
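The two-graph flow above can be sketched with plain Python sets standing in for rdflib graphs (predicate names here are illustrative): an UPDATE-style stage rewrites the working graph in place, then a CONSTRUCT-style stage derives new triples into a separate output graph.

```python
# Plain-Python sketch of the working-graph flow; sets stand in for
# rdflib graphs and the predicates are illustrative.
working = {
    ("ex:alice", "old:hasName", "Alice"),
    ("ex:alice", "ex:age", "42"),
}
output = set()

# Stage 1 (UPDATE-style): normalise a predicate in the working graph, in place.
working = {
    (s, "new:name" if p == "old:hasName" else p, o) for (s, p, o) in working
}

# Stage 2 (CONSTRUCT-style): derive new triples from the prepared data
# into the separate output graph.
for s, p, o in working:
    if p == "new:name":
        output.add((s, "rdfs:label", o))
```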
Glossary
| Term | Definition |
|---|---|
| Working graph | The in-memory RDF graph containing input data merged with mapping triples. UPDATE queries modify this graph. |
| Mapping graph | RDF graph loaded from the mapping file, containing owl:equivalentProperty and owl:equivalentClass assertions. |
| Output graph | The final RDF graph containing results from CONSTRUCT queries (or the modified working graph if only UPDATE queries are used). |
| Triple | A single RDF statement consisting of subject, predicate, and object. |
Examples
The /examples directory contains four examples demonstrating different use cases:
basic_mapper
A minimal example showing single-file transformation. Good starting point to understand the library.
cd examples/basic_mapper
python run.py
See examples/basic_mapper/README.md for details.
batch_file_mapper
Demonstrates batch processing of multiple RDF files in a directory using the transform_directory utility.
cd examples/batch_file_mapper
python transform_countries.py
See examples/batch_file_mapper/README.md for details.
ies4_to_iesnext_mapper
A work-in-progress configuration for transforming IES4 ontology data to IES Next. Includes comprehensive SPARQL queries, mapping files, and test fixtures. The configuration for this mapper is being actively developed alongside the development of the IES Next ontology stack.
See examples/ies4_to_iesnext_mapper/ for the configuration and test files.
telicent_mapper
To be used with the Telicent CORE platform for real-time transformation of RDF streams.
Run
cd examples/telicent_mapper
cp example.env .env
# Edit .env with your Kafka settings
python -m telicent_mapper.mapper
Environment Variables
| Variable | Description | Default |
|---|---|---|
| MAPPER_NAME | Name of the mapper | rdf_transform_mapper |
| BOOTSTRAP_SERVERS | Kafka bootstrap servers | Required |
| SOURCE_TOPIC | Input Kafka topic | knowledge |
| TARGET_TOPIC | Output Kafka topic | knowledge |
| CONFIG_PATH | Path to YAML config | mapping-config.yaml |
| RDF_OUTPUT_FORMAT | Output format (MIME type) | text/turtle |
| SECURITY_LABEL_AND_GROUP | Security label and group | urn:telicent:groups:datasets:mapped |
See examples/telicent_mapper/ for configuration and test files.
Architecture
The project is organized into a core library package:
rdf_transform/ # Core transformation library (rdflib only)
├── __init__.py # Public API exports
├── config.py # Configuration models (MapperConfig, SPARQLQuery)
├── transform.py # Transformation functions (transform_rdf, etc.)
└── formats.py # Format utilities (RDF MIME type handling)
See docs/ARCHITECTURE.md for detailed architecture documentation with diagrams, data flows, and design patterns.
Development
Running Tests
# Run all tests (core library + examples)
pytest
# Run with verbose output
pytest -v
# Run specific test directories
pytest tests/ # Core library tests
pytest examples/ies4_to_iesnext_mapper/ # IES4 mapper tests
pytest examples/telicent_mapper/tests/ # Telicent mapper tests
# Run specific test file
pytest tests/test_transform.py
# Run specific test class or function
pytest tests/test_transform.py::TestTransformRDF
pytest tests/test_transform.py::TestTransformRDF::test_basic_transform
Code Quality
# Linting and formatting
ruff check .
# Type checking
mypy rdf_transform/
# Pre-commit hooks
pre-commit run --all-files
Documentation
- docs/ARCHITECTURE.md - Architecture diagrams, structure, and design patterns
- docs/MAPPING_GUIDE.md - SPARQL patterns and advanced mapping techniques
API Reference
Core Functions
| Function | Description |
|---|---|
| transform_rdf(input_data, input_format, output_format, config, mapping_graph) | Main entry point. Transforms RDF data and returns (output_bytes, metrics) |
| load_mapping_graph(source, format, config) | Load ontology mappings from a file path or URL |
| parse_rdf(data, format, config) | Parse RDF data into an rdflib Graph |
| serialize_rdf(graph, output_format) | Serialize a Graph to bytes |
Configuration Classes
| Class | Description |
|---|---|
| MapperConfig | Main configuration container. Use MapperConfig.from_yaml(path) to load |
| SPARQLQuery | Individual query definition with name, query, enabled, and order fields |
Exceptions
- ValueError: Raised when the input graph is empty, or the output graph is empty after transformation
- FileNotFoundError: Raised when the mapping file path cannot be found
- rdflib.exceptions.ParserError: Raised when RDF parsing fails (invalid syntax)
License
Copyright Telicent Ltd. All rights reserved.