rdfdf
rdfdf - Functionality for rule-based pandas.DataFrame to rdflib.Graph conversion.
For representation of tabular data in RDF see Allemang, Hendler: Semantic Web for the Working Ontologist. 2011, 40ff.
This project is in an early stage of development and should be used with caution.
Requirements
- python >= 3.10
Usage
For now rdfdf provides a DFGraphConverter class for rule-based pandas.DataFrame to rdflib.Graph conversion.
Template-based conversion functionality might be available in the future.
DFGraphConverter
Unlike rdfpandas which requires URIRefs as column headers (and otherwise just creates invalid RDF with e.g. literals as predicates), rdfdf computes URIRefs (or Literals for triple objects) based on rules.
DFGraphConverter
iterates over a dataframe and constructs RDF triples by constructing a generator of subgraphs ('field graphs') and then merging all subgraphs with an rdflib.Graph
component.
Subgraphs are generated by
- for every row
  - for every rule in column_rules
    - looking up the column_rules key for the current row and calling the corresponding column_rules value.
column_rules values must be callables which are responsible for generating and returning a graph for merging. Note that rules don't actually need to return an instance of rdflib.Graph (e.g. if a rule just accesses DFGraphConverter.store; for an example of state sharing between rules see the more involved example below), in which case the result is skipped in the generator.
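The iteration described above can be modelled in plain Python. This is only a minimal sketch of the idea, not rdfdf's actual implementation: rows are plain dicts instead of a pandas.DataFrame, a 'field graph' is a set of triples instead of an rdflib.Graph, the store is passed in explicitly, and the names field_graphs, merge_field_graphs, title_rule and counting_rule are hypothetical.

```python
from typing import Callable, Iterator

Triple = tuple[str, str, str]
FieldGraph = set[Triple]  # stand-in for rdflib.Graph (assumption)


def field_graphs(
    rows: list[dict],
    subject_column: str,
    column_rules: dict[str, Callable],
    store: dict,
) -> Iterator[FieldGraph]:
    """Yield one field graph per (row, rule) pair, skipping non-graph results."""
    for row in rows:
        subject_field = row[subject_column]
        for column, rule in column_rules.items():
            result = rule(subject_field, row[column], store)
            # Rules that only touch the store return no graph and are skipped.
            if isinstance(result, set):
                yield result


def merge_field_graphs(graphs: Iterator[FieldGraph]) -> FieldGraph:
    """Merge all field graphs into one graph (here: a set union)."""
    merged: FieldGraph = set()
    for graph in graphs:
        merged |= graph
    return merged


def title_rule(subject_field, object_field, store):
    # Returns a one-triple field graph.
    return {(f"ex:{subject_field}", "ex:title", object_field)}


def counting_rule(subject_field, object_field, store):
    # Only records state; returns None, so nothing is merged for this rule.
    store["seen"] = store.get("seen", 0) + 1


rows = [{"id": "rem", "full_title": "Reference corpus Middle High German"}]
rules = {"full_title": title_rule, "id": counting_rule}
store: dict = {}
merged = merge_field_graphs(field_graphs(rows, "id", rules, store))
```

Because the generator is lazy, no rule runs until merge_field_graphs consumes it; the rule that returns None still updates the store but contributes no triples.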
column_rules values must be callables of arity 3; for every field for which a rule applies
- the subject field value (specified in the subject_column parameter and possibly computed by subject_rule of DFGraphConverter),
- the object field value (i.e. the current field value) and
- DFGraphConverter.store (a class level attribute for state sharing between rules and DFGraphConverter instances)
get passed to the respective rule callable (see examples below).
Parameters:
- dataframe: A pandas.DataFrame to be converted.
- subject_column: Selects a table column by name to be regarded as the column of triple subjects.
- subject_rule: Optional; either a Callable[[str], URIRef] or an rdflib.Namespace which gets applied to every field of the subject_column; if supplied, subject_field in the column_rules will be what subject_rule computes it to be; otherwise subject_field will be just the raw field value of the current subject_column and must be handled manually in order to become a valid triple subject (i.e. a URIRef).
- column_rules: A mapping of column names to callables responsible for creating subgraphs ('field graphs').
- graph: Optional; allows setting the internal rdflib.Graph component.
Examples:
Simple example
import pandas as pd

from rdfdf.rdfdf import DFGraphConverter
from rdflib import URIRef, Graph, Namespace, Literal
from rdflib.namespace import RDF

# namespace definitions
CRM = Namespace("http://www.cidoc-crm.org/cidoc-crm/")

# bind namespace to graph component
nsgraph = Graph()
nsgraph.bind("crm", CRM)

# create a simple dataframe
table = [
    {
        "id": "rem",
        "full_title": "Reference corpus Middle High German"
    }
]

df = pd.DataFrame(data=table)


# rules
def full_title_rule(subject_field, object_field, store):
    title_uri = URIRef(f"https://{subject_field}.clscor.io/entity/corpus/title/full")
    corpus_uri = URIRef(f"https://{subject_field}.clscor.io/entity/corpus")

    triples = [
        (
            title_uri,
            RDF.type,
            CRM.E41_Appellation
        ),
        (
            title_uri,
            CRM.P1_identifies,
            corpus_uri
        ),
        # inverse
        (
            corpus_uri,
            CRM.P1_is_identified_by,
            title_uri
        ),
        (
            title_uri,
            CRM.P2_has_type,
            URIRef("https://core.clscor.io/entity/type/title/full")
        ),
        (
            title_uri,
            CRM['190_has_symbolic_content'],
            Literal(object_field)
        ),
    ]

    # collect the triples in a fresh graph that gets merged by the converter
    graph = Graph()

    for triple in triples:
        graph.add(triple)

    return graph


column_rules = {
    "full_title": full_title_rule
}

dfgraph = DFGraphConverter(
    dataframe=df,
    subject_column="id",
    column_rules=column_rules,
    graph=nsgraph
)

graph = dfgraph.to_graph()
print(graph.serialize(format="ttl"))
Output:
@prefix crm: <http://www.cidoc-crm.org/cidoc-crm/> .

<https://rem.clscor.io/entity/corpus> crm:P1_is_identified_by <https://rem.clscor.io/entity/corpus/title/full> .

<https://rem.clscor.io/entity/corpus/title/full> a crm:E41_Appellation ;
    crm:190_has_symbolic_content "Reference corpus Middle High German" ;
    crm:P1_identifies <https://rem.clscor.io/entity/corpus> ;
    crm:P2_has_type <https://core.clscor.io/entity/type/title/full> .
More involved example
For a more involved application of rdfdf (including extensive state sharing between rules) see the CorTab script.
Contribution
Please feel free to open issues or pull requests.