rdfdf

License: GPL v3

rdfdf provides functionality for rule-based pandas.DataFrame to rdflib.Graph conversion.

For representation of tabular data in RDF see Allemang, Hendler: Semantic Web for the Working Ontologist. 2011, 40ff.

This project is in an early stage of development and should be used with caution.

Requirements

  • python >= 3.10

Usage

For now, rdfdf provides a DFGraphConverter class for rule-based pandas.DataFrame to rdflib.Graph conversion.

Template-based conversion functionality might be available in the future.

DFGraphConverter

Unlike rdfpandas, which requires URIRefs as column headers (and otherwise creates invalid RDF, e.g. with literals as predicates), rdfdf computes URIRefs (or Literals for triple objects) based on rules.

DFGraphConverter iterates over a dataframe, builds a generator of subgraphs ('field graphs'), and then merges all subgraphs into an rdflib.Graph component.

Subgraphs are generated as follows:

  • for every row
    • for every rule in column_rules
      • the column_rules key is looked up in the current row and the corresponding column_rules value is called.

column_rules values must be callables that generate and return a graph for merging. Rules do not strictly need to return an instance of rdflib.Graph (e.g. a rule may only access DFGraphConverter.store; for an example of state sharing between rules see below); such results are skipped in the generator.

column_rules values must be callables of arity 3; for every field to which a rule applies

  • the subject field value (specified in the subject_column parameter and possibly computed by subject_rule of DFGraphConverter),
  • the object field value (i.e. the current field value) and
  • DFGraphConverter.store (a class-level attribute for state sharing between rules and DFGraphConverter instances)

are passed to the respective rule callable (see examples below).

Parameters:

  • dataframe: A pandas.DataFrame to be converted.

  • subject_column: Selects a table column by name to be regarded as the column of triple subjects.

  • subject_rule: Optional; either a Callable[[str], URIRef] or an rdflib.Namespace which is applied to every field of the subject_column. If supplied, subject_field in the column_rules is whatever subject_rule computes. Otherwise subject_field is the raw field value of the current subject_column and must be handled manually in order to become a valid triple subject (i.e. a URIRef).

  • column_rules: A mapping of column names to callables responsible for creating subgraphs ('field graphs').

  • graph: Optional; allows setting the internal rdflib.Graph component.

Examples:

Simple example

import pandas as pd

from rdfdf.rdfdf import DFGraphConverter

from rdflib import URIRef, Graph, Namespace, Literal
from rdflib.namespace import RDF


# namespace definitions
CRM = Namespace("http://www.cidoc-crm.org/cidoc-crm/")

# bind namespace to graph component
nsgraph = Graph()
nsgraph.bind("crm", CRM)

# create a simple dataframe
table = [
    {
        "id": "rem",
        "full_title": "Reference corpus Middle High German"
    }
]

df = pd.DataFrame(data=table)


# rules
def full_title_rule(subject_field, object_field, store):

    title_uri = URIRef(f"https://{subject_field}.clscor.io/entity/corpus/title/full")
    corpus_uri = URIRef(f"https://{subject_field}.clscor.io/entity/corpus")

    triples = [
        (
            title_uri,
            RDF.type,
            CRM.E41_Appellation
        ),
        (
            title_uri,
            CRM.P1_identifies,
            corpus_uri
        ),
        # inverse
        (
            corpus_uri,
            CRM.P1_is_identified_by,
            title_uri
        ),
        (
            title_uri,
            CRM.P2_has_type,
            URIRef("https://core.clscor.io/entity/type/title/full")
        ),
        (
            title_uri,
            CRM['190_has_symbolic_content'],
            Literal(object_field)
        ),
    ]

    graph = Graph()

    for triple in triples:
        graph.add(triple)

    return graph


column_rules = {
    "full_title": full_title_rule
}


dfgraph = DFGraphConverter(
    dataframe=df,
    subject_column="id",
    column_rules=column_rules,
    graph=nsgraph
)

graph = dfgraph.to_graph()
print(graph.serialize(format="ttl"))

Output:

@prefix crm: <http://www.cidoc-crm.org/cidoc-crm/> .

<https://rem.clscor.io/entity/corpus> crm:P1_is_identified_by <https://rem.clscor.io/entity/corpus/title/full> .

<https://rem.clscor.io/entity/corpus/title/full> a crm:E41_Appellation ;
    crm:190_has_symbolic_content "Reference corpus Middle High German" ;
    crm:P1_identifies <https://rem.clscor.io/entity/corpus> ;
    crm:P2_has_type <https://core.clscor.io/entity/type/title/full> .

More involved example

For a more involved application of rdfdf (including extensive state sharing between rules) see the CorTab script.

Contribution

Please feel free to open issues or pull requests.
