TabulaRDF

License: GPL v3

TabulaRDF - Functionality for DataFrame to RDF conversions.

Although TabulaRDF was primarily designed for table-to-RDF conversions, the TemplateConverter class should be general enough to allow conversions to virtually any target format.

Just as the TemplateGraphConverter class parses renderings into an rdflib.Graph instance, renderings could, for example, also be parsed into an lxml.etree.
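To illustrate the idea of rendering rows and parsing the renderings into a non-RDF target, here is a minimal stdlib-only sketch. It is not TabulaRDF code: string.Template stands in for Jinja2, xml.etree.ElementTree stands in for lxml.etree, and the helper render_rows_to_xml as well as the XML element names are hypothetical.

```python
# Stdlib-only sketch: string.Template stands in for Jinja2 and
# xml.etree.ElementTree stands in for lxml.etree. render_rows_to_xml is a
# hypothetical helper, not part of TabulaRDF.
import xml.etree.ElementTree as ET
from string import Template
from xml.sax.saxutils import escape

rows = [
    {"id": "rem", "full_title": "Reference corpus Middle High German"},
    {"id": "SweDracor", "full_title": "Swedish Drama Corpus"},
]

row_template = Template('<corpus id="$id"><title>$full_title</title></corpus>')


def render_rows_to_xml(rows: list[dict], template: Template) -> ET.Element:
    """Render the template once per row and parse every rendering into one tree."""
    root = ET.Element("corpora")
    for row in rows:
        # Escape values so that "&", "<", ">" and '"' cannot break the markup.
        safe_row = {key: escape(str(value), {'"': "&quot;"}) for key, value in row.items()}
        root.append(ET.fromstring(template.substitute(safe_row)))
    return root


tree = render_rows_to_xml(rows, row_template)
print(ET.tostring(tree, encoding="unicode"))
```

The design mirrors the graph case: each rendering is parsed into a structured object, so the result can be manipulated or re-serialized rather than treated as concatenated text.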

Requirements

  • python >= 3.11

Usage

TabulaRDF provides two main approaches for table conversions: a template-based approach using the Jinja2 templating engine and a pure Python/callable-based approach.

A CLI for template conversions is also available; see TaCL below.

Template converters

Template converters are based on the generic TemplateConverter class, which iterates over a dataframe and passes table data to Jinja templates for rendering.

Two render strategies are available, through the render and render_by_row methods respectively:

  • With the render method, every template is passed the entire table data as "table_data"; this means that iteration must be done in the template.
  • With the render_by_row method, the template is passed only the current row data (as "row_data") on each iteration; iteration is therefore done at the Python level, not in the template.
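The difference between the two strategies can be sketched in plain Python, with ordinary functions standing in for Jinja templates. The helpers render and render_by_row below are hypothetical stand-ins that only mirror the behaviour described above; they are not TabulaRDF's methods.

```python
# Conceptual sketch of the two render strategies; plain callables replace
# Jinja templates, and render/render_by_row are hypothetical stand-ins.
from typing import Callable, Iterator

rows = [
    {"id": "rem", "full_title": "Reference corpus Middle High German"},
    {"id": "SweDracor", "full_title": "Swedish Drama Corpus"},
]


def render(template: Callable[..., str], rows: list[dict]) -> str:
    """Strategy 1: a single rendering; the template gets all rows and loops itself."""
    return template(table_data=rows)


def render_by_row(template: Callable[..., str], rows: list[dict]) -> Iterator[str]:
    """Strategy 2: one rendering per row; the loop lives in Python."""
    for row in rows:
        yield template(row_data=row)


def table_template(table_data: list[dict]) -> str:
    # Iteration happens "inside the template".
    return "\n".join(f"{row['id']}: {row['full_title']}" for row in table_data)


def row_template(row_data: dict) -> str:
    # Only the current row is visible here.
    return f"{row_data['id']}: {row_data['full_title']}"


whole = render(table_template, rows)
per_row = list(render_by_row(row_template, rows))
```

In Jinja terms, the table-level template would contain a {% for row in table_data %} ... {% endfor %} loop, while the row-level template only references row_data.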

The TemplateGraphConverter class uses the render_by_row method and parses renderings into an rdflib.Graph instance.

import pandas as pd

from jinja2 import Template
from tabulardf import TemplateGraphConverter

table = [
    {
        "id": "rem",
        "full_title": "Reference corpus Middle High German"
    },
    {
        "id": "SweDracor",
        "full_title": "Swedish Drama Corpus"
    }
]

dataframe = pd.DataFrame(data=table)

template = Template(
    """
    @prefix crm: <http://www.cidoc-crm.org/cidoc-crm/> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

    {% set acronym_lower = row_data['id'] | lower %}

    <https://{{acronym_lower}}.clscor.io/entity/appellation/1> a crm:E41_Appellation ;
        crm:P2_has_type <https://core.clscor.io/entity/type/appellation_type/full_title> ;
        rdf:value "{{row_data['full_title']}}" .
    """
)

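# TemplateGraphConverter renders the template once per row (render_by_row)
# and parses every rendering into a single rdflib.Graph.
converter = TemplateGraphConverter(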
    dataframe=dataframe,
    template=template
)

print(converter.serialize())

Output:

@prefix crm: <http://www.cidoc-crm.org/cidoc-crm/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

<https://rem.clscor.io/entity/appellation/1> a crm:E41_Appellation ;
    crm:P2_has_type <https://core.clscor.io/entity/type/appellation_type/full_title> ;
    rdf:value "Reference corpus Middle High German" .

<https://swedracor.clscor.io/entity/appellation/1> a crm:E41_Appellation ;
    crm:P2_has_type <https://core.clscor.io/entity/type/appellation_type/full_title> ;
    rdf:value "Swedish Drama Corpus" .

This is not a simple text rendering (note that the prefix declarations are not repeated) but an rdflib serialization! TemplateGraphConverter.serialize is a proxy for rdflib.Graph.serialize, so any serialization format supported by rdflib (e.g. Turtle, N-Triples, RDF/XML or JSON-LD) can be generated.
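A toy illustration (no rdflib involved) of why the prefix declarations are not repeated: concatenating per-row text renderings keeps one header per rendering, whereas parsing them into a graph keeps only the triples, so the header is written once at serialization time. render_row and parse_triples are hypothetical helpers, and the "parser" is deliberately naive, handling only the one-statement lines this example produces.

```python
# Toy illustration only: no rdflib, and a deliberately naive "parser".
PREFIX_HEADER = "@prefix crm: <http://www.cidoc-crm.org/cidoc-crm/> .\n"

rows = [{"id": "rem"}, {"id": "SweDracor"}]


def render_row(row: dict) -> str:
    """Per-row rendering; each one carries its own prefix declaration."""
    subject = f"<https://{row['id'].lower()}.clscor.io/entity/appellation/1>"
    return PREFIX_HEADER + f"{subject} a crm:E41_Appellation .\n"


def parse_triples(rendering: str) -> set[tuple[str, str, str]]:
    """Keep statement lines as (s, p, o) tuples and drop @prefix lines."""
    triples = set()
    for line in rendering.splitlines():
        if line and not line.startswith("@prefix"):
            subject, predicate, obj = line.rstrip(" .").split(" ", 2)
            triples.add((subject, predicate, obj))
    return triples


renderings = [render_row(row) for row in rows]

# Text concatenation: the prefix header appears once per row.
concatenated = "".join(renderings)

# Graph-style: merge the parsed triples, then serialize once with one header.
graph = set().union(*(parse_triples(r) for r in renderings))
serialized = PREFIX_HEADER + "".join(f"{s} {p} {o} .\n" for s, p, o in sorted(graph))
```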

Callable converters

TabulaRDF provides two classes for pure Python/callable-based table-to-RDF conversions: RowGraphConverter and FieldGraphConverter.

RowGraphConverter takes a dataframe and a Python callable (row_rule) which accepts a dict and is responsible for returning an rdflib.Graph instance. For every row of the dataframe, this callable is passed the row data as a dictionary; the generated subgraphs ("row graphs") are then merged into a main graph.

import pandas as pd

from tabulardf import RowGraphConverter

from rdflib import Graph, URIRef, Literal, Namespace
from rdflib.namespace import RDF

table = [
    {
        "id": "rem",
        "full_title": "Reference corpus Middle High German"
    },
    {
        "id": "SweDracor",
        "full_title": "Swedish Drama Corpus"
    }
]

dataframe = pd.DataFrame(data=table)


def row_rule(row_data: dict) -> Graph:
    # Called once per row; row_data maps column names to the row's values.
    crm = Namespace("http://www.cidoc-crm.org/cidoc-crm/")
    subject_uri = URIRef(f"https://{row_data['id'].lower()}.clscor.io/entity/appellation/1")

    triples = [
        (
            subject_uri,
            RDF.type,
            crm["E41_Appellation"]
        ),
        (
            subject_uri,
            crm["P2_has_type"],
            URIRef("https://core.clscor.io/entity/type/appellation_type/full_title")
        ),
        (
            subject_uri,
            RDF.value,
            Literal(row_data["full_title"])
        )
    ]

    graph = Graph()

    for triple in triples:
        graph.add(triple)

    return graph


converter = RowGraphConverter(
    dataframe=dataframe,
    row_rule=row_rule)

print(converter.serialize())
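The merge step can be sketched without rdflib by treating a graph as a set of (subject, predicate, object) tuples. This is an assumption made for illustration; convert_rows is a hypothetical helper that mirrors the behaviour described above, not RowGraphConverter's implementation. Because a graph is a set of triples, duplicate triples coming from different row graphs collapse when merged.

```python
# Sketch of the row-wise merge; sets of (s, p, o) tuples stand in for
# rdflib.Graph, and convert_rows is a hypothetical helper.
from typing import Callable

Triple = tuple[str, str, str]

rows = [
    {"id": "rem", "full_title": "Reference corpus Middle High German"},
    {"id": "SweDracor", "full_title": "Swedish Drama Corpus"},
]


def row_rule(row_data: dict) -> set[Triple]:
    """Return the "row graph" for a single row."""
    subject = f"https://{row_data['id'].lower()}.clscor.io/entity/appellation/1"
    return {
        (subject, "rdf:type", "crm:E41_Appellation"),
        (subject, "rdf:value", row_data["full_title"]),
    }


def convert_rows(rows: list[dict], row_rule: Callable[[dict], set[Triple]]) -> set[Triple]:
    graph: set[Triple] = set()
    for row in rows:
        # Merge each row graph into the main graph (set union).
        graph |= row_rule(row)
    return graph


graph = convert_rows(rows, row_rule)
```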

FieldGraphConverter, on the other hand, iterates over every field of every row in a dataframe. It applies callables to the fields according to a mapping of column header names to callables (column_rules); each callable returns a subgraph for its field (a "field graph"), and all field graphs are merged into a main graph. Callables in such a rule mapping have arity 3; they receive

  • subject_field (according to the FieldGraphConverter's subject_column parameter),
  • object_field (i.e. the value of the current field) and
  • store (a class-level dictionary for caching data).

import pandas as pd

from tabulardf import FieldGraphConverter

from rdflib import Graph, URIRef, Literal, Namespace
from rdflib.namespace import RDF

table = [
    {
        "id": "rem",
        "full_title": "Reference corpus Middle High German"
    },
    {
        "id": "SweDracor",
        "full_title": "Swedish Drama Corpus"
    }
]

dataframe = pd.DataFrame(data=table)


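def id_rule(subject_field, object_field, store) -> Graph:
    # subject_field has already been lowercased by subject_rule=str.lower;
    # object_field (the raw "id" value) is not needed here.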
    subject_uri = URIRef(f"https://{subject_field}.clscor.io/entity/appellation/1")
    crm = Namespace("http://www.cidoc-crm.org/cidoc-crm/")

    triples = [
        (
            subject_uri,
            RDF.type,
            crm["E41_Appellation"]
        ),
        (
            subject_uri,
            crm["P2_has_type"],
            URIRef("https://core.clscor.io/entity/type/appellation_type/full_title")
        )
    ]

    graph = Graph()

    for triple in triples:
        graph.add(triple)

    return graph


def full_title_rule(subject_field, object_field, store) -> Graph:
    subject_uri = URIRef(f"https://{subject_field}.clscor.io/entity/appellation/1")

    graph = Graph()
    graph.add((subject_uri, RDF.value, Literal(object_field)))

    return graph


column_rules = {
    "id": id_rule,
    "full_title": full_title_rule
}


converter = FieldGraphConverter(
    dataframe=dataframe,
    subject_column="id",
    subject_rule=str.lower,
    column_rules=column_rules)

print(converter.serialize())

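If subject_rule is supplied, the subject_field passed to a column rule callable is the result of applying subject_rule to the row's subject_column value; in the example above, str.lower turns "SweDracor" into "swedracor". As mentioned, store is a class-level attribute for sharing state between callables.

Both RowGraphConverter and FieldGraphConverter produce the same graph (the same triples) as the TemplateGraphConverter example above.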
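FieldGraphConverter's field-wise iteration, including subject_rule and the shared class-level store, can be sketched the same way. FieldConverterSketch and its to_graph method are hypothetical and use sets of tuples instead of rdflib graphs; the order of operations is an assumption based on the description above, not the library's implementation.

```python
# Sketch of field-wise conversion; FieldConverterSketch is hypothetical and
# sets of (s, p, o) tuples stand in for rdflib graphs.
from typing import Callable, Optional

Triple = tuple[str, str, str]
ColumnRule = Callable[[str, object, dict], set[Triple]]


class FieldConverterSketch:
    store: dict = {}  # class-level: shared by all instances and rule calls

    def __init__(self, rows: list[dict], subject_column: str,
                 column_rules: dict[str, ColumnRule],
                 subject_rule: Optional[Callable[[str], str]] = None):
        self.rows = rows
        self.subject_column = subject_column
        self.column_rules = column_rules
        self.subject_rule = subject_rule

    def to_graph(self) -> set[Triple]:
        graph: set[Triple] = set()
        for row in self.rows:
            subject_field = row[self.subject_column]
            if self.subject_rule is not None:
                subject_field = self.subject_rule(subject_field)
            for column, object_field in row.items():
                rule = self.column_rules.get(column)
                if rule is not None:
                    # Merge each "field graph" into the main graph.
                    graph |= rule(subject_field, object_field, self.store)
        return graph


def id_rule(subject_field, object_field, store):
    store.setdefault("seen", []).append(subject_field)  # example of shared state
    return {(f"https://{subject_field}.clscor.io/entity/appellation/1", "rdf:type", "crm:E41_Appellation")}


def full_title_rule(subject_field, object_field, store):
    return {(f"https://{subject_field}.clscor.io/entity/appellation/1", "rdf:value", object_field)}


rows = [
    {"id": "rem", "full_title": "Reference corpus Middle High German"},
    {"id": "SweDracor", "full_title": "Swedish Drama Corpus"},
]
converter = FieldConverterSketch(rows, "id", {"id": id_rule, "full_title": full_title_rule},
                                 subject_rule=str.lower)
graph = converter.to_graph()
```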

TaCL

TaCL is a humble CLI for TabulaRDF template conversions. [todo: description + examples]

Download files

Source Distribution

tabulardf-0.1.1.tar.gz (25.7 kB)

Built Distribution

tabulardf-0.1.1-py3-none-any.whl (27.1 kB)

File details

Details for the file tabulardf-0.1.1.tar.gz.

File metadata

  • Download URL: tabulardf-0.1.1.tar.gz
  • Size: 25.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.5.1 CPython/3.11.3 Linux/6.4.11-arch2-1

File hashes

Hashes for tabulardf-0.1.1.tar.gz

Algorithm     Hash digest
SHA256        8b54fa5e3ab39a9d9f013b5d635c364d82348465cac11358b1a91545b2485974
MD5           1daf38d0a85595989a4a70987ae0abb4
BLAKE2b-256   02afdf0937fd3215ee166c3fac7d244a300b9de7e857fe3fe0a99365de078843

File details

Details for the file tabulardf-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: tabulardf-0.1.1-py3-none-any.whl
  • Size: 27.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.5.1 CPython/3.11.3 Linux/6.4.11-arch2-1

File hashes

Hashes for tabulardf-0.1.1-py3-none-any.whl

Algorithm     Hash digest
SHA256        741bf2f5252aa3fceb44c0cabb7f21e3cbefdf0166d43552bfd3be342487e5f5
MD5           adbced1ad30dd12e539b88a5e31c45f4
BLAKE2b-256   c545050d423034d241cedc53e79fefbafae8433eea283c9986a2313ca4bc973b
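To check a downloaded file against the published SHA256 digests, a small hashlib-based sketch might look like this. The function names sha256_of and verify are illustrative, and the file location is an assumption: point it at wherever the file was saved.

```python
# Sketch: verify a downloaded distribution against the SHA256 digests listed above.
import hashlib
from pathlib import Path

EXPECTED_SHA256 = {
    "tabulardf-0.1.1.tar.gz": "8b54fa5e3ab39a9d9f013b5d635c364d82348465cac11358b1a91545b2485974",
    "tabulardf-0.1.1-py3-none-any.whl": "741bf2f5252aa3fceb44c0cabb7f21e3cbefdf0166d43552bfd3be342487e5f5",
}


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash the file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """Compare a file's SHA256 digest with the published digest for its file name."""
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected
```

pip can also enforce digests itself in hash-checking mode, e.g. with a requirements line such as tabulardf==0.1.1 --hash=sha256:<digest>.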
