
Dataframe-based interactive knowledge graph construction using stOTTR templates


maplib

Dataframe-based interactive knowledge graph construction, validation and enrichment using stOTTR templates and SPARQL. The library is especially suited to problems that arise when building industrial knowledge graphs. It is a Rust implementation of stOTTR, a terse syntax for OTTR templates [2]. It is built on Apache Arrow through Polars (pola.rs) and exposed through a Python wrapper.

The library is described in an associated paper [1].

Mapping

We can easily map DataFrames to RDF graphs using the Python library. Below is a reproduction of the example from the paper [1]. Assume that we have a DataFrame given by:

import polars as pl

# Namespaces used in the example
ex = "https://github.com/DataTreehouse/maplib/example#"
co = "https://github.com/DataTreehouse/maplib/countries#"
pi = "https://github.com/DataTreehouse/maplib/pizza#"
ing = "https://github.com/DataTreehouse/maplib/pizza/ingredients#"

# One row per pizza; the "is" column holds a list of ingredient IRIs
df = pl.DataFrame({"p": [pi + "Hawaiian", pi + "Grandiosa"],
                   "c": [co + "CAN", co + "NOR"],
                   "is": [[ing + "Pineapple", ing + "Ham"],
                          [ing + "Pepper", ing + "Meat"]]})
df

That is, our DataFrame is:

p                               c                                     is
str                             str                                   list[str]
"https://.../pizza#Hawaiian"    "https://.../maplib/countries#CAN"    [".../ingredients#Pineapple", ".../ingredients#Ham"]
"https://.../pizza#Grandiosa"   "https://.../maplib/countries#NOR"    [".../ingredients#Pepper", ".../ingredients#Meat"]

Then we can define a stOTTR template and create our knowledge graph by expanding this template with our DataFrame as input:

from maplib import Mapping
# Widen printed string columns so that full IRIs are visible
pl.Config.set_fmt_str_lengths(150)

doc = """
@prefix pizza:<https://github.com/DataTreehouse/maplib/pizza#>.
@prefix xsd:<http://www.w3.org/2001/XMLSchema#>.
@prefix ex:<https://github.com/DataTreehouse/maplib/pizza#>.

ex:Pizza[?p, xsd:anyURI ?c, List<xsd:anyURI> ?is] :: {
ottr:Triple(?p, a, pizza:Pizza),
ottr:Triple(?p, pizza:fromCountry, ?c),
cross | ottr:Triple(?p, pizza:hasIngredient, ++?is)
}.
"""

# Parse the stOTTR document, then instantiate ex:Pizza once per DataFrame row
m = Mapping([doc])
m.expand("ex:Pizza", df)
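To make the template semantics concrete, here is a plain-Python sketch (no maplib or Polars required) of the triples that expanding ex:Pizza over the two rows should yield, assuming the standard OTTR reading of the `cross` list expander and the `++` marker. The function name `expand_pizza` is hypothetical, and modelling the graph as a Python set of (subject, predicate, object) tuples is our simplification; it describes the expected output, not maplib's internals. Note that the template's `pizza:` prefix is the same namespace as `pi` in the Python code.

```python
# Illustrative sketch of the ex:Pizza expansion; hypothetical helper, not maplib code.
co = "https://github.com/DataTreehouse/maplib/countries#"
pi = "https://github.com/DataTreehouse/maplib/pizza#"
ing = "https://github.com/DataTreehouse/maplib/pizza/ingredients#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# The DataFrame rows from the example, as plain dicts
rows = [
    {"p": pi + "Hawaiian", "c": co + "CAN", "is": [ing + "Pineapple", ing + "Ham"]},
    {"p": pi + "Grandiosa", "c": co + "NOR", "is": [ing + "Pepper", ing + "Meat"]},
]


def expand_pizza(row):
    """Return the triples that one ex:Pizza instance yields for a row."""
    p, c = row["p"], row["c"]
    triples = [
        (p, RDF_TYPE, pi + "Pizza"),  # ottr:Triple(?p, a, pizza:Pizza)
        (p, pi + "fromCountry", c),   # ottr:Triple(?p, pizza:fromCountry, ?c)
    ]
    # cross | ottr:Triple(?p, pizza:hasIngredient, ++?is):
    # one instance per element of the ++-marked list, other arguments held fixed
    for ingredient in row["is"]:
        triples.append((p, pi + "hasIngredient", ingredient))
    return triples


# Model the RDF graph as a set of triples
graph = {t for row in rows for t in expand_pizza(row)}
print(len(graph))  # 8
```

Each row contributes two fixed triples plus one hasIngredient triple per list element, so the two rows give eight triples in total.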

We can immediately query the mapped knowledge graph:

m.query("""
PREFIX pizza:<https://github.com/DataTreehouse/maplib/pizza#>
SELECT ?p ?i WHERE {
?p a pizza:Pizza .
?p pizza:hasIngredient ?i .
}
""")

The query gives the following result (a DataFrame):

p                               i
str                             str
"https://.../pizza#Grandiosa"   "https://.../ingredients#Meat"
"https://.../pizza#Grandiosa"   "https://.../ingredients#Pepper"
"https://.../pizza#Hawaiian"    "https://.../ingredients#Pineapple"
"https://.../pizza#Hawaiian"    "https://.../ingredients#Ham"
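The SELECT query matches two triple patterns that share the variable ?p, so its answer is a join on ?p. The sketch below evaluates that join in plain Python over the type and hasIngredient triples from the example. The set-of-tuples graph and the function name `select_pizza_ingredients` are our own illustration, not maplib's query engine. SPARQL does not guarantee row order without ORDER BY, so the sketch sorts its output, and its order may differ from the table above.

```python
# Illustrative sketch: evaluating the SELECT query's two patterns as a join on ?p.
PIZZA = "https://github.com/DataTreehouse/maplib/pizza#"
ING = "https://github.com/DataTreehouse/maplib/pizza/ingredients#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

graph = {
    (PIZZA + "Hawaiian", RDF_TYPE, PIZZA + "Pizza"),
    (PIZZA + "Grandiosa", RDF_TYPE, PIZZA + "Pizza"),
    (PIZZA + "Hawaiian", PIZZA + "hasIngredient", ING + "Pineapple"),
    (PIZZA + "Hawaiian", PIZZA + "hasIngredient", ING + "Ham"),
    (PIZZA + "Grandiosa", PIZZA + "hasIngredient", ING + "Pepper"),
    (PIZZA + "Grandiosa", PIZZA + "hasIngredient", ING + "Meat"),
}


def select_pizza_ingredients(g):
    """Return sorted (?p, ?i) bindings for the example SELECT query."""
    # Pattern 1: ?p a pizza:Pizza  ->  candidate bindings for ?p
    pizzas = {s for (s, pr, o) in g if pr == RDF_TYPE and o == PIZZA + "Pizza"}
    # Pattern 2: ?p pizza:hasIngredient ?i, joined with pattern 1 on ?p
    return sorted(
        (s, o) for (s, pr, o) in g
        if pr == PIZZA + "hasIngredient" and s in pizzas
    )


for p, i in select_pizza_ingredients(graph):
    # Print only the local names after '#'
    print(p.rsplit("#", 1)[1], i.rsplit("#", 1)[1])
```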

Next, we can run a CONSTRUCT query, which creates new triples but does not insert them into the graph.

hpizzas = """
PREFIX pizza:<https://github.com/DataTreehouse/maplib/pizza#>
PREFIX ing:<https://github.com/DataTreehouse/maplib/pizza/ingredients#>
CONSTRUCT { ?p a pizza:UnorthodoxPizza } 
WHERE {
    ?p a pizza:Pizza .
    ?p pizza:hasIngredient ing:Pineapple .
}"""
res = m.query(hpizzas)
# Index the result; the first element holds the constructed triples
res[0]

The resulting triples are given below:

subject                         verb                                  object
str                             str                                   str
"https://.../pizza#Hawaiian"    "http://.../22-rdf-syntax-ns#type"    "https://.../pizza#UnorthodoxPizza"
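The difference between constructing and inserting can be sketched in plain Python. The CONSTRUCT step computes new triples and leaves the graph untouched, and inserting them is a separate set union. The graph below is a hand-picked subset of the mapped triples, and the function name `construct_unorthodox` is hypothetical. This models the query semantics only and is not maplib's API.

```python
# Illustrative sketch of CONSTRUCT (compute only) followed by insertion (set union).
PIZZA = "https://github.com/DataTreehouse/maplib/pizza#"
ING = "https://github.com/DataTreehouse/maplib/pizza/ingredients#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# A subset of the mapped triples, enough to exercise the query
graph = {
    (PIZZA + "Hawaiian", RDF_TYPE, PIZZA + "Pizza"),
    (PIZZA + "Grandiosa", RDF_TYPE, PIZZA + "Pizza"),
    (PIZZA + "Hawaiian", PIZZA + "hasIngredient", ING + "Pineapple"),
    (PIZZA + "Grandiosa", PIZZA + "hasIngredient", ING + "Pepper"),
}


def construct_unorthodox(g):
    """Return the triples the CONSTRUCT query would produce; g is not modified."""
    pizzas = {s for (s, p, o) in g if p == RDF_TYPE and o == PIZZA + "Pizza"}
    with_pineapple = {
        s for (s, p, o) in g
        if p == PIZZA + "hasIngredient" and o == ING + "Pineapple"
    }
    # Instantiate the CONSTRUCT template for each ?p matching both patterns
    return {(s, RDF_TYPE, PIZZA + "UnorthodoxPizza") for s in pizzas & with_pineapple}


new_triples = construct_unorthodox(graph)
print(len(graph))  # 4: constructing did not change the graph
graph |= new_triples  # the insertion step adds the constructed triples
print(len(graph))  # 5
```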

If we are satisfied with the output of this CONSTRUCT query, we can insert its triples into the mapping state. We then check with a query that the triple was added:

m.insert(hpizzas)
m.query("""
PREFIX pizza:<https://github.com/DataTreehouse/maplib/pizza#>

SELECT ?p WHERE {
?p a pizza:UnorthodoxPizza
}
""")

Indeed, we have added the triple:

p
str
"https://github.com/DataTreehouse/maplib/pizza#Hawaiian"

API

The Python API is documented in the maplib repository on GitHub (https://github.com/DataTreehouse/maplib).

Installing

The package is published on PyPI:

pip install maplib

References

[1] M. Bakken, "maplib: Interactive, literal RDF model mapping for industry," in IEEE Access, doi: 10.1109/ACCESS.2023.3269093.

[2] M. G. Skjæveland, D. P. Lupp, L. H. Karlsen, and J. W. Klüwer, "OTTR: Formal templates for pattern-based ontology engineering," in WOP (Book), 2021, pp. 349–377.

Licensing

All code produced since August 1st, 2023 is copyrighted to Data Treehouse AS and licensed under Apache 2.0 unless otherwise noted.

All code produced before August 1st, 2023 is copyrighted to Prediktor AS and licensed under Apache 2.0 unless otherwise noted. It was financed by The Research Council of Norway (grant no. 316656) and Prediktor AS as part of a PhD degree. The code as of that date is archived in the repository at https://github.com/magbak/maplib.

Download files


Source distribution

maplib-0.7.6.tar.gz (230.3 kB)

Built distributions

maplib-0.7.6-cp311-none-win_amd64.whl (14.5 MB): CPython 3.11, Windows x86-64
maplib-0.7.6-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (14.5 MB): CPython 3.11, manylinux glibc 2.17+ x86-64
maplib-0.7.6-cp311-cp311-macosx_11_0_arm64.whl (12.3 MB): CPython 3.11, macOS 11.0+ ARM64
maplib-0.7.6-cp311-cp311-macosx_10_12_x86_64.whl (13.7 MB): CPython 3.11, macOS 10.12+ x86-64
maplib-0.7.6-cp310-none-win_amd64.whl (14.5 MB): CPython 3.10, Windows x86-64
maplib-0.7.6-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (14.5 MB): CPython 3.10, manylinux glibc 2.17+ x86-64
maplib-0.7.6-cp310-cp310-macosx_11_0_arm64.whl (12.3 MB): CPython 3.10, macOS 11.0+ ARM64
maplib-0.7.6-cp310-cp310-macosx_10_12_x86_64.whl (13.7 MB): CPython 3.10, macOS 10.12+ x86-64
maplib-0.7.6-cp39-none-win_amd64.whl (14.5 MB): CPython 3.9, Windows x86-64
maplib-0.7.6-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (14.5 MB): CPython 3.9, manylinux glibc 2.17+ x86-64
maplib-0.7.6-cp39-cp39-macosx_11_0_arm64.whl (12.3 MB): CPython 3.9, macOS 11.0+ ARM64
maplib-0.7.6-cp39-cp39-macosx_10_12_x86_64.whl (13.7 MB): CPython 3.9, macOS 10.12+ x86-64
maplib-0.7.6-cp38-none-win_amd64.whl (14.5 MB): CPython 3.8, Windows x86-64
maplib-0.7.6-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (14.5 MB): CPython 3.8, manylinux glibc 2.17+ x86-64
maplib-0.7.6-cp38-cp38-macosx_11_0_arm64.whl (12.3 MB): CPython 3.8, macOS 11.0+ ARM64
maplib-0.7.6-cp38-cp38-macosx_10_12_x86_64.whl (13.7 MB): CPython 3.8, macOS 10.12+ x86-64
