Dataframe-based interactive knowledge graph construction

Project description

maplib: High-performance RDF knowledge graph construction, SHACL validation and SPARQL-based enrichment in Python

maplib is a library for building RDF knowledge graphs using template expansion (OTTR Templates). It includes SPARQL and SHACL engines that are available while the graph is being constructed, so the graph can be enriched and validated as it grows. It can construct and validate knowledge graphs with millions of nodes in seconds.

maplib allows you to leverage your existing skills with Pandas or Polars to extract and wrangle data from existing databases and spreadsheets, before applying simple templates to them to build a knowledge graph.

Template expansion is typically zero-copy and nearly instantaneous, and the built-in SPARQL and SHACL engines mean you can query, inspect, enrich and validate the knowledge graph immediately.

maplib is written in Rust. It is built on Apache Arrow using Pola.rs, and uses libraries from Oxigraph for handling linked data and for parsing SPARQL queries.

Installing

The package is published on PyPI, and the API is documented online. Install it with pip:

pip install maplib
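
To confirm that the installation succeeded, one option is to ask Python's standard library for the installed version. This is a minimal sketch: the helper name installed_version is made up for illustration and is not part of maplib.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "maplib") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
print(f"maplib {version}" if version else "maplib is not installed")
```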

Please send us a message, e.g. on LinkedIn (search for Data Treehouse) or on our webpage if you want to try out SHACL.

Mapping

We can easily map DataFrames to RDF-graphs using the Python library. Below is a reproduction of the example in the paper [1]. Assume that we have a DataFrame given by:

import polars as pl
pl.Config.set_fmt_str_lengths(150)

pi = "https://github.com/DataTreehouse/maplib/pizza#"
df = pl.DataFrame({
    "p":[pi + "Hawaiian", pi + "Grandiosa"],
    "c":[pi + "CAN", pi + "NOR"],
    "ings": [[pi + "Pineapple", pi + "Ham"],
             [pi + "Pepper", pi + "Meat"]]
})
print(df)

That is, our DataFrame is:

p                              c                               ings
str                            str                             list[str]
"https://.../pizza#Hawaiian"   "https://.../maplib/pizza#CAN"  [".../pizza#Pineapple", ".../pizza#Ham"]
"https://.../pizza#Grandiosa"  "https://.../maplib/pizza#NOR"  [".../pizza#Pepper", ".../pizza#Meat"]

Then we can define an OTTR template, and create our knowledge graph by expanding this template with our DataFrame as input:

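from maplib import Mapping, Prefix, Template, Argument, Parameter, Variable, RDFType, Triple, a
# Rebind pi from the namespace string to a Prefix object; pi.suf(...) then builds IRIs in that namespace.
pi = Prefix("pi", pi)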

p_var = Variable("p")
c_var = Variable("c")
ings_var = Variable("ings")

template = Template(
    iri= pi.suf("PizzaTemplate"),
    parameters= [
        Parameter(variable=p_var, rdf_type=RDFType.IRI()),
        Parameter(variable=c_var, rdf_type=RDFType.IRI()),
        Parameter(variable=ings_var, rdf_type=RDFType.Nested(RDFType.IRI()))
    ],
    instances= [
        Triple(p_var, a(), pi.suf("Pizza")),
        Triple(p_var, pi.suf("fromCountry"), c_var),
        Triple(
            p_var, 
            pi.suf("hasIngredient"), 
            Argument(term=ings_var, list_expand=True), 
            list_expander="cross")
    ]
)

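m = Mapping()
# Expand the template with df as input, adding the resulting triples to the graph.
m.expand(template, df)

# Enrich the graph by inserting the triples produced by a CONSTRUCT query.
hpizzas = """
    PREFIX pi:<https://github.com/DataTreehouse/maplib/pizza#>
    CONSTRUCT { ?p a pi:HeterodoxPizza }
    WHERE {
        ?p a pi:Pizza .
        ?p pi:hasIngredient pi:Pineapple .
    }"""
m.insert(hpizzas)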
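
To make concrete which triples the template above describes, here is a plain-Python sketch that does not use maplib at all. It assumes that list expansion over a single list argument with the "cross" expander yields one triple per list element; the helper name expand_pizza_row and the tuple representation of triples are illustrative only, not how maplib works internally.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PI = "https://github.com/DataTreehouse/maplib/pizza#"

# The same data as the DataFrame above, one dict per row.
rows = [
    {"p": PI + "Hawaiian", "c": PI + "CAN", "ings": [PI + "Pineapple", PI + "Ham"]},
    {"p": PI + "Grandiosa", "c": PI + "NOR", "ings": [PI + "Pepper", PI + "Meat"]},
]


def expand_pizza_row(row):
    """Instantiate the three triple patterns of the template for one row."""
    triples = [
        (row["p"], RDF_TYPE, PI + "Pizza"),
        (row["p"], PI + "fromCountry", row["c"]),
    ]
    # List expansion (assumed semantics): one hasIngredient triple per list element.
    for ingredient in row["ings"]:
        triples.append((row["p"], PI + "hasIngredient", ingredient))
    return triples


triples = [t for row in rows for t in expand_pizza_row(row)]
for s, p, o in triples:
    print(s, p, o)
```

Each row contributes one type triple, one fromCountry triple and one hasIngredient triple per ingredient.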

We can immediately query the mapped knowledge graph:

m.query("""
PREFIX pi:<https://github.com/DataTreehouse/maplib/pizza#>
SELECT ?p ?i WHERE {
?p a pi:Pizza .
?p pi:hasIngredient ?i .
}
""")

The query gives the following result (a DataFrame):

p                              i
str                            str
"https://.../pizza#Grandiosa"  "https://.../pizza#Meat"
"https://.../pizza#Grandiosa"  "https://.../pizza#Pepper"
"https://.../pizza#Hawaiian"   "https://.../pizza#Pineapple"
"https://.../pizza#Hawaiian"   "https://.../pizza#Ham"

Next, we are able to perform a construct query, which creates new triples but does not insert them.

hpizzas = """
PREFIX pi:<https://github.com/DataTreehouse/maplib/pizza#>
CONSTRUCT { ?p a pi:UnorthodoxPizza } 
WHERE {
    ?p a pi:Pizza .
    ?p pi:hasIngredient pi:Pineapple .
}"""
res = m.query(hpizzas)
res[0]

The resulting triples are given below:

subject                       verb                                object
str                           str                                 str
"https://.../pizza#Hawaiian"  "http://.../22-rdf-syntax-ns#type"  "https://.../pizza#UnorthodoxPizza"

If we are happy with the output of this construct-query, we can insert it in the mapping state. Afterwards we check that the triple is added with a query.

m.insert(hpizzas)
m.query("""
PREFIX pi:<https://github.com/DataTreehouse/maplib/pizza#>

SELECT ?p WHERE {
?p a pi:UnorthodoxPizza
}
""")

Indeed, we have added the triple:

p
str
"https://github.com/DataTreehouse/maplib/pizza#Hawaiian"

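The difference between running a CONSTRUCT query and inserting its result can be sketched in plain Python, with the graph modeled as a set of triples. This is only an illustration of the semantics described above, not maplib code; the function name construct_unorthodox and the set-based graph are assumptions made for the example.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PI = "https://github.com/DataTreehouse/maplib/pizza#"

graph = {
    (PI + "Hawaiian", RDF_TYPE, PI + "Pizza"),
    (PI + "Hawaiian", PI + "hasIngredient", PI + "Pineapple"),
    (PI + "Hawaiian", PI + "hasIngredient", PI + "Ham"),
    (PI + "Grandiosa", RDF_TYPE, PI + "Pizza"),
    (PI + "Grandiosa", PI + "hasIngredient", PI + "Pepper"),
}


def construct_unorthodox(triples):
    """Build new type triples for pizzas with pineapple; the input is not modified."""
    pizzas = {s for (s, p, o) in triples if p == RDF_TYPE and o == PI + "Pizza"}
    return {
        (s, RDF_TYPE, PI + "UnorthodoxPizza")
        for (s, p, o) in triples
        if s in pizzas and p == PI + "hasIngredient" and o == PI + "Pineapple"
    }


size_before = len(graph)
constructed = construct_unorthodox(graph)  # analogous to querying with CONSTRUCT
assert len(graph) == size_before           # the graph is unchanged so far
graph |= constructed                       # analogous to inserting the result
print(len(graph) - size_before, "triple(s) added")
```
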
API

The API is simple, and contains only one class and a few methods for:

  • expanding templates
  • querying with SPARQL
  • validating SHACL
  • importing triples (Turtle, RDF/XML, NTriples)
  • writing triples (NTriples)
  • creating a new Mapping object (sprout) based on queries over the current Mapping object.
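
For readers unfamiliar with the N-Triples format named in the list above, the following plain-Python sketch shows what a serialized triple looks like. It is not maplib's writer: the function to_ntriples is hypothetical and handles only IRI terms, leaving out literals and blank nodes.

```python
def to_ntriples(triples):
    """Serialize (subject, predicate, object) IRI triples as N-Triples text."""
    # Each triple becomes one line: three IRIs in angle brackets, then " ."
    return "".join(f"<{s}> <{p}> <{o}> .\n" for s, p, o in triples)


PI = "https://github.com/DataTreehouse/maplib/pizza#"
example = [(PI + "Hawaiian", PI + "hasIngredient", PI + "Pineapple")]
print(to_ntriples(example), end="")
```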

The API is documented HERE

References

The associated paper [1] presents benchmarks showing superior performance and scalability. OTTR is described in [2].

[1] M. Bakken, "maplib: Interactive, literal RDF model mapping for industry," in IEEE Access, doi: 10.1109/ACCESS.2023.3269093.

[2] M. G. Skjæveland, D. P. Lupp, L. H. Karlsen, and J. W. Klüwer, "OTTR: Formal templates for pattern-based ontology engineering," in WOP (Book), 2021, pp. 349–377.

Licensing

All code produced since August 1st, 2023 is copyrighted to Data Treehouse AS with an Apache 2.0 license unless otherwise noted.

All code produced before August 1st, 2023 is copyrighted to Prediktor AS with an Apache 2.0 license unless otherwise noted, and has been financed by The Research Council of Norway (grant no. 316656) and Prediktor AS as part of a PhD degree. The code in this state is archived in the repository at https://github.com/magbak/maplib.


