Skip to main content

Dataframe-based interactive knowledge graph construction

Project description

maplib: High-performance RDF knowledge graph construction, SHACL validation, SPARQL and Datalog in Python

maplib is written in Rust, it is built on Apache Arrow using Pola.rs and uses libraries from Oxigraph for handling linked data as well as parsing SPARQL queries.

maplib allows you to leverage your existing skills with Pandas or Polars to extract and wrangle data from existing databases and spreadsheets, before applying simple templates to them to build a knowledge graph. You can also read knowledge graphs extremely quickly from a wide variety of serialization formats. Using the built-in SPARQL, SHACL and Datalog engines means you can query, inspect, enrich and validate and then serialize the knowledge graph immediately. All query results are Polars Dataframes that are transferred zero-copy from Rust to Python. Currently, maplib is in-memory and supports around 100M triples on 32GB of RAM.

The core functionality of maplib (mapping, querying, serialization) is open source, but SHACL and Datalog functionality are not. Please send us a message, e.g. on LinkedIn (search for Data Treehouse) or on email (magnus at data-treehouse.com) if you want to try out these features. See our roadmap for upcoming features.

Installing

The package is published on PyPi and the API documented here:

pip install maplib

Model

We can easily map DataFrames to RDF-graphs using the Python library. Below is a reproduction of the example in the paper [1]. Assume that we have a DataFrame given by:

import polars as pl
pl.Config.set_fmt_str_lengths(150)

pi = "https://github.com/DataTreehouse/maplib/pizza#"
df = pl.DataFrame({
    "p":[pi + "Hawaiian", pi + "Grandiosa"],
    "c":[pi + "CAN", pi + "NOR"],
    "ings": [[pi + "Pineapple", pi + "Ham"],
             [pi + "Pepper", pi + "Meat"]]
})
print(df)

That is, our DataFrame is:

p c ings
str str list[str]
"https://.../pizza#Hawaiian" "https://.../maplib/pizza#CAN" [".../pizza#Pineapple", ".../pizza#Ham"]
"https://.../pizza#Grandiosa" "https://.../maplib/pizza#NOR" [".../pizza#Pepper", ".../pizza#Meat"]

Then we can define a OTTR template, and create our knowledge graph by expanding this template with our DataFrame as input:

from maplib import Model, Prefix, Template, Argument, Parameter, Variable, RDFType, Triple, a
pi = Prefix(pi)

p_var = Variable("p")
c_var = Variable("c")
ings_var = Variable("ings")

template = Template(
    iri= pi.suf("PizzaTemplate"),
    parameters= [
        Parameter(variable=p_var, rdf_type=RDFType.IRI()),
        Parameter(variable=c_var, rdf_type=RDFType.IRI()),
        Parameter(variable=ings_var, rdf_type=RDFType.Nested(RDFType.IRI()))
    ],
    instances= [
        Triple(p_var, a, pi.suf("Pizza")),
        Triple(p_var, pi.suf("fromCountry"), c_var),
        Triple(
            p_var, 
            pi.suf("hasIngredient"), 
            Argument(term=ings_var, list_expand=True), 
            list_expander="cross")
    ]
)

m = Model()
m.map(template, df)
hpizzas = """
    PREFIX pi:<https://github.com/DataTreehouse/maplib/pizza#>
    CONSTRUCT { ?p a pi:HeterodoxPizza } 
    WHERE {
        ?p a pi:Pizza .
        ?p pi:hasIngredient pi:Pineapple .
    }"""
m.insert(hpizzas)
return m

We can immediately query the mapped knowledge graph:

m.query("""
PREFIX pi:<https://github.com/DataTreehouse/maplib/pizza#>
SELECT ?p ?i WHERE {
?p a pi:Pizza .
?p pi:hasIngredient ?i .
}
""")

The query gives the following result (a DataFrame):

p i
str str
"https://.../pizza#Grandiosa" "https://.../pizza#Meat"
"https://.../pizza#Grandiosa" "https://.../pizza#Pepper"
"https://.../pizza#Hawaiian" "https://.../pizza#Pineapple"
"https://.../pizza#Hawaiian" "https://.../pizza#Ham"

Next, we are able to perform a construct query, which creates new triples but does not insert them.

hpizzas = """
PREFIX pi:<https://github.com/DataTreehouse/maplib/pizza#>
CONSTRUCT { ?p a pi:UnorthodoxPizza } 
WHERE {
    ?p a pi:Pizza .
    ?p pi:hasIngredient pi:Pineapple .
}"""
res = m.query(hpizzas)
res[0]

The resulting triples are given below:

subject verb object
str str str
"https://.../pizza#Hawaiian" "http://.../22-rdf-syntax-ns#type" "https://.../pizza#UnorthodoxPizza"

If we are happy with the output of this construct-query, we can insert it in the model state. Afterwards we check that the triple is added with a query.

m.insert(hpizzas)
m.query("""
PREFIX pi:<https://github.com/DataTreehouse/maplib/pizza#>

SELECT ?p WHERE {
?p a pi:UnorthodoxPizza
}
""")

Indeed, we have added the triple:

p
str
"https://github.com/DataTreehouse/maplib/pizza#Hawaiian"

API

The API is simple, and contains only one class and a few methods for:

  • expanding templates
  • querying with SPARQL
  • validating with SHACL
  • reading JSON to triples using Façade-X
  • importing triples (Turtle, RDF/XML, NTriples, JSON-LD, HDT)
  • writing triples (Turtle, RDF/XML, NTriples)
  • creating a new Model from a named graph

The API is documented HERE

Roadmap of features and optimizations

Spring 2026

  • SHACL Rules
  • Disk based storage and internal serialization format
  • Jelly

Roadmap is subject to changes,particularly user and customer requests.

References

There is an associated paper [1] with associated benchmarks showing superior performance and scalability that can be found here. OTTR is described in [2].

[1] M. Bakken, "maplib: Interactive, literal RDF model model for industry," in IEEE Access, doi: 10.1109/ACCESS.2023.3269093.

[2] M. G. Skjæveland, D. P. Lupp, L. H. Karlsen, and J. W. Klüwer, “Ottr: Formal templates for pattern-based ontology engineering.” in WOP (Book), 2021, pp. 349–377.

Contributing

Any contributions will fall under the Apache 2.0 license of maplib.

Licensing

All code produced since August 1st. 2023 is copyrighted to Data Treehouse AS with an Apache 2.0 license unless otherwise noted.

All code which was produced before August 1st. 2023 copyrighted to Prediktor AS with an Apache 2.0 license unless otherwise noted, and has been financed by The Research Council of Norway (grant no. 316656) and Prediktor AS as part of a PhD Degree. The code at this state is archived in the repository at https://github.com/magbak/maplib.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

maplib-0.20.19.tar.gz (404.7 kB view details)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

maplib-0.20.19-cp310-abi3-win_amd64.whl (28.7 MB view details)

Uploaded CPython 3.10+Windows x86-64

maplib-0.20.19-cp310-abi3-manylinux_2_28_x86_64.whl (27.3 MB view details)

Uploaded CPython 3.10+manylinux: glibc 2.28+ x86-64

maplib-0.20.19-cp310-abi3-manylinux_2_28_aarch64.whl (25.4 MB view details)

Uploaded CPython 3.10+manylinux: glibc 2.28+ ARM64

maplib-0.20.19-cp310-abi3-macosx_11_0_arm64.whl (24.2 MB view details)

Uploaded CPython 3.10+macOS 11.0+ ARM64

maplib-0.20.19-cp310-abi3-macosx_10_12_x86_64.whl (25.8 MB view details)

Uploaded CPython 3.10+macOS 10.12+ x86-64

File details

Details for the file maplib-0.20.19.tar.gz.

File metadata

  • Download URL: maplib-0.20.19.tar.gz
  • Upload date:
  • Size: 404.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.9.4

File hashes

Hashes for maplib-0.20.19.tar.gz
Algorithm Hash digest
SHA256 d6dbb29b4c4921171295e40db3a07b0b404dcc7eceb14ab34dfe38c35ef0409c
MD5 c39869d395dcfb369065d940c465bca7
BLAKE2b-256 07246a8cff7751353f4004a68534c7296bc4e5a057f291fe599de8f9768aab67

See more details on using hashes here.

File details

Details for the file maplib-0.20.19-cp310-abi3-win_amd64.whl.

File metadata

  • Download URL: maplib-0.20.19-cp310-abi3-win_amd64.whl
  • Upload date:
  • Size: 28.7 MB
  • Tags: CPython 3.10+, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.9.4

File hashes

Hashes for maplib-0.20.19-cp310-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 f61879c8cff4fa715a4066559f1e835c4453a85c0fec6545b83be58a9812d7e5
MD5 d723e48623d5b44af1f119f8e39e9146
BLAKE2b-256 ab3c196d51f4106c046c0b1f93bec22bc02cf879558bb6070e88a8ae8a48b512

See more details on using hashes here.

File details

Details for the file maplib-0.20.19-cp310-abi3-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for maplib-0.20.19-cp310-abi3-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 707a820b572a9043d759411a54cc5e0eae81ad3e9c8975af1a58432ea5c7cdec
MD5 56aadc6ac20b29d3047dd3e6ef494ca6
BLAKE2b-256 91763d851c0eb97de5e1d1cb0f2259f6b5468d25ec36546eec6014323a46cf54

See more details on using hashes here.

File details

Details for the file maplib-0.20.19-cp310-abi3-manylinux_2_28_aarch64.whl.

File metadata

File hashes

Hashes for maplib-0.20.19-cp310-abi3-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 285308c875dd292dfe5c0087d549008191f8a257b91757523003902d6c0c71a4
MD5 aa63fb781daf03502c9a7b2a018f3944
BLAKE2b-256 e75fd085842acfa5d93f64d2fbd33b991a7d919857128516d42aac4e51af05e7

See more details on using hashes here.

File details

Details for the file maplib-0.20.19-cp310-abi3-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for maplib-0.20.19-cp310-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 33a1443e5f1d8b3993465f0e86be554887df2a147364850379c5eb2a12330937
MD5 7f1c3db8b695f65624350e020efc87dc
BLAKE2b-256 41dcbc2b64b9fe6ea73b4d72ec0bcc540d033f8b1489532f437e27b176cca803

See more details on using hashes here.

File details

Details for the file maplib-0.20.19-cp310-abi3-macosx_10_12_x86_64.whl.

File metadata

File hashes

Hashes for maplib-0.20.19-cp310-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 ca17fa583287fbbaf946a326af0b23fc9c51778a0f7b0f9c450d4f16f984f4e0
MD5 f3e47646305ec8db9ba7c8aec7a40c01
BLAKE2b-256 6cff79b1dc0cbf16d13eda2fc664f4d8983e2723d589a4ef9555227feb6183ea

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page