Dataframe-based interactive knowledge graph construction

Project description

maplib: High-performance RDF knowledge graph construction, SHACL validation, SPARQL and Datalog in Python

maplib is written in Rust. It is built on Apache Arrow using Polars and uses libraries from Oxigraph for handling linked data and for parsing SPARQL queries.

maplib allows you to leverage your existing skills with Pandas or Polars to extract and wrangle data from existing databases and spreadsheets, before applying simple templates to them to build a knowledge graph. You can also read knowledge graphs extremely quickly from a wide variety of serialization formats. With the built-in SPARQL, SHACL and Datalog engines, you can query, inspect, enrich, validate and then serialize the knowledge graph immediately. All query results are Polars DataFrames that are transferred zero-copy from Rust to Python. Currently, maplib is in-memory and supports around 100M triples on 32GB of RAM.
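As a rough reading of the capacity figure above, 100M triples on 32GB of RAM implies a budget of a few hundred bytes per triple. The sketch below does that arithmetic. It assumes 1 GB = 10**9 bytes and that memory use scales linearly with the number of triples; neither is stated by maplib, and the function names are hypothetical.

```python
# Back-of-the-envelope estimate derived from "around 100M triples on 32GB of RAM".
# Assumes 1 GB = 10**9 bytes and linear scaling; both are simplifications.

REFERENCE_TRIPLES = 100_000_000
REFERENCE_RAM_GB = 32


def bytes_per_triple() -> float:
    """Average RAM budget per triple implied by the reference figure."""
    return REFERENCE_RAM_GB * 10**9 / REFERENCE_TRIPLES


def estimated_capacity(ram_gb: float) -> int:
    """Rough number of triples that might fit in ram_gb of RAM."""
    return int(ram_gb * 10**9 / bytes_per_triple())


print(bytes_per_triple())       # 320.0
print(estimated_capacity(64))   # 200000000
```

Actual memory use will depend on the data (for example, string lengths and the share of literals), so treat these numbers as an order-of-magnitude guide only.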

The core functionality of maplib (mapping, querying, serialization) is open source, but the SHACL and Datalog functionality is not. Please send us a message, e.g. on LinkedIn (search for Data Treehouse) or by email (magnus at data-treehouse.com), if you want to try out these features. See our roadmap for upcoming features.

Installing

The package is published on PyPI and the API is documented here. Install it with pip:

pip install maplib
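After installing, one way to confirm that the package is visible to your interpreter is to ask the standard library for its installed version. This is a minimal sketch using only `importlib.metadata`; the helper name `installed_version` is hypothetical and not part of maplib.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string if maplib is installed, otherwise None.
print(installed_version("maplib"))
```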

Model

We can easily map DataFrames to RDF-graphs using the Python library. Below is a reproduction of the example in the paper [1]. Assume that we have a DataFrame given by:

import polars as pl
pl.Config.set_fmt_str_lengths(150)

pi = "https://github.com/DataTreehouse/maplib/pizza#"
df = pl.DataFrame({
    "p":[pi + "Hawaiian", pi + "Grandiosa"],
    "c":[pi + "CAN", pi + "NOR"],
    "ings": [[pi + "Pineapple", pi + "Ham"],
             [pi + "Pepper", pi + "Meat"]]
})
print(df)

That is, our DataFrame is:

p                              c                                ings
str                            str                              list[str]
"https://.../pizza#Hawaiian"   "https://.../maplib/pizza#CAN"   [".../pizza#Pineapple", ".../pizza#Ham"]
"https://.../pizza#Grandiosa"  "https://.../maplib/pizza#NOR"   [".../pizza#Pepper", ".../pizza#Meat"]

Then we can define an OTTR template, and create our knowledge graph by expanding this template with our DataFrame as input:

from maplib import Model, Prefix, Template, Argument, Parameter, Variable, RDFType, Triple, a
pi = Prefix(pi)

p_var = Variable("p")
c_var = Variable("c")
ings_var = Variable("ings")

template = Template(
    iri= pi.suf("PizzaTemplate"),
    parameters= [
        Parameter(variable=p_var, rdf_type=RDFType.IRI()),
        Parameter(variable=c_var, rdf_type=RDFType.IRI()),
        Parameter(variable=ings_var, rdf_type=RDFType.Nested(RDFType.IRI()))
    ],
    instances= [
        Triple(p_var, a, pi.suf("Pizza")),
        Triple(p_var, pi.suf("fromCountry"), c_var),
        Triple(
            p_var, 
            pi.suf("hasIngredient"), 
            Argument(term=ings_var, list_expand=True), 
            list_expander="cross")
    ]
)

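m = Model()
# Expand the template once per DataFrame row and add the resulting triples to the model.
m.map(template, df)
hpizzas = """
    PREFIX pi:<https://github.com/DataTreehouse/maplib/pizza#>
    CONSTRUCT { ?p a pi:HeterodoxPizza } 
    WHERE {
        ?p a pi:Pizza .
        ?p pi:hasIngredient pi:Pineapple .
    }"""
# Insert the triples produced by the CONSTRUCT query into the model.
m.insert(hpizzas)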
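To make the effect of the template expansion concrete, here is a plain-Python sketch of the triples the pizza template would be expected to produce for the two rows above. It does not use maplib and is not how maplib is implemented; it only mirrors the template's three patterns, with the list-valued `ings` column expanded into one `hasIngredient` triple per element (which is what a cross expansion over a single list argument amounts to). The function name `expand_pizza_rows` is hypothetical.

```python
# Illustrative only: plain-Python expansion of the pizza template into triples.
# This mimics the effect of the template above; it does not use maplib.

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PI = "https://github.com/DataTreehouse/maplib/pizza#"

rows = [
    {"p": PI + "Hawaiian", "c": PI + "CAN",
     "ings": [PI + "Pineapple", PI + "Ham"]},
    {"p": PI + "Grandiosa", "c": PI + "NOR",
     "ings": [PI + "Pepper", PI + "Meat"]},
]


def expand_pizza_rows(rows):
    """Return the set of (subject, predicate, object) triples for all rows."""
    triples = set()
    for row in rows:
        triples.add((row["p"], RDF_TYPE, PI + "Pizza"))
        triples.add((row["p"], PI + "fromCountry", row["c"]))
        # List expansion: one hasIngredient triple per list element.
        for ingredient in row["ings"]:
            triples.add((row["p"], PI + "hasIngredient", ingredient))
    return triples


triples = expand_pizza_rows(rows)
print(len(triples))  # 8
```

Each row contributes one type triple, one `fromCountry` triple and two `hasIngredient` triples, giving eight triples in total.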

We can immediately query the mapped knowledge graph:

m.query("""
PREFIX pi:<https://github.com/DataTreehouse/maplib/pizza#>
SELECT ?p ?i WHERE {
?p a pi:Pizza .
?p pi:hasIngredient ?i .
}
""")

The query gives the following result (a DataFrame):

p                              i
str                            str
"https://.../pizza#Grandiosa"  "https://.../pizza#Meat"
"https://.../pizza#Grandiosa"  "https://.../pizza#Pepper"
"https://.../pizza#Hawaiian"   "https://.../pizza#Pineapple"
"https://.../pizza#Hawaiian"   "https://.../pizza#Ham"

Next, we can run a CONSTRUCT query, which creates new triples but does not insert them into the model.

hpizzas = """
PREFIX pi:<https://github.com/DataTreehouse/maplib/pizza#>
CONSTRUCT { ?p a pi:UnorthodoxPizza } 
WHERE {
    ?p a pi:Pizza .
    ?p pi:hasIngredient pi:Pineapple .
}"""
res = m.query(hpizzas)
res[0]

The resulting triples are given below:

subject                       verb                                object
str                           str                                 str
"https://.../pizza#Hawaiian"  "http://.../22-rdf-syntax-ns#type"  "https://.../pizza#UnorthodoxPizza"

If we are happy with the output of this CONSTRUCT query, we can insert it into the model. Afterwards, we use a query to check that the triple has been added.

m.insert(hpizzas)
m.query("""
PREFIX pi:<https://github.com/DataTreehouse/maplib/pizza#>

SELECT ?p WHERE {
?p a pi:UnorthodoxPizza
}
""")

Indeed, we have added the triple:

p
str
"https://github.com/DataTreehouse/maplib/pizza#Hawaiian"

API

The API is simple, and contains only one class and a few methods for:

  • expanding templates
  • querying with SPARQL
  • validating with SHACL
  • reading JSON to triples using Façade-X
  • importing triples (Turtle, RDF/XML, NTriples, JSON-LD)
  • writing triples (Turtle, RDF/XML, NTriples)
  • creating a new Model from a named graph

The API is documented HERE
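For readers unfamiliar with the serialization formats listed above, the sketch below shows what N-Triples output looks like for a couple of IRI-only triples. It is a stdlib-only illustration of the format, not maplib's writer: the helper `to_ntriples` is hypothetical, and it deliberately ignores literals, blank nodes and IRI escaping.

```python
# Illustrative only: minimal N-Triples serialization for IRI-only triples.

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PI = "https://github.com/DataTreehouse/maplib/pizza#"


def to_ntriples(triples):
    """Serialize IRI-only triples as N-Triples text, one sorted line per triple."""
    lines = [f"<{s}> <{p}> <{o}> ." for (s, p, o) in sorted(triples)]
    return "".join(line + "\n" for line in lines)


triples = {
    (PI + "Hawaiian", RDF_TYPE, PI + "Pizza"),
    (PI + "Hawaiian", PI + "fromCountry", PI + "CAN"),
}
print(to_ntriples(triples), end="")
```

Each output line holds one triple, with every IRI wrapped in angle brackets and the line terminated by a period.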

Roadmap of features and optimizations

Spring 2026

  • SHACL Rules
  • Disk based storage and internal serialization format
  • Jelly
  • Graph virtualization using chrontext

The roadmap is subject to change, particularly in response to user and customer requests.

References

An associated paper [1], with benchmarks showing superior performance and scalability, can be found here. OTTR is described in [2].

[1] M. Bakken, "maplib: Interactive, literal RDF model mapping for industry," in IEEE Access, doi: 10.1109/ACCESS.2023.3269093.

[2] M. G. Skjæveland, D. P. Lupp, L. H. Karlsen, and J. W. Klüwer, "Ottr: Formal templates for pattern-based ontology engineering," in WOP (Book), 2021, pp. 349–377.

Licensing

All code produced since August 1st, 2023 is copyrighted to Data Treehouse AS with an Apache 2.0 license unless otherwise noted.

All code produced before August 1st, 2023 is copyrighted to Prediktor AS with an Apache 2.0 license unless otherwise noted, and has been financed by The Research Council of Norway (grant no. 316656) and Prediktor AS as part of a PhD degree. The code in that state is archived in the repository at https://github.com/magbak/maplib.
