Dataframe-based interactive knowledge graph construction

maplib: High-performance RDF knowledge graph construction, SHACL validation, SPARQL and Datalog in Python

maplib is written in Rust and built on Apache Arrow using Pola.rs. It uses libraries from Oxigraph for handling linked data and for parsing SPARQL queries.

maplib lets you use your existing Pandas or Polars skills to extract and wrangle data from databases and spreadsheets, and then apply simple templates to that data to build a knowledge graph. You can also read knowledge graphs quickly from a wide variety of serialization formats. With the built-in SPARQL, SHACL and Datalog engines, you can query, inspect, enrich and validate the knowledge graph, and then serialize it immediately. All query results are Polars DataFrames that are transferred zero-copy from Rust to Python. Currently, maplib is in-memory and supports around 100M triples on 32GB of RAM.

The core functionality of maplib (mapping, querying, serialization) is open source, but the SHACL and Datalog functionality is not. If you want to try out these features, please send us a message, e.g. on LinkedIn (search for Data Treehouse) or by email (magnus at data-treehouse.com). See our roadmap for upcoming features.

Installing

The package is published on PyPI and the API is documented online. Install it with pip:

pip install maplib
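
After installing, it can be useful to confirm which version ended up in the active environment. The following is a minimal sketch using only the Python standard library; the helper name `installed_version` is hypothetical and not part of maplib.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("maplib")
    print(f"maplib {found}" if found else "maplib is not installed")
```

Returning `None` instead of raising keeps the check usable in setup scripts that want to decide whether to run `pip install maplib`.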

Model

We can easily map DataFrames to RDF graphs using the Python library. Below is a reproduction of the example in the paper [1]. Assume that we have a DataFrame given by:

import polars as pl
pl.Config.set_fmt_str_lengths(150)

pi = "https://github.com/DataTreehouse/maplib/pizza#"
df = pl.DataFrame({
    "p":[pi + "Hawaiian", pi + "Grandiosa"],
    "c":[pi + "CAN", pi + "NOR"],
    "ings": [[pi + "Pineapple", pi + "Ham"],
             [pi + "Pepper", pi + "Meat"]]
})
print(df)

That is, our DataFrame is:

| p | c | ings |
|:---|:---|:---|
| str | str | list[str] |
| "https://.../pizza#Hawaiian" | "https://.../maplib/pizza#CAN" | [".../pizza#Pineapple", ".../pizza#Ham"] |
| "https://.../pizza#Grandiosa" | "https://.../maplib/pizza#NOR" | [".../pizza#Pepper", ".../pizza#Meat"] |

Then we can define an OTTR template, and create our knowledge graph by expanding this template with our DataFrame as input:

from maplib import Model, Prefix, Template, Argument, Parameter, Variable, RDFType, Triple, a
pi = Prefix(pi)

p_var = Variable("p")
c_var = Variable("c")
ings_var = Variable("ings")

template = Template(
    iri= pi.suf("PizzaTemplate"),
    parameters= [
        Parameter(variable=p_var, rdf_type=RDFType.IRI()),
        Parameter(variable=c_var, rdf_type=RDFType.IRI()),
        Parameter(variable=ings_var, rdf_type=RDFType.Nested(RDFType.IRI()))
    ],
    instances= [
        Triple(p_var, a, pi.suf("Pizza")),
        Triple(p_var, pi.suf("fromCountry"), c_var),
        Triple(
            p_var, 
            pi.suf("hasIngredient"), 
            Argument(term=ings_var, list_expand=True), 
            list_expander="cross")
    ]
)

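# Create an empty model and expand the template once per DataFrame row.
m = Model()
m.map(template, df)

# Derive new triples with a CONSTRUCT query and insert them into the model.
hpizzas = """
    PREFIX pi:<https://github.com/DataTreehouse/maplib/pizza#>
    CONSTRUCT { ?p a pi:HeterodoxPizza } 
    WHERE {
        ?p a pi:Pizza .
        ?p pi:hasIngredient pi:Pineapple .
    }"""
m.insert(hpizzas)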
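
To make the effect of `list_expand=True` together with `list_expander="cross"` more concrete, here is a plain-Python emulation of the template expansion. It does not call maplib and is not how maplib is implemented; the names `rows` and `expand_pizza_rows` are illustrative assumptions. With a single expanded list argument, the cross expander yields one `hasIngredient` triple per list element.

```python
PI = "https://github.com/DataTreehouse/maplib/pizza#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Rows mirroring the example DataFrame (columns p, c, ings).
rows = [
    {"p": PI + "Hawaiian", "c": PI + "CAN", "ings": [PI + "Pineapple", PI + "Ham"]},
    {"p": PI + "Grandiosa", "c": PI + "NOR", "ings": [PI + "Pepper", PI + "Meat"]},
]


def expand_pizza_rows(rows):
    """Emulate the template: emit triples for each row, expanding the list column."""
    triples = []
    for row in rows:
        triples.append((row["p"], RDF_TYPE, PI + "Pizza"))
        triples.append((row["p"], PI + "fromCountry", row["c"]))
        # One triple per ingredient: the list is expanded element by element.
        for ingredient in row["ings"]:
            triples.append((row["p"], PI + "hasIngredient", ingredient))
    return triples


triples = expand_pizza_rows(rows)
print(len(triples))  # 8 triples: 4 per pizza
```

The sketch covers only the template expansion; it leaves out the triple added by the subsequent insert.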

We can immediately query the mapped knowledge graph:

m.query("""
PREFIX pi:<https://github.com/DataTreehouse/maplib/pizza#>
SELECT ?p ?i WHERE {
?p a pi:Pizza .
?p pi:hasIngredient ?i .
}
""")

The query gives the following result (a DataFrame):

| p | i |
|:---|:---|
| str | str |
| "https://.../pizza#Grandiosa" | "https://.../pizza#Meat" |
| "https://.../pizza#Grandiosa" | "https://.../pizza#Pepper" |
| "https://.../pizza#Hawaiian" | "https://.../pizza#Pineapple" |
| "https://.../pizza#Hawaiian" | "https://.../pizza#Ham" |
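
For readers less familiar with SPARQL, the two triple patterns in the query amount to a join on `?p`. The sketch below emulates that join over a hand-written set of triples in plain Python. It is an illustration only and says nothing about how maplib evaluates queries; the function name `pizzas_with_ingredients` is hypothetical.

```python
PI = "https://github.com/DataTreehouse/maplib/pizza#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

triples = {
    (PI + "Hawaiian", RDF_TYPE, PI + "Pizza"),
    (PI + "Hawaiian", PI + "hasIngredient", PI + "Pineapple"),
    (PI + "Hawaiian", PI + "hasIngredient", PI + "Ham"),
    (PI + "Grandiosa", RDF_TYPE, PI + "Pizza"),
    (PI + "Grandiosa", PI + "hasIngredient", PI + "Pepper"),
    (PI + "Grandiosa", PI + "hasIngredient", PI + "Meat"),
}


def pizzas_with_ingredients(triples):
    """Return sorted (p, i) pairs that satisfy both triple patterns."""
    # Pattern 1: ?p a pi:Pizza
    pizzas = {s for (s, p, o) in triples if p == RDF_TYPE and o == PI + "Pizza"}
    # Pattern 2: ?p pi:hasIngredient ?i, joined with pattern 1 on ?p
    return sorted(
        (s, o)
        for (s, p, o) in triples
        if p == PI + "hasIngredient" and s in pizzas
    )


solutions = pizzas_with_ingredients(triples)
for pizza, ingredient in solutions:
    # Print only the local names after '#'.
    print(pizza.rsplit("#", 1)[1], ingredient.rsplit("#", 1)[1])
```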

Next, we can run a CONSTRUCT query, which creates new triples but does not insert them into the model.

hpizzas = """
PREFIX pi:<https://github.com/DataTreehouse/maplib/pizza#>
CONSTRUCT { ?p a pi:UnorthodoxPizza } 
WHERE {
    ?p a pi:Pizza .
    ?p pi:hasIngredient pi:Pineapple .
}"""
res = m.query(hpizzas)
res[0]

The resulting triples are given below:

| subject | verb | object |
|:---|:---|:---|
| str | str | str |
| "https://.../pizza#Hawaiian" | "http://.../22-rdf-syntax-ns#type" | "https://.../pizza#UnorthodoxPizza" |

If we are happy with the output of this CONSTRUCT query, we can insert it into the model. Afterwards, we check with a query that the triple has been added.

m.insert(hpizzas)
m.query("""
PREFIX pi:<https://github.com/DataTreehouse/maplib/pizza#>

SELECT ?p WHERE {
?p a pi:UnorthodoxPizza
}
""")

Indeed, we have added the triple:

p
str
"https://github.com/DataTreehouse/maplib/pizza#Hawaiian"
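
The difference between running a CONSTRUCT query and inserting its result can be sketched in plain Python with sets of triples. This is a conceptual illustration under the assumption that a graph behaves like a set of triples; it does not use maplib, and the name `construct_unorthodox` is hypothetical.

```python
PI = "https://github.com/DataTreehouse/maplib/pizza#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

graph = {
    (PI + "Hawaiian", RDF_TYPE, PI + "Pizza"),
    (PI + "Hawaiian", PI + "hasIngredient", PI + "Pineapple"),
    (PI + "Grandiosa", RDF_TYPE, PI + "Pizza"),
    (PI + "Grandiosa", PI + "hasIngredient", PI + "Pepper"),
}


def construct_unorthodox(triples):
    """Build new triples from the matches without modifying `triples`."""
    pizzas = {s for (s, p, o) in triples if p == RDF_TYPE and o == PI + "Pizza"}
    return {
        (s, RDF_TYPE, PI + "UnorthodoxPizza")
        for (s, p, o) in triples
        if p == PI + "hasIngredient" and o == PI + "Pineapple" and s in pizzas
    }


size_before = len(graph)
# Analogous to running the CONSTRUCT query: the graph is left unchanged.
new_triples = construct_unorthodox(graph)
unchanged = len(graph) == size_before
# Analogous to inserting the result: the new triples are unioned into the graph.
graph |= new_triples
```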

API

The API is simple, and contains only one class and a few methods for:

  • expanding templates
  • querying with SPARQL
  • validating with SHACL
  • reading JSON to triples using Façade-X
  • importing triples (Turtle, RDF/XML, NTriples, JSON-LD)
  • writing triples (Turtle, RDF/XML, NTriples)
  • creating a new Model from a named graph
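
Of the serialization formats listed, N-Triples is the simplest: each triple is written on its own line and terminated by ` .`. The sketch below hand-rolls that layout for IRI-only triples to show what such output looks like. It is not maplib's writer, it handles neither literals nor blank nodes nor escaping, and the function name `to_ntriples` is hypothetical.

```python
PI = "https://github.com/DataTreehouse/maplib/pizza#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def to_ntriples(triples):
    """Serialize (subject, predicate, object) IRI triples as N-Triples text."""
    # Each statement is written as <s> <p> <o> . on a line of its own.
    # Sorting gives a deterministic order; N-Triples itself imposes none.
    return "".join(f"<{s}> <{p}> <{o}> .\n" for (s, p, o) in sorted(triples))


example = {
    (PI + "Hawaiian", RDF_TYPE, PI + "Pizza"),
    (PI + "Hawaiian", PI + "hasIngredient", PI + "Pineapple"),
}
document = to_ntriples(example)
print(document, end="")
```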

The API is documented HERE

Roadmap of features and optimizations

Spring 2026

  • SHACL Rules
  • Disk based storage and internal serialization format
  • Jelly
  • Graph virtualization using chrontext

The roadmap is subject to change, particularly in response to user and customer requests.

References

The associated paper [1] includes benchmarks showing superior performance and scalability. OTTR is described in [2].

[1] M. Bakken, "maplib: Interactive, literal RDF model mapping for industry," in IEEE Access, doi: 10.1109/ACCESS.2023.3269093.

[2] M. G. Skjæveland, D. P. Lupp, L. H. Karlsen, and J. W. Klüwer, "OTTR: Formal templates for pattern-based ontology engineering," in WOP (Book), 2021, pp. 349–377.

Licensing

All code produced since August 1st, 2023 is copyrighted to Data Treehouse AS with an Apache 2.0 license unless otherwise noted.

All code produced before August 1st, 2023 is copyrighted to Prediktor AS with an Apache 2.0 license unless otherwise noted, and was financed by The Research Council of Norway (grant no. 316656) and Prediktor AS as part of a PhD degree. The code in that state is archived in the repository at https://github.com/magbak/maplib.
