Dataframe-based interactive knowledge graph construction

maplib: High-performance RDF knowledge graph construction, SHACL validation, SPARQL and Datalog in Python

maplib is written in Rust. It is built on Apache Arrow using Pola.rs, and uses libraries from Oxigraph for handling linked data and for parsing SPARQL queries.

maplib lets you use your existing Pandas or Polars skills to extract and wrangle data from databases and spreadsheets, and then apply simple templates to build a knowledge graph. You can also read knowledge graphs quickly from a wide variety of serialization formats. With the built-in SPARQL, SHACL and Datalog engines, you can immediately query, inspect, enrich and validate the knowledge graph, and then serialize it. All query results are Polars DataFrames that are transferred zero-copy from Rust to Python. Currently, maplib is in-memory and supports around 100M triples on 32GB of RAM.
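The memory figure above can be turned into a rough sizing rule. The sketch below is back-of-the-envelope arithmetic only: it assumes that memory use grows linearly with the number of triples and that 1 GB is 10**9 bytes, neither of which is stated by the project. `bytes_per_triple` and `estimate_max_triples` are hypothetical helpers, not part of maplib.

```python
# Rough sizing from the stated figure: around 100M triples on 32 GB of RAM.
# Assumption (not from the maplib documentation): memory grows linearly
# with the number of triples, and 1 GB = 10**9 bytes.
REFERENCE_TRIPLES = 100_000_000
REFERENCE_RAM_GB = 32


def bytes_per_triple() -> float:
    """Average RAM per triple implied by the reference figure."""
    return REFERENCE_RAM_GB * 10**9 / REFERENCE_TRIPLES


def estimate_max_triples(ram_gb: float) -> int:
    """Hypothetical helper: linear extrapolation of the reference figure."""
    return int(ram_gb * REFERENCE_TRIPLES / REFERENCE_RAM_GB)


if __name__ == "__main__":
    print(bytes_per_triple())        # 320.0
    print(estimate_max_triples(16))  # 50000000
```

Real memory use depends on the data (for example, how many distinct IRIs and literals there are), so treat the result as an order-of-magnitude guide.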

The core functionality of maplib (mapping, querying, serialization) is open source, but the SHACL and Datalog functionality is not. If you want to try out these features, please send us a message, e.g. on LinkedIn (search for Data Treehouse) or by email (magnus at data-treehouse.com). See our roadmap for upcoming features.

Installing

The package is published on PyPI, and the API is documented online. Install it with pip:

pip install maplib
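To confirm that the installation succeeded, one option is to ask the standard library for the installed version. This is a minimal sketch using `importlib.metadata`; `installed_version` is a hypothetical helper name and is not part of maplib.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string if maplib is installed, otherwise None.
print(installed_version("maplib"))
```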

Model

We can easily map DataFrames to RDF-graphs using the Python library. Below is a reproduction of the example in the paper [1]. Assume that we have a DataFrame given by:

import polars as pl
pl.Config.set_fmt_str_lengths(150)

pi = "https://github.com/DataTreehouse/maplib/pizza#"
df = pl.DataFrame({
    "p":[pi + "Hawaiian", pi + "Grandiosa"],
    "c":[pi + "CAN", pi + "NOR"],
    "ings": [[pi + "Pineapple", pi + "Ham"],
             [pi + "Pepper", pi + "Meat"]]
})
print(df)

That is, our DataFrame is:

p                              c                                ings
str                            str                              list[str]
"https://.../pizza#Hawaiian"   "https://.../maplib/pizza#CAN"   [".../pizza#Pineapple", ".../pizza#Ham"]
"https://.../pizza#Grandiosa"  "https://.../maplib/pizza#NOR"   [".../pizza#Pepper", ".../pizza#Meat"]

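In practice the IRI columns are usually derived from raw values rather than typed by hand. The sketch below shows one possible way to do that wrangling step in plain Python before building the DataFrame. The `raw_rows` records and the `mint_iri` helper are hypothetical and only illustrate the idea; the namespace is the one used in the example.

```python
from urllib.parse import quote

PIZZA_NS = "https://github.com/DataTreehouse/maplib/pizza#"


def mint_iri(namespace: str, local_name: str) -> str:
    """Hypothetical helper: percent-encode the local name and append it to the namespace."""
    return namespace + quote(local_name.strip(), safe="")


# Hypothetical raw records, e.g. as read from a spreadsheet or database.
raw_rows = [
    {"pizza": "Hawaiian", "country": "CAN", "ingredients": ["Pineapple", "Ham"]},
    {"pizza": "Grandiosa", "country": "NOR", "ingredients": ["Pepper", "Meat"]},
]

# Column-oriented dict with the same shape as the one passed to pl.DataFrame.
columns = {
    "p": [mint_iri(PIZZA_NS, r["pizza"]) for r in raw_rows],
    "c": [mint_iri(PIZZA_NS, r["country"]) for r in raw_rows],
    "ings": [[mint_iri(PIZZA_NS, i) for i in r["ingredients"]] for r in raw_rows],
}
print(columns["p"])
```

Percent-encoding the local name keeps values that contain spaces or other reserved characters from producing invalid IRIs.
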
Then we can define an OTTR template, and create our knowledge graph by expanding this template with our DataFrame as input:

from maplib import Model, Prefix, Template, Argument, Parameter, Variable, RDFType, Triple, a
pi = Prefix(pi)

p_var = Variable("p")
c_var = Variable("c")
ings_var = Variable("ings")

template = Template(
    iri= pi.suf("PizzaTemplate"),
    parameters= [
        Parameter(variable=p_var, rdf_type=RDFType.IRI()),
        Parameter(variable=c_var, rdf_type=RDFType.IRI()),
        Parameter(variable=ings_var, rdf_type=RDFType.Nested(RDFType.IRI()))
    ],
    instances= [
        Triple(p_var, a, pi.suf("Pizza")),
        Triple(p_var, pi.suf("fromCountry"), c_var),
        Triple(
            p_var, 
            pi.suf("hasIngredient"), 
            Argument(term=ings_var, list_expand=True), 
            list_expander="cross")
    ]
)

m = Model()
# Expand the template with the rows of df and add the resulting triples to the model.
m.map(template, df)
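To make the effect of the template concrete, the following sketch mimics in plain Python what expanding it over the two rows produces. It is an illustration of the semantics under simplifying assumptions, not maplib's implementation: `cross_expand` and `expand_pizza_row` are hypothetical helpers, and every Python list is treated as if it were marked with `list_expand=True`.

```python
from itertools import product

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PI = "https://github.com/DataTreehouse/maplib/pizza#"


def cross_expand(*arguments):
    """Return every combination of the arguments.

    Simplification: any Python list is expanded, scalars are kept fixed.
    """
    as_lists = [arg if isinstance(arg, list) else [arg] for arg in arguments]
    return list(product(*as_lists))


def expand_pizza_row(p, c, ings):
    """Triples produced by one row, mirroring the three template instances."""
    triples = [(p, RDF_TYPE, PI + "Pizza"), (p, PI + "fromCountry", c)]
    # One hasIngredient triple per element of the ingredient list.
    triples += cross_expand(p, PI + "hasIngredient", ings)
    return triples


rows = [
    (PI + "Hawaiian", PI + "CAN", [PI + "Pineapple", PI + "Ham"]),
    (PI + "Grandiosa", PI + "NOR", [PI + "Pepper", PI + "Meat"]),
]
triples = [t for row in rows for t in expand_pizza_row(*row)]
print(len(triples))  # 8
```

Each row contributes one type triple, one country triple and one triple per ingredient, which is why the ingredient query on the mapped graph returns four rows.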

We can immediately query the mapped knowledge graph:

m.query("""
PREFIX pi:<https://github.com/DataTreehouse/maplib/pizza#>
SELECT ?p ?i WHERE {
?p a pi:Pizza .
?p pi:hasIngredient ?i .
}
""")

The query gives the following result (a DataFrame):

p                              i
str                            str
"https://.../pizza#Grandiosa"  "https://.../pizza#Meat"
"https://.../pizza#Grandiosa"  "https://.../pizza#Pepper"
"https://.../pizza#Hawaiian"   "https://.../pizza#Pineapple"
"https://.../pizza#Hawaiian"   "https://.../pizza#Ham"

Next, we can run a CONSTRUCT query, which creates new triples but does not insert them into the model.

hpizzas = """
PREFIX pi:<https://github.com/DataTreehouse/maplib/pizza#>
CONSTRUCT { ?p a pi:UnorthodoxPizza } 
WHERE {
    ?p a pi:Pizza .
    ?p pi:hasIngredient pi:Pineapple .
}"""
res = m.query(hpizzas)
res[0]

The resulting triples are given below:

subject                       verb                                object
str                           str                                 str
"https://.../pizza#Hawaiian"  "http://.../22-rdf-syntax-ns#type"  "https://.../pizza#UnorthodoxPizza"

If we are happy with the output of this CONSTRUCT query, we can insert it into the model state. Afterwards, we check with a query that the triple has been added.

m.insert(hpizzas)
m.query("""
PREFIX pi:<https://github.com/DataTreehouse/maplib/pizza#>

SELECT ?p WHERE {
?p a pi:UnorthodoxPizza
}
""")

Indeed, we have added the triple:

p
str
"https://github.com/DataTreehouse/maplib/pizza#Hawaiian"

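The difference between querying with a CONSTRUCT and inserting its result can be sketched with a plain Python set of triples. This is a conceptual model only, under the assumption that a graph behaves like a set of (subject, predicate, object) tuples; `construct_unorthodox` is a hypothetical function that hard-codes the pattern from the query above and is not how maplib evaluates SPARQL.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PI = "https://github.com/DataTreehouse/maplib/pizza#"

# A small stand-in for the mapped graph.
graph = {
    (PI + "Hawaiian", RDF_TYPE, PI + "Pizza"),
    (PI + "Hawaiian", PI + "hasIngredient", PI + "Pineapple"),
    (PI + "Hawaiian", PI + "hasIngredient", PI + "Ham"),
    (PI + "Grandiosa", RDF_TYPE, PI + "Pizza"),
    (PI + "Grandiosa", PI + "hasIngredient", PI + "Pepper"),
}


def construct_unorthodox(triples):
    """Build the new triples for pizzas with pineapple; `triples` is not modified."""
    pizzas = {s for (s, p, o) in triples if p == RDF_TYPE and o == PI + "Pizza"}
    with_pineapple = {
        s for (s, p, o) in triples
        if p == PI + "hasIngredient" and o == PI + "Pineapple"
    }
    return {(s, RDF_TYPE, PI + "UnorthodoxPizza") for s in pizzas & with_pineapple}


size_before = len(graph)
new_triples = construct_unorthodox(graph)  # analogous to m.query(hpizzas)
unchanged = len(graph) == size_before      # True: querying does not change the graph
graph |= new_triples                       # analogous to m.insert(hpizzas)
print(len(graph) - size_before)            # 1
```
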
API

The API is simple, and contains only one class and a few methods for:

  • expanding templates
  • querying with SPARQL
  • validating with SHACL
  • reading JSON to triples using Façade-X
  • importing triples (Turtle, RDF/XML, NTriples, JSON-LD)
  • writing triples (Turtle, RDF/XML, NTriples)
  • creating a new Model from a named graph

The API is documented HERE

Roadmap of features and optimizations

Spring 2026

  • SHACL Rules
  • Disk based storage and internal serialization format
  • Jelly
  • Graph virtualization using chrontext

The roadmap is subject to change, particularly in response to user and customer requests.

References

The associated paper [1] includes benchmarks showing superior performance and scalability. OTTR is described in [2].

[1] M. Bakken, "maplib: Interactive, literal RDF model mapping for industry," in IEEE Access, doi: 10.1109/ACCESS.2023.3269093.

[2] M. G. Skjæveland, D. P. Lupp, L. H. Karlsen, and J. W. Klüwer, "OTTR: Formal templates for pattern-based ontology engineering," in WOP (Book), 2021, pp. 349–377.

Licensing

All code produced since August 1st, 2023 is copyrighted to Data Treehouse AS with an Apache 2.0 license unless otherwise noted.

All code produced before August 1st, 2023 is copyrighted to Prediktor AS with an Apache 2.0 license unless otherwise noted, and was financed by The Research Council of Norway (grant no. 316656) and Prediktor AS as part of a PhD degree. The code in that state is archived in the repository at https://github.com/magbak/maplib.
