Dataframe-based interactive knowledge graph construction
maplib: High-performance RDF knowledge graph construction, SHACL validation, SPARQL and Datalog in Python
maplib is written in Rust. It is built on Apache Arrow using Pola.rs, and it uses libraries from Oxigraph for handling linked data and for parsing SPARQL queries.
maplib allows you to leverage your existing skills with Pandas or Polars to extract and wrangle data from existing databases and spreadsheets, before applying simple templates to them to build a knowledge graph. You can also read knowledge graphs extremely quickly from a wide variety of serialization formats. With the built-in SPARQL, SHACL and Datalog engines, you can query, inspect, enrich, validate and then serialize the knowledge graph immediately. All query results are Polars DataFrames that are transferred zero-copy from Rust to Python. Currently, maplib is in-memory and supports around 100M triples on 32GB of RAM.
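As a rough sizing aid, the figure above works out to a few hundred bytes of RAM per triple. The sketch below does that arithmetic. It assumes that memory use grows linearly with the number of triples, which is an assumption made here for illustration and not a documented guarantee, and the function name is hypothetical.

```python
# Back-of-envelope sizing from the figure quoted above (~100M triples in 32 GB).
# Assumption: memory grows linearly with the number of triples. Real usage
# depends on the data (IRI lengths, literal types, and so on).

REFERENCE_TRIPLES = 100_000_000
REFERENCE_RAM_BYTES = 32 * 10**9  # 32 GB, decimal


def estimated_ram_gb(n_triples: int) -> float:
    """Estimate RAM in GB for n_triples under the linear-scaling assumption."""
    bytes_per_triple = REFERENCE_RAM_BYTES / REFERENCE_TRIPLES  # 320.0
    return n_triples * bytes_per_triple / 10**9


print(estimated_ram_gb(25_000_000))  # 8.0
```

Treat the result as an order-of-magnitude guide only and measure with your own data before provisioning hardware.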
The core functionality of maplib (mapping, querying, serialization) is open source, but SHACL and Datalog functionality are not. Please send us a message, e.g. on LinkedIn (search for Data Treehouse) or on email (magnus at data-treehouse.com) if you want to try out these features. See our roadmap for upcoming features.
Installing
The package is published on PyPI and the API is documented here:
pip install maplib
Model
We can easily map DataFrames to RDF-graphs using the Python library. Below is a reproduction of the example in the paper [1]. Assume that we have a DataFrame given by:
import polars as pl
pl.Config.set_fmt_str_lengths(150)

pi = "https://github.com/DataTreehouse/maplib/pizza#"
df = pl.DataFrame({
    "p": [pi + "Hawaiian", pi + "Grandiosa"],
    "c": [pi + "CAN", pi + "NOR"],
    "ings": [[pi + "Pineapple", pi + "Ham"],
             [pi + "Pepper", pi + "Meat"]]
})
print(df)
That is, our DataFrame is:
| p | c | ings |
|---|---|---|
| str | str | list[str] |
| "https://.../pizza#Hawaiian" | "https://.../maplib/pizza#CAN" | [".../pizza#Pineapple", ".../pizza#Ham"] |
| "https://.../pizza#Grandiosa" | "https://.../maplib/pizza#NOR" | [".../pizza#Pepper", ".../pizza#Meat"] |
Then we can define an OTTR template, and create our knowledge graph by expanding this template with our DataFrame as input:
from maplib import Model, Prefix, Template, Argument, Parameter, Variable, RDFType, Triple, a

pi = Prefix(pi)

# One template variable per DataFrame column
p_var = Variable("p")
c_var = Variable("c")
ings_var = Variable("ings")

template = Template(
    iri=pi.suf("PizzaTemplate"),
    parameters=[
        Parameter(variable=p_var, rdf_type=RDFType.IRI()),
        Parameter(variable=c_var, rdf_type=RDFType.IRI()),
        Parameter(variable=ings_var, rdf_type=RDFType.Nested(RDFType.IRI()))
    ],
    instances=[
        Triple(p_var, a, pi.suf("Pizza")),
        Triple(p_var, pi.suf("fromCountry"), c_var),
        # Expand the list column so that each ingredient gets its own triple
        Triple(
            p_var,
            pi.suf("hasIngredient"),
            Argument(term=ings_var, list_expand=True),
            list_expander="cross")
    ]
)

m = Model()
m.map(template, df)

# Derive and insert an extra type for pizzas that contain pineapple
hpizzas = """
PREFIX pi:<https://github.com/DataTreehouse/maplib/pizza#>
CONSTRUCT { ?p a pi:HeterodoxPizza }
WHERE {
    ?p a pi:Pizza .
    ?p pi:hasIngredient pi:Pineapple .
}"""
m.insert(hpizzas)
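To make the effect of the template more concrete, here is a pure-Python sketch of what expanding it does to each row of the DataFrame. It mirrors the semantics only and is not how maplib is implemented; the names `rows` and `expand_pizza_row` are made up for this illustration.

```python
# Pure-Python sketch of what expanding the template does to each row.
# This mirrors the semantics only; it is not how maplib is implemented.
from itertools import product

PI = "https://github.com/DataTreehouse/maplib/pizza#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

rows = [
    {"p": PI + "Hawaiian", "c": PI + "CAN", "ings": [PI + "Pineapple", PI + "Ham"]},
    {"p": PI + "Grandiosa", "c": PI + "NOR", "ings": [PI + "Pepper", PI + "Meat"]},
]


def expand_pizza_row(row):
    """Return the triples that the three template instances produce for one row."""
    triples = [
        (row["p"], RDF_TYPE, PI + "Pizza"),
        (row["p"], PI + "fromCountry", row["c"]),
    ]
    # "cross" expansion: one triple per combination of the expanded list
    # arguments. Only `ings` is expanded here, so that is one per ingredient.
    for (ing,) in product(row["ings"]):
        triples.append((row["p"], PI + "hasIngredient", ing))
    return triples


triples = [t for row in rows for t in expand_pizza_row(row)]
print(len(triples))  # 8
```

Each row contributes one type triple, one country triple and one triple per ingredient, which gives eight triples for the two pizzas.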
We can immediately query the mapped knowledge graph:
m.query("""
PREFIX pi:<https://github.com/DataTreehouse/maplib/pizza#>
SELECT ?p ?i WHERE {
?p a pi:Pizza .
?p pi:hasIngredient ?i .
}
""")
The query returns a DataFrame with columns p and i, containing one row for each pizza and ingredient pair.
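The matching that this SELECT query performs can be sketched over plain Python tuples. This is an illustration of the two triple patterns and the join on ?p, not maplib's query engine, and the row order of the real DataFrame may differ from the sorted order used here.

```python
# Sketch of the basic graph pattern in the SELECT query, over plain tuples.
# Illustrative only: maplib evaluates SPARQL in its own engine.
PI = "https://github.com/DataTreehouse/maplib/pizza#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

triples = {
    (PI + "Hawaiian", RDF_TYPE, PI + "Pizza"),
    (PI + "Hawaiian", PI + "hasIngredient", PI + "Pineapple"),
    (PI + "Hawaiian", PI + "hasIngredient", PI + "Ham"),
    (PI + "Grandiosa", RDF_TYPE, PI + "Pizza"),
    (PI + "Grandiosa", PI + "hasIngredient", PI + "Pepper"),
    (PI + "Grandiosa", PI + "hasIngredient", PI + "Meat"),
}

# ?p a pi:Pizza
pizzas = {s for (s, p, o) in triples if p == RDF_TYPE and o == PI + "Pizza"}

# ?p pi:hasIngredient ?i, joined with the pattern above on ?p
solutions = sorted(
    (s, o) for (s, p, o) in triples if p == PI + "hasIngredient" and s in pizzas
)

for pizza, ingredient in solutions:
    # Print only the local names to keep the output short
    print(pizza.split("#")[-1], ingredient.split("#")[-1])
# Grandiosa Meat
# Grandiosa Pepper
# Hawaiian Ham
# Hawaiian Pineapple
```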
Next, we can run a CONSTRUCT query, which creates new triples but does not insert them.
hpizzas = """
PREFIX pi:<https://github.com/DataTreehouse/maplib/pizza#>
CONSTRUCT { ?p a pi:UnorthodoxPizza }
WHERE {
?p a pi:Pizza .
?p pi:hasIngredient pi:Pineapple .
}"""
res = m.query(hpizzas)
res[0]
The resulting triples are given below:
| subject | verb | object |
|---|---|---|
| str | str | str |
| "https://.../pizza#Hawaiian" | "http://.../22-rdf-syntax-ns#type" | "https://.../pizza#UnorthodoxPizza" |
If we are happy with the output of this CONSTRUCT query, we can insert it into the model. Afterwards, we check with a query that the triple has been added.
m.insert(hpizzas)
m.query("""
PREFIX pi:<https://github.com/DataTreehouse/maplib/pizza#>
SELECT ?p WHERE {
?p a pi:UnorthodoxPizza
}
""")
Indeed, we have added the triple:
| p |
|---|
| str |
| "https://github.com/DataTreehouse/maplib/pizza#Hawaiian" |
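The difference between running a CONSTRUCT query and inserting its result can be sketched with a plain set of tuples. The helper `construct_unorthodox` is hypothetical and only mimics the query used above; it is not part of maplib.

```python
# Sketch of the CONSTRUCT-then-insert workflow using a plain set of tuples.
# The helper name is hypothetical; it only mimics the query used above.
PI = "https://github.com/DataTreehouse/maplib/pizza#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

graph = {
    (PI + "Hawaiian", RDF_TYPE, PI + "Pizza"),
    (PI + "Hawaiian", PI + "hasIngredient", PI + "Pineapple"),
    (PI + "Grandiosa", RDF_TYPE, PI + "Pizza"),
    (PI + "Grandiosa", PI + "hasIngredient", PI + "Pepper"),
}


def construct_unorthodox(triples):
    """Build new triples for pizzas with pineapple; the input is not modified."""
    pizzas = {s for (s, p, o) in triples if p == RDF_TYPE and o == PI + "Pizza"}
    return {
        (s, RDF_TYPE, PI + "UnorthodoxPizza")
        for (s, p, o) in triples
        if p == PI + "hasIngredient" and o == PI + "Pineapple" and s in pizzas
    }


# Analogous to m.query(hpizzas): new triples are returned, the graph is unchanged
new_triples = construct_unorthodox(graph)
size_before = len(graph)

# Analogous to m.insert(hpizzas): the new triples become part of the graph
graph |= new_triples
print(size_before, len(graph))  # 4 5
```

Keeping the two steps separate lets you inspect the constructed triples before they change the state of the model.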
API
The API is simple, and contains only one class and a few methods for:
- expanding templates
- querying with SPARQL
- validating with SHACL
- reading JSON to triples using Façade-X
- importing triples (Turtle, RDF/XML, NTriples, JSON-LD)
- writing triples (Turtle, RDF/XML, NTriples)
- creating a new Model from a named graph
The API is documented HERE
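If the documentation is not at hand, the methods of the class can also be browsed from an interactive session. The sketch below uses only the standard `inspect` module; `public_methods` is a helper written for this illustration, and the exact method names it prints depend on the installed maplib version.

```python
# List the public methods of a class, e.g. to browse an API interactively.
import inspect


def public_methods(cls):
    """Return sorted names of callables on cls that do not start with '_'."""
    return sorted(
        name
        for name, member in inspect.getmembers(cls, callable)
        if not name.startswith("_")
    )


try:
    from maplib import Model  # the class used throughout this README
    print(public_methods(Model))
except ImportError:
    print("maplib is not installed; run `pip install maplib` first")
```

Calling `help(Model)` in the same session shows the docstrings for those methods.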
Roadmap of features and optimizations
Spring 2026
- SHACL Rules
- Disk based storage and internal serialization format
- Jelly
- Graph virtualization using chrontext
The roadmap is subject to change, particularly in response to user and customer requests.
References
The associated paper [1] includes benchmarks showing superior performance and scalability; it can be found here. OTTR is described in [2].
[1] M. Bakken, "maplib: Interactive, literal RDF model mapping for industry," in IEEE Access, doi: 10.1109/ACCESS.2023.3269093.
[2] M. G. Skjæveland, D. P. Lupp, L. H. Karlsen, and J. W. Klüwer, "OTTR: Formal templates for pattern-based ontology engineering," in WOP (Book), 2021, pp. 349–377.
Licensing
All code produced since August 1st, 2023 is copyrighted to Data Treehouse AS with an Apache 2.0 license unless otherwise noted.
All code produced before August 1st, 2023 is copyrighted to Prediktor AS with an Apache 2.0 license unless otherwise noted, and has been financed by The Research Council of Norway (grant no. 316656) and Prediktor AS as part of a PhD degree. The code in that state is archived in the repository at https://github.com/magbak/maplib.