
Dataframe-based interactive knowledge graph construction


maplib: High-performance RDF knowledge graph construction, SHACL validation, SPARQL and Datalog in Python

maplib is written in Rust. It is built on Apache Arrow via Pola.rs and uses libraries from Oxigraph for handling linked data and for parsing SPARQL queries.

maplib allows you to leverage your existing skills with Pandas or Polars to extract and wrangle data from existing databases and spreadsheets, before applying simple templates to build a knowledge graph. You can also read knowledge graphs extremely quickly from a wide variety of serialization formats. Using the built-in SPARQL, SHACL and Datalog engines, you can query, inspect, enrich, validate and serialize the knowledge graph immediately. All query results are Polars DataFrames, transferred zero-copy from Rust to Python. Currently, maplib is in-memory and supports around 100M triples on 32GB of RAM.

The core functionality of maplib (mapping, querying, serialization) is open source, but the SHACL and Datalog functionality is not. Please send us a message, e.g. on LinkedIn (search for Data Treehouse) or by email (magnus at data-treehouse.com), if you want to try out these features. See our roadmap for upcoming features.

Installing

The package is published on PyPI:

pip install maplib

Model

We can easily map DataFrames to RDF graphs using the Python library. Below is a reproduction of the example in the paper [1]. Assume that we have a DataFrame given by:

import polars as pl
pl.Config.set_fmt_str_lengths(150)

pi = "https://github.com/DataTreehouse/maplib/pizza#"
df = pl.DataFrame({
    "p":[pi + "Hawaiian", pi + "Grandiosa"],
    "c":[pi + "CAN", pi + "NOR"],
    "ings": [[pi + "Pineapple", pi + "Ham"],
             [pi + "Pepper", pi + "Meat"]]
})
print(df)

That is, our DataFrame is:

p c ings
str str list[str]
"https://.../pizza#Hawaiian" "https://.../maplib/pizza#CAN" [".../pizza#Pineapple", ".../pizza#Ham"]
"https://.../pizza#Grandiosa" "https://.../maplib/pizza#NOR" [".../pizza#Pepper", ".../pizza#Meat"]

Then we can define an OTTR template, and create our knowledge graph by expanding this template with our DataFrame as input:

from maplib import Model, Prefix, Template, Argument, Parameter, Variable, RDFType, Triple, a
pi = Prefix(pi)

p_var = Variable("p")
c_var = Variable("c")
ings_var = Variable("ings")

template = Template(
    iri= pi.suf("PizzaTemplate"),
    parameters= [
        Parameter(variable=p_var, rdf_type=RDFType.IRI()),
        Parameter(variable=c_var, rdf_type=RDFType.IRI()),
        Parameter(variable=ings_var, rdf_type=RDFType.Nested(RDFType.IRI()))
    ],
    instances= [
        Triple(p_var, a, pi.suf("Pizza")),
        Triple(p_var, pi.suf("fromCountry"), c_var),
        Triple(
            p_var, 
            pi.suf("hasIngredient"), 
            Argument(term=ings_var, list_expand=True), 
            list_expander="cross")
    ]
)

m = Model()
m.map(template, df)
hpizzas = """
    PREFIX pi:<https://github.com/DataTreehouse/maplib/pizza#>
    CONSTRUCT { ?p a pi:HeterodoxPizza } 
    WHERE {
        ?p a pi:Pizza .
        ?p pi:hasIngredient pi:Pineapple .
    }"""
m.insert(hpizzas)
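To make the expansion concrete, here is a plain-Python sketch (not maplib code) of the triples this template produces for our DataFrame. The `rdf_type` IRI and the loop structure are illustrative assumptions about the semantics, not maplib internals:

```python
# Sketch of what expanding PizzaTemplate produces. Each DataFrame row is one
# template instance; list_expander="cross" emits one hasIngredient triple per
# element of the ings list.
pi = "https://github.com/DataTreehouse/maplib/pizza#"
rdf_type = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

rows = [
    {"p": pi + "Hawaiian", "c": pi + "CAN", "ings": [pi + "Pineapple", pi + "Ham"]},
    {"p": pi + "Grandiosa", "c": pi + "NOR", "ings": [pi + "Pepper", pi + "Meat"]},
]

triples = []
for row in rows:
    triples.append((row["p"], rdf_type, pi + "Pizza"))
    triples.append((row["p"], pi + "fromCountry", row["c"]))
    for ing in row["ings"]:  # "cross" expansion over the list parameter
        triples.append((row["p"], pi + "hasIngredient", ing))

print(len(triples))  # 8 triples: 2 type, 2 fromCountry, 4 hasIngredient
```

The two rows thus yield eight triples in total, four of them from the cross-expanded list parameter.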

We can immediately query the mapped knowledge graph:

m.query("""
PREFIX pi:<https://github.com/DataTreehouse/maplib/pizza#>
SELECT ?p ?i WHERE {
?p a pi:Pizza .
?p pi:hasIngredient ?i .
}
""")

The query gives the following result (a DataFrame):

p i
str str
"https://.../pizza#Grandiosa" "https://.../pizza#Meat"
"https://.../pizza#Grandiosa" "https://.../pizza#Pepper"
"https://.../pizza#Hawaiian" "https://.../pizza#Pineapple"
"https://.../pizza#Hawaiian" "https://.../pizza#Ham"

Next, we perform a CONSTRUCT query, which creates new triples but does not insert them.

hpizzas = """
PREFIX pi:<https://github.com/DataTreehouse/maplib/pizza#>
CONSTRUCT { ?p a pi:UnorthodoxPizza } 
WHERE {
    ?p a pi:Pizza .
    ?p pi:hasIngredient pi:Pineapple .
}"""
res = m.query(hpizzas)
res[0]

The resulting triples are given below:

subject verb object
str str str
"https://.../pizza#Hawaiian" "http://.../22-rdf-syntax-ns#type" "https://.../pizza#UnorthodoxPizza"

If we are happy with the output of this CONSTRUCT query, we can insert it into the model state. Afterwards, we check with a query that the triple has been added.

m.insert(hpizzas)
m.query("""
PREFIX pi:<https://github.com/DataTreehouse/maplib/pizza#>

SELECT ?p WHERE {
?p a pi:UnorthodoxPizza
}
""")

Indeed, we have added the triple:

p
str
"https://github.com/DataTreehouse/maplib/pizza#Hawaiian"

API

The API is simple, centred on the Model class, with a few methods for:

  • expanding templates
  • querying with SPARQL
  • validating with SHACL
  • reading JSON to triples using Façade-X
  • importing triples (Turtle, RDF/XML, NTriples, JSON-LD)
  • writing triples (Turtle, RDF/XML, NTriples)
  • creating a new Model from a named graph
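To illustrate the Façade-X idea behind the JSON-to-triples feature, here is a conceptual sketch (not maplib's implementation): JSON objects become (blank) nodes, keys become predicates, and array elements are linked via `rdf:_1`, `rdf:_2`, ... container-membership properties. Using raw keys as predicates is a simplification; the real approach mints IRIs for them.

```python
# Minimal Façade-X-style JSON -> triples sketch (illustrative only).
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def facade_x(value, subject, out):
    if isinstance(value, dict):
        for key, child in value.items():
            _emit(subject, key, child, out)
    elif isinstance(value, list):
        # Arrays use RDF container-membership properties rdf:_1, rdf:_2, ...
        for i, child in enumerate(value, start=1):
            _emit(subject, RDF + f"_{i}", child, out)
    return out

def _emit(subject, predicate, child, out):
    if isinstance(child, (dict, list)):
        node = f"_:b{len(out)}"  # fresh blank node for the nested structure
        out.append((subject, predicate, node))
        facade_x(child, node, out)
    else:
        out.append((subject, predicate, child))  # scalar becomes a literal

triples = facade_x({"name": "Hawaiian", "ings": ["Pineapple", "Ham"]}, "_:root", [])
print(len(triples))  # 4 triples: name literal, ings node, two list members
```

maplib performs this mapping natively in Rust; the sketch only conveys the shape of the resulting graph.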

The API is documented here.

Roadmap of features and optimizations

Spring 2026

  • SHACL Rules
  • Disk based storage and internal serialization format
  • Jelly
  • Graph virtualization using chrontext

The roadmap is subject to change, particularly based on user and customer requests.

References

An associated paper [1] includes benchmarks showing superior performance and scalability. OTTR is described in [2].

[1] M. Bakken, "maplib: Interactive, literal RDF model model for industry," in IEEE Access, doi: 10.1109/ACCESS.2023.3269093.

[2] M. G. Skjæveland, D. P. Lupp, L. H. Karlsen, and J. W. Klüwer, “Ottr: Formal templates for pattern-based ontology engineering.” in WOP (Book), 2021, pp. 349–377.

Licensing

All code produced since August 1st, 2023 is copyrighted to Data Treehouse AS under an Apache 2.0 license unless otherwise noted.

All code produced before August 1st, 2023 is copyrighted to Prediktor AS under an Apache 2.0 license unless otherwise noted, and was financed by The Research Council of Norway (grant no. 316656) and Prediktor AS as part of a PhD degree. The code at that stage is archived in the repository at https://github.com/magbak/maplib.
