Dataframe-based interactive knowledge graph construction

Project description

maplib: High-performance RDF knowledge graph construction, SHACL validation, SPARQL and Datalog in Python

maplib is written in Rust. It is built on Apache Arrow using Polars, and uses libraries from Oxigraph for handling linked data and for parsing SPARQL queries.

maplib lets you use your existing Pandas or Polars skills to extract and wrangle data from databases and spreadsheets, and then apply simple templates to that data to build a knowledge graph. You can also read knowledge graphs very quickly from a wide variety of serialization formats. With the built-in SPARQL, SHACL and Datalog engines, you can immediately query, inspect, enrich and validate the knowledge graph, and then serialize it. All query results are Polars DataFrames that are transferred zero-copy from Rust to Python. Currently, maplib is in-memory and supports around 100M triples on 32GB of RAM.
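
For capacity planning, the figure above can be turned into a rough per-triple budget. The sketch below is plain arithmetic, not a maplib feature: `estimate_ram_gb` is a hypothetical helper, and it assumes that memory use grows linearly with the number of triples, which real data (long literals, many distinct IRIs) may not follow.

```python
# Back-of-envelope sizing from the stated figure: about 100M triples on 32GB of RAM.
# Assumption: memory use scales linearly with the number of triples.
REFERENCE_TRIPLES = 100_000_000
REFERENCE_RAM_BYTES = 32 * 10**9  # 32GB, counted in decimal gigabytes

bytes_per_triple = REFERENCE_RAM_BYTES / REFERENCE_TRIPLES


def estimate_ram_gb(n_triples: int) -> float:
    """Hypothetical helper: linearly extrapolate the RAM needed, in GB."""
    return n_triples * bytes_per_triple / 10**9


print(bytes_per_triple)              # 320.0
print(estimate_ram_gb(250_000_000))  # 80.0
```

Treat the result as an order-of-magnitude guide only.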

The core functionality of maplib (mapping, querying, serialization) is open source, but the SHACL and Datalog functionality is not. Please send us a message, e.g. on LinkedIn (search for Data Treehouse) or by email (magnus at data-treehouse.com), if you want to try out these features. See our roadmap for upcoming features.

Installing

The package is published on PyPI and the API is documented here:

pip install maplib

Model

We can easily map DataFrames to RDF graphs using the Python library. Below is a reproduction of the example in the paper [1]. Assume that we have a DataFrame given by:

import polars as pl
pl.Config.set_fmt_str_lengths(150)

pi = "https://github.com/DataTreehouse/maplib/pizza#"
df = pl.DataFrame({
    "p":[pi + "Hawaiian", pi + "Grandiosa"],
    "c":[pi + "CAN", pi + "NOR"],
    "ings": [[pi + "Pineapple", pi + "Ham"],
             [pi + "Pepper", pi + "Meat"]]
})
print(df)

That is, our DataFrame is:

p c ings
str str list[str]
"https://.../pizza#Hawaiian" "https://.../maplib/pizza#CAN" [".../pizza#Pineapple", ".../pizza#Ham"]
"https://.../pizza#Grandiosa" "https://.../maplib/pizza#NOR" [".../pizza#Pepper", ".../pizza#Meat"]

Then we can define an OTTR template, and create our knowledge graph by expanding this template with our DataFrame as input:

from maplib import Model, Prefix, Template, Argument, Parameter, Variable, RDFType, Triple, a
pi = Prefix(pi)

p_var = Variable("p")
c_var = Variable("c")
ings_var = Variable("ings")

template = Template(
    iri= pi.suf("PizzaTemplate"),
    parameters= [
        Parameter(variable=p_var, rdf_type=RDFType.IRI()),
        Parameter(variable=c_var, rdf_type=RDFType.IRI()),
        Parameter(variable=ings_var, rdf_type=RDFType.Nested(RDFType.IRI()))
    ],
    instances= [
        Triple(p_var, a, pi.suf("Pizza")),
        Triple(p_var, pi.suf("fromCountry"), c_var),
        Triple(
            p_var, 
            pi.suf("hasIngredient"), 
            Argument(term=ings_var, list_expand=True), 
            list_expander="cross")
    ]
)

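m = Model()
# Expand the template once per row of df and add the resulting triples to the model.
m.map(template, df)

# Derive extra triples with a CONSTRUCT query and insert them into the model.
hpizzas = """
    PREFIX pi:<https://github.com/DataTreehouse/maplib/pizza#>
    CONSTRUCT { ?p a pi:HeterodoxPizza }
    WHERE {
        ?p a pi:Pizza .
        ?p pi:hasIngredient pi:Pineapple .
    }"""
m.insert(hpizzas)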
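
To make the effect of the template expansion concrete, here is a minimal pure-Python sketch of the triples it yields for the two rows of the DataFrame. It does not use maplib and says nothing about how maplib works internally. `expand_pizza_rows` is a hypothetical helper, and the sketch assumes that the "cross" expander with a single list argument emits one `hasIngredient` triple per list element.

```python
# Illustration only: plain Python, no maplib involved.
PI = "https://github.com/DataTreehouse/maplib/pizza#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

rows = [
    {"p": PI + "Hawaiian", "c": PI + "CAN",
     "ings": [PI + "Pineapple", PI + "Ham"]},
    {"p": PI + "Grandiosa", "c": PI + "NOR",
     "ings": [PI + "Pepper", PI + "Meat"]},
]


def expand_pizza_rows(rows):
    """Hypothetical helper: emit the triples of one template instance per row."""
    triples = []
    for row in rows:
        triples.append((row["p"], RDF_TYPE, PI + "Pizza"))
        triples.append((row["p"], PI + "fromCountry", row["c"]))
        # List expansion: one hasIngredient triple per element of the list.
        for ing in row["ings"]:
            triples.append((row["p"], PI + "hasIngredient", ing))
    return triples


triples = expand_pizza_rows(rows)
for s, p, o in triples:
    print(s, p, o)
```

Each row contributes one type triple, one `fromCountry` triple and two `hasIngredient` triples, so the two rows give eight triples in total.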

We can immediately query the mapped knowledge graph:

m.query("""
PREFIX pi:<https://github.com/DataTreehouse/maplib/pizza#>
SELECT ?p ?i WHERE {
?p a pi:Pizza .
?p pi:hasIngredient ?i .
}
""")

The query gives the following result (a DataFrame):

p i
str str
"https://.../pizza#Grandiosa" "https://.../pizza#Meat"
"https://.../pizza#Grandiosa" "https://.../pizza#Pepper"
"https://.../pizza#Hawaiian" "https://.../pizza#Pineapple"
"https://.../pizza#Hawaiian" "https://.../pizza#Ham"

Next, we can run a CONSTRUCT query, which creates new triples but does not insert them into the model.

hpizzas = """
PREFIX pi:<https://github.com/DataTreehouse/maplib/pizza#>
CONSTRUCT { ?p a pi:UnorthodoxPizza } 
WHERE {
    ?p a pi:Pizza .
    ?p pi:hasIngredient pi:Pineapple .
}"""
res = m.query(hpizzas)
res[0]

The resulting triples are given below:

subject verb object
str str str
"https://.../pizza#Hawaiian" "http://.../22-rdf-syntax-ns#type" "https://.../pizza#UnorthodoxPizza"

If we are happy with the output of this CONSTRUCT query, we can insert it into the model. Afterwards, we check with a query that the triple has been added.

m.insert(hpizzas)
m.query("""
PREFIX pi:<https://github.com/DataTreehouse/maplib/pizza#>

SELECT ?p WHERE {
?p a pi:UnorthodoxPizza
}
""")

Indeed, we have added the triple:

p
str
"https://github.com/DataTreehouse/maplib/pizza#Hawaiian"

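The difference between running a CONSTRUCT query and inserting its result can be pictured with a plain set of triples. The sketch below is pure Python and only mimics the semantics described above; it is not maplib code. `construct_unorthodox` is a hypothetical stand-in for the CONSTRUCT query.

```python
# Illustration only: a set of tuples stands in for the knowledge graph.
PI = "https://github.com/DataTreehouse/maplib/pizza#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

graph = {
    (PI + "Hawaiian", RDF_TYPE, PI + "Pizza"),
    (PI + "Hawaiian", PI + "hasIngredient", PI + "Pineapple"),
    (PI + "Hawaiian", PI + "hasIngredient", PI + "Ham"),
    (PI + "Grandiosa", RDF_TYPE, PI + "Pizza"),
    (PI + "Grandiosa", PI + "hasIngredient", PI + "Pepper"),
    (PI + "Grandiosa", PI + "hasIngredient", PI + "Meat"),
}


def construct_unorthodox(graph):
    """Hypothetical stand-in for the CONSTRUCT query; leaves the graph untouched."""
    pizzas = {s for s, p, o in graph if p == RDF_TYPE and o == PI + "Pizza"}
    with_pineapple = {
        s for s, p, o in graph
        if p == PI + "hasIngredient" and o == PI + "Pineapple"
    }
    return {(s, RDF_TYPE, PI + "UnorthodoxPizza") for s in pizzas & with_pineapple}


# Step 1: construct. New triples are returned, the graph is unchanged.
new_triples = construct_unorthodox(graph)
size_after_construct = len(graph)

# Step 2: insert. The constructed triples become part of the graph.
graph |= new_triples
size_after_insert = len(graph)

print(size_after_construct, size_after_insert)  # 6 7
```

Keeping the two steps separate lets you inspect the constructed triples before committing them.
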
API

The API is simple, and contains only one class and a few methods for:

  • expanding templates
  • querying with SPARQL
  • validating with SHACL
  • reading JSON to triples using Façade-X
  • importing triples (Turtle, RDF/XML, NTriples, JSON-LD)
  • writing triples (Turtle, RDF/XML, NTriples)
  • creating a new Model from a named graph

The API is documented HERE
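
As a rough picture of what the simplest of the listed serializations looks like, here is a pure-Python sketch that writes IRI-only triples as N-Triples. It is not maplib's writer and does not call maplib. `to_ntriples` is a hypothetical helper, and literals and blank nodes are deliberately left out.

```python
# Illustration only: N-Triples puts one triple per line, with IRIs in angle brackets.
PI = "https://github.com/DataTreehouse/maplib/pizza#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

triples = [
    (PI + "Hawaiian", RDF_TYPE, PI + "Pizza"),
    (PI + "Hawaiian", PI + "fromCountry", PI + "CAN"),
]


def to_ntriples(triples):
    """Hypothetical helper: one '<s> <p> <o> .' line per IRI-only triple."""
    return "".join(f"<{s}> <{p}> <{o}> .\n" for s, p, o in triples)


ntriples_text = to_ntriples(triples)
print(ntriples_text, end="")
```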

Roadmap of features and optimizations

Spring 2026

  • SHACL Rules
  • Disk based storage and internal serialization format
  • Jelly
  • Graph virtualization using chrontext

The roadmap is subject to change, particularly in response to user and customer requests.

References

An associated paper [1], with benchmarks showing superior performance and scalability, can be found here. OTTR is described in [2].

[1] M. Bakken, "maplib: Interactive, literal RDF model mapping for industry," in IEEE Access, doi: 10.1109/ACCESS.2023.3269093.

[2] M. G. Skjæveland, D. P. Lupp, L. H. Karlsen, and J. W. Klüwer, "OTTR: Formal templates for pattern-based ontology engineering," in WOP (Book), 2021, pp. 349–377.

Licensing

All code produced since August 1st, 2023 is copyrighted to Data Treehouse AS with an Apache 2.0 license unless otherwise noted.

All code produced before August 1st, 2023 is copyrighted to Prediktor AS with an Apache 2.0 license unless otherwise noted, and has been financed by The Research Council of Norway (grant no. 316656) and Prediktor AS as part of a PhD degree. The code in this state is archived in the repository at https://github.com/magbak/maplib.
