Dataframe-based interactive knowledge graph construction

maplib: High-performance RDF knowledge graph construction, SHACL validation, SPARQL and Datalog in Python

maplib is written in Rust. It is built on Apache Arrow using Pola.rs and uses libraries from Oxigraph for handling linked data and for parsing SPARQL queries.

maplib lets you use your existing Pandas or Polars skills to extract and wrangle data from existing databases and spreadsheets, then apply simple templates to build a knowledge graph. You can also read knowledge graphs extremely quickly from a wide variety of serialization formats. With the built-in SPARQL, SHACL and Datalog engines, you can query, inspect, enrich, validate and then serialize the knowledge graph immediately. All query results are Polars DataFrames that are transferred zero-copy from Rust to Python. Currently, maplib is in-memory and supports around 100M triples on 32GB of RAM.
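As a rough back-of-the-envelope aid, the stated reference point (around 100M triples on 32GB of RAM) can be scaled to other graph sizes. The sketch below assumes memory use grows linearly with the number of triples, which is our simplifying assumption and not a measured property of maplib; actual usage will depend on the data. The helper name `estimated_ram_gb` is hypothetical.

```python
REFERENCE_TRIPLES = 100_000_000  # "around 100M triples" from the text above
REFERENCE_RAM_GB = 32            # "32GB of RAM" from the text above


def estimated_ram_gb(n_triples: int) -> float:
    """Scale the reference point linearly (an assumption, not a benchmark)."""
    return n_triples * REFERENCE_RAM_GB / REFERENCE_TRIPLES


# Implied average footprint per triple, using decimal gigabytes.
bytes_per_triple = REFERENCE_RAM_GB * 1e9 / REFERENCE_TRIPLES

print(bytes_per_triple)              # 320.0
print(estimated_ram_gb(25_000_000))  # 8.0
```

Treat the result as an order-of-magnitude estimate only.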

The core functionality of maplib (mapping, querying, serialization) is open source, but SHACL and Datalog functionality are not. Please send us a message, e.g. on LinkedIn (search for Data Treehouse) or by email (magnus at data-treehouse.com), if you want to try out these features. See our roadmap for upcoming features.

Installing

The package is published on PyPI and the API is documented here:

pip install maplib
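After installing, it can be useful to confirm that the package is visible to the interpreter you intend to use. The following is a minimal sketch using only the standard library; the helper name `installed_version` is ours and is not part of maplib.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("maplib")
    if version is None:
        print("maplib is not installed in this environment")
    else:
        print(f"maplib version: {version}")
```

Running this inside the same virtual environment where `pip install maplib` was executed avoids confusion between multiple Python installations.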

Model

We can easily map DataFrames to RDF-graphs using the Python library. Below is a reproduction of the example in the paper [1]. Assume that we have a DataFrame given by:

import polars as pl
pl.Config.set_fmt_str_lengths(150)

pi = "https://github.com/DataTreehouse/maplib/pizza#"
df = pl.DataFrame({
    "p":[pi + "Hawaiian", pi + "Grandiosa"],
    "c":[pi + "CAN", pi + "NOR"],
    "ings": [[pi + "Pineapple", pi + "Ham"],
             [pi + "Pepper", pi + "Meat"]]
})
print(df)

That is, our DataFrame is:

p | c | ings
str | str | list[str]
"https://.../pizza#Hawaiian" | "https://.../maplib/pizza#CAN" | [".../pizza#Pineapple", ".../pizza#Ham"]
"https://.../pizza#Grandiosa" | "https://.../maplib/pizza#NOR" | [".../pizza#Pepper", ".../pizza#Meat"]

Then we can define an OTTR template, and create our knowledge graph by expanding this template with our DataFrame as input:

from maplib import Model, Prefix, Template, Argument, Parameter, Variable, RDFType, Triple, a
pi = Prefix(pi)

p_var = Variable("p")
c_var = Variable("c")
ings_var = Variable("ings")

template = Template(
    iri= pi.suf("PizzaTemplate"),
    parameters= [
        Parameter(variable=p_var, rdf_type=RDFType.IRI()),
        Parameter(variable=c_var, rdf_type=RDFType.IRI()),
        Parameter(variable=ings_var, rdf_type=RDFType.Nested(RDFType.IRI()))
    ],
    instances= [
        Triple(p_var, a, pi.suf("Pizza")),
        Triple(p_var, pi.suf("fromCountry"), c_var),
        Triple(
            p_var, 
            pi.suf("hasIngredient"), 
            Argument(term=ings_var, list_expand=True), 
            list_expander="cross")
    ]
)

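# Expand the template once per row of df and add the resulting triples to the model.
m = Model()
m.map(template, df)

# Derive a new type for every pizza that has pineapple as an ingredient.
hpizzas = """
    PREFIX pi:<https://github.com/DataTreehouse/maplib/pizza#>
    CONSTRUCT { ?p a pi:HeterodoxPizza } 
    WHERE {
        ?p a pi:Pizza .
        ?p pi:hasIngredient pi:Pineapple .
    }"""
# insert() runs the CONSTRUCT query and adds the constructed triples to the model.
m.insert(hpizzas)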
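To make the expansion concrete, the plain-Python sketch below mimics what the template does to each row: one type triple, one `fromCountry` triple, and one `hasIngredient` triple per element of the `ings` list. It illustrates the semantics only and is not how maplib is implemented; the function `expand_pizza_rows` and the tuple representation of triples are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PI = "https://github.com/DataTreehouse/maplib/pizza#"


def expand_pizza_rows(rows):
    """Expand (pizza, country, ingredients) rows into (subject, predicate, object) triples."""
    triples = []
    for pizza, country, ingredients in rows:
        triples.append((pizza, RDF_TYPE, PI + "Pizza"))
        triples.append((pizza, PI + "fromCountry", country))
        # List expansion: the list-valued argument yields one triple per element.
        for ingredient in ingredients:
            triples.append((pizza, PI + "hasIngredient", ingredient))
    return triples


rows = [
    (PI + "Hawaiian", PI + "CAN", [PI + "Pineapple", PI + "Ham"]),
    (PI + "Grandiosa", PI + "NOR", [PI + "Pepper", PI + "Meat"]),
]
triples = expand_pizza_rows(rows)
print(len(triples))  # 8: two rows, each with 1 type + 1 country + 2 ingredient triples
```

In maplib itself the same work is done column-wise on the DataFrame rather than row by row; the loop here is only meant to show which triples result.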

We can immediately query the mapped knowledge graph:

m.query("""
PREFIX pi:<https://github.com/DataTreehouse/maplib/pizza#>
SELECT ?p ?i WHERE {
?p a pi:Pizza .
?p pi:hasIngredient ?i .
}
""")

The query gives the following result (a DataFrame):

p | i
str | str
"https://.../pizza#Grandiosa" | "https://.../pizza#Meat"
"https://.../pizza#Grandiosa" | "https://.../pizza#Pepper"
"https://.../pizza#Hawaiian" | "https://.../pizza#Pineapple"
"https://.../pizza#Hawaiian" | "https://.../pizza#Ham"

Next, we run a CONSTRUCT query, which creates new triples but does not insert them into the model.

hpizzas = """
PREFIX pi:<https://github.com/DataTreehouse/maplib/pizza#>
CONSTRUCT { ?p a pi:UnorthodoxPizza } 
WHERE {
    ?p a pi:Pizza .
    ?p pi:hasIngredient pi:Pineapple .
}"""
res = m.query(hpizzas)
res[0]

The resulting triples are given below:

subject | verb | object
str | str | str
"https://.../pizza#Hawaiian" | "http://.../22-rdf-syntax-ns#type" | "https://.../pizza#UnorthodoxPizza"

If we are happy with the output of this CONSTRUCT query, we can insert it into the model. Afterwards, we check with a query that the triple has been added.

m.insert(hpizzas)
m.query("""
PREFIX pi:<https://github.com/DataTreehouse/maplib/pizza#>

SELECT ?p WHERE {
?p a pi:UnorthodoxPizza
}
""")

Indeed, we have added the triple:

p
str
"https://github.com/DataTreehouse/maplib/pizza#Hawaiian"
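The difference between running a CONSTRUCT query with `m.query` and applying it with `m.insert` can be sketched with a plain Python set of triples. This is an illustration of the semantics under our own simplified representation, not maplib's implementation; `construct_unorthodox` is a hypothetical helper that hard-codes the pattern of the query above.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PI = "https://github.com/DataTreehouse/maplib/pizza#"


def construct_unorthodox(graph):
    """Return new triples for pizzas with pineapple; the graph itself is not modified."""
    pizzas = {s for (s, p, o) in graph if p == RDF_TYPE and o == PI + "Pizza"}
    with_pineapple = {
        s for (s, p, o) in graph
        if p == PI + "hasIngredient" and o == PI + "Pineapple"
    }
    return {(s, RDF_TYPE, PI + "UnorthodoxPizza") for s in pizzas & with_pineapple}


graph = {
    (PI + "Hawaiian", RDF_TYPE, PI + "Pizza"),
    (PI + "Hawaiian", PI + "hasIngredient", PI + "Pineapple"),
    (PI + "Grandiosa", RDF_TYPE, PI + "Pizza"),
    (PI + "Grandiosa", PI + "hasIngredient", PI + "Pepper"),
}

new_triples = construct_unorthodox(graph)  # analogous to m.query(hpizzas)
size_before = len(graph)                   # unchanged by the construct step
graph |= new_triples                       # analogous to m.insert(hpizzas)
print(size_before, len(graph))  # 4 5
```

Inspecting the constructed triples before inserting them, as in the walkthrough above, keeps the model unchanged until the result has been checked.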

API

The API is simple, and contains only one class and a few methods for:

  • expanding templates
  • querying with SPARQL
  • validating with SHACL
  • reading JSON to triples using Façade-X
  • importing triples (Turtle, RDF/XML, NTriples, JSON-LD)
  • writing triples (Turtle, RDF/XML, NTriples)
  • creating a new Model from a named graph

The API is documented HERE

Roadmap of features and optimizations

Spring 2026

  • SHACL Rules
  • Disk based storage and internal serialization format
  • Jelly
  • Graph virtualization using chrontext

The roadmap is subject to change, particularly in response to user and customer requests.

References

The associated paper [1] includes benchmarks showing superior performance and scalability; it can be found here. OTTR is described in [2].

[1] M. Bakken, "maplib: Interactive, literal RDF model mapping for industry," in IEEE Access, doi: 10.1109/ACCESS.2023.3269093.

[2] M. G. Skjæveland, D. P. Lupp, L. H. Karlsen, and J. W. Klüwer, "OTTR: Formal templates for pattern-based ontology engineering," in WOP (Book), 2021, pp. 349–377.

Licensing

All code produced since August 1st, 2023 is copyrighted to Data Treehouse AS with an Apache 2.0 license unless otherwise noted.

All code produced before August 1st, 2023 is copyrighted to Prediktor AS with an Apache 2.0 license unless otherwise noted, and has been financed by The Research Council of Norway (grant no. 316656) and Prediktor AS as part of a PhD degree. The code in this state is archived in the repository at https://github.com/magbak/maplib.
