Dataframe-based interactive knowledge graph construction

maplib: High-performance RDF knowledge graph construction, SHACL validation and SPARQL-based enrichment in Python

maplib is a knowledge graph construction library for building RDF knowledge graphs using template expansion (OTTR Templates). maplib features SPARQL and SHACL engines that are available as the graph is being constructed, allowing enrichment and validation. It can construct and validate knowledge graphs with millions of nodes in seconds.

maplib allows you to leverage your existing skills with Pandas or Polars to extract and wrangle data from existing databases and spreadsheets, before applying simple templates to them to build a knowledge graph.

Template expansion is typically zero-copy and nearly instantaneous, and the built-in SPARQL and SHACL engines mean you can query, inspect, enrich and validate the knowledge graph immediately.

maplib is written in Rust. It is built on Apache Arrow using Pola.rs, and uses libraries from Oxigraph for handling linked data and for parsing SPARQL queries.

Installing

The package is published on PyPI, and the API is documented here:

pip install maplib
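
A minimal check that the install works, as a sketch that assumes nothing beyond the Model class and query method used in the example below:

# Create an empty model and count its triples with a standard SPARQL query.
from maplib import Model

m = Model()
print(m.query("SELECT (COUNT(?s) AS ?n) WHERE { ?s ?p ?o }"))  # an empty graph, so the count is 0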

Please send us a message, e.g. on LinkedIn (search for Data Treehouse) or on our webpage if you want to try out SHACL.

Model

We can easily map DataFrames to RDF graphs using the Python library. Below is a reproduction of the example in the paper [1]. Assume that we have a DataFrame given by:

import polars as pl
pl.Config.set_fmt_str_lengths(150)  # show long IRIs without truncation

# Columns: pizza IRI (p), country IRI (c) and a list of ingredient IRIs (ings) per row
pi = "https://github.com/DataTreehouse/maplib/pizza#"
df = pl.DataFrame({
    "p": [pi + "Hawaiian", pi + "Grandiosa"],
    "c": [pi + "CAN", pi + "NOR"],
    "ings": [[pi + "Pineapple", pi + "Ham"],
             [pi + "Pepper", pi + "Meat"]]
})
print(df)

That is, our DataFrame is:

p c ings
str str list[str]
"https://.../pizza#Hawaiian" "https://.../pizza#CAN" ["https://.../pizza#Pineapple", "https://.../pizza#Ham"]
"https://.../pizza#Grandiosa" "https://.../pizza#NOR" ["https://.../pizza#Pepper", "https://.../pizza#Meat"]

Then we can define an OTTR template, and create our knowledge graph by expanding this template with our DataFrame as input:

from maplib import Model, Prefix, Template, Argument, Parameter, Variable, RDFType, Triple, a

# Wrap the pizza namespace string in a Prefix so IRIs can be built with pi.suf(...)
pi = Prefix(pi)

# One variable per DataFrame column
p_var = Variable("p")
c_var = Variable("c")
ings_var = Variable("ings")

template = Template(
    iri=pi.suf("PizzaTemplate"),
    parameters=[
        Parameter(variable=p_var, rdf_type=RDFType.IRI()),
        Parameter(variable=c_var, rdf_type=RDFType.IRI()),
        # "ings" holds a list of IRIs per row
        Parameter(variable=ings_var, rdf_type=RDFType.Nested(RDFType.IRI()))
    ],
    instances=[
        Triple(p_var, a, pi.suf("Pizza")),
        Triple(p_var, pi.suf("fromCountry"), c_var),
        # list_expander="cross" creates one hasIngredient triple per list element
        Triple(
            p_var,
            pi.suf("hasIngredient"),
            Argument(term=ings_var, list_expand=True),
            list_expander="cross")
    ]
)

m = Model()
m.map(template, df)  # expand the template once per DataFrame row

# Enrich the graph: classify pineapple pizzas and insert the constructed triples
hpizzas = """
    PREFIX pi:<https://github.com/DataTreehouse/maplib/pizza#>
    CONSTRUCT { ?p a pi:HeterodoxPizza } 
    WHERE {
        ?p a pi:Pizza .
        ?p pi:hasIngredient pi:Pineapple .
    }"""
m.insert(hpizzas)

We can immediately query the mapped knowledge graph:

m.query("""
PREFIX pi:<https://github.com/DataTreehouse/maplib/pizza#>
SELECT ?p ?i WHERE {
?p a pi:Pizza .
?p pi:hasIngredient ?i .
}
""")

The query gives the following result (a DataFrame):

p i
str str
"https://.../pizza#Grandiosa" "https://.../pizza#Meat"
"https://.../pizza#Grandiosa" "https://.../pizza#Pepper"
"https://.../pizza#Hawaiian" "https://.../pizza#Pineapple"
"https://.../pizza#Hawaiian" "https://.../pizza#Ham"

Next, we can run a CONSTRUCT query, which creates new triples but does not insert them into the graph.

hpizzas = """
PREFIX pi:<https://github.com/DataTreehouse/maplib/pizza#>
CONSTRUCT { ?p a pi:UnorthodoxPizza } 
WHERE {
    ?p a pi:Pizza .
    ?p pi:hasIngredient pi:Pineapple .
}"""
res = m.query(hpizzas)
res[0]

The resulting triples are given below:

subject verb object
str str str
"https://.../pizza#Hawaiian" "http://.../22-rdf-syntax-ns#type" "https://.../pizza#UnorthodoxPizza"

If we are happy with the output of this CONSTRUCT query, we can insert it into the model. Afterwards, we check with a query that the triple has been added.

m.insert(hpizzas)
m.query("""
PREFIX pi:<https://github.com/DataTreehouse/maplib/pizza#>

SELECT ?p WHERE {
?p a pi:UnorthodoxPizza
}
""")

Indeed, we have added the triple:

p
str
"https://github.com/DataTreehouse/maplib/pizza#Hawaiian"

API

The API is simple, and contains only one class and a few methods for:

  • expanding templates
  • querying with SPARQL
  • validating with SHACL
  • importing triples (Turtle, RDF/XML, NTriples)
  • writing triples (Turtle, RDF/XML, NTriples)
  • creating a new Model object (sprout) based on queries over the current Model object.

The API is documented HERE.
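
As a short sketch tying these bullet points to the pizza example above (it reuses the template, df and hpizzas objects defined there, and only the map, insert and query methods already shown on this page; the remaining operations are left as pointers to the API documentation):

# Compact recap of the workflow, using only calls shown above.
from maplib import Model

m = Model()
m.map(template, df)   # expand the OTTR template over the DataFrame
m.insert(hpizzas)     # enrich with a SPARQL CONSTRUCT query
res = m.query("SELECT ?p WHERE { ?p a <https://github.com/DataTreehouse/maplib/pizza#Pizza> }")
# SHACL validation, importing/writing triples (Turtle, RDF/XML, NTriples) and sprouting a
# new Model are exposed through further methods; see the API documentation linked above
# for the exact method names and signatures.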

Roadmap of features and optimizations

Spring 2025

  • Datalog reasoning support ✅
  • Reduced memory footprint ✅
  • Further SPARQL optimizations
  • JSON-LD support

Fall 2025

  • SHACL rules support
  • Improved TTL serialization (prettier and faster) +++

The roadmap is subject to change, particularly based on user and customer requests.

References

There is an associated paper [1], with benchmarks showing superior performance and scalability, which can be found here. OTTR is described in [2].

[1] M. Bakken, "maplib: Interactive, literal RDF model mapping for industry," in IEEE Access, doi: 10.1109/ACCESS.2023.3269093.

[2] M. G. Skjæveland, D. P. Lupp, L. H. Karlsen, and J. W. Klüwer, “Ottr: Formal templates for pattern-based ontology engineering.” in WOP (Book), 2021, pp. 349–377.

Licensing

All code produced since August 1st, 2023 is copyrighted to Data Treehouse AS under an Apache 2.0 license unless otherwise noted.

All code produced before August 1st, 2023 is copyrighted to Prediktor AS under an Apache 2.0 license unless otherwise noted, and was financed by The Research Council of Norway (grant no. 316656) and Prediktor AS as part of a PhD degree. The code at that stage is archived in the repository at https://github.com/magbak/maplib.
