
Hybrid SPARQL query engine for timeseries data

Project description

chrontext: High-performance hybrid query engine for knowledge graphs and analytical data (e.g. time-series)

Chrontext allows you to use your knowledge graph to access large amounts of time-series or other analytical data. It uses a commodity SPARQL triplestore together with your existing data storage infrastructure. It currently supports time-series stored in PostgreSQL-compatible databases such as DuckDB, in Google Cloud BigQuery (SQL) and in OPC UA HA, but can easily be extended to other APIs and databases.

(Figure: Chrontext architecture.)

Chrontext forms a semantic layer that enables self-service data access, abstracting away the technical infrastructure. Users can create query-based inputs for data products; these data products stay up to date as the knowledge graph is maintained, and they can be deployed across heterogeneous on-premise and cloud infrastructures with the same API.

Chrontext is a high-performance Python library built in Rust using Polars, and relies heavily on packages from the Oxigraph project. Chrontext works with Apache Arrow, prefers time-series transport using Apache Arrow Flight and delivers results as Polars DataFrames.

Please reach out to Data Treehouse if you would like help trying Chrontext, or require support for a different database backend.

Installing

Chrontext is on PyPI, just use:

pip install chrontext

The API is documented HERE.

Example query in Python

The code below assumes that we have a SPARQL endpoint and BigQuery set up with time-series data. The query uses a bit of syntactic sugar, but it is converted to pure SPARQL before execution.

... 

df = engine.query("""
    PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
    PREFIX ct:<https://github.com/DataTreehouse/chrontext#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> 
    PREFIX rds: <https://github.com/DataTreehouse/solar_demo/rds_power#> 
    SELECT ?inv_path WHERE {
        # We are navigating the Solar PV site "Metropolis", identifying every inverter.
        ?site a rds:Site .
        ?site rdfs:label "Metropolis" .
        ?site rds:functionalAspect+ ?inv .    
        ?inv a rds:TBB .                    # RDS code TBB: Inverter
        ?inv rds:path ?inv_path .
        
        # Find the timeseries associated with the inverter
        ?inv ct:hasTimeseries ?ts_pow .
        ?ts_pow rdfs:label "InvPDC_kW" .    
        DT {
            timestamp = ?t,
            timeseries = ?ts_pow, 
            interval = "10m",
            from = "2018-12-25T00:00:00Z",
            aggregation = "avg" }
        }
    ORDER BY ?inv_path ?t
""")

This produces the following DataFrame:

inv_path                   t                    ts_pow_value_avg
str                        datetime[ns]         f64
=<Metropolis>.A1.RG1.TBB1  2018-12-25 00:00:00  0.0
=<Metropolis>.A5.RG9.TBB1  2019-01-01 04:50:00  0.0

Not much power being produced at night in the middle of winter :-)
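
Since the result is an ordinary Polars DataFrame, you can post-process it directly. A minimal sketch, assuming a recent Polars version and the column names shown above:

import polars as pl

# Average produced power per inverter over the queried period.
summary = df.group_by("inv_path").agg(
    pl.col("ts_pow_value_avg").mean().alias("mean_power_kW")
)
print(summary.sort("inv_path"))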

API

The API is documented HERE.

Tutorial using DuckDB

In the following tutorial, we assume that you have a couple of CSV files on disk that you want to query. We assume that you have DuckDB and chrontext installed; if not, run pip install chrontext duckdb. Installing chrontext will also install sqlalchemy, which we rely on to define the virtualized DuckDB tables.

CSV files

Our CSV files look like this.

ts1.csv :

timestamp,value
2022-06-01T08:46:52,1
2022-06-01T08:46:53,10
..
2022-06-01T08:46:59,105

ts2.csv:

timestamp,value
2022-06-01T08:46:52,2
2022-06-01T08:46:53,20
...
2022-06-01T08:46:59,206
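
If you want to follow along without the original files, the sketch below writes two CSV files of the same shape. Note that only the first and last rows are shown above; the rows in between are elided, so the made-up values below will not reproduce the exact sums shown at the end of the tutorial.

import polars as pl
from datetime import datetime

# Hypothetical stand-in data: one row per second, keeping the first and last
# values shown above and inventing the rest.
timestamps = [datetime(2022, 6, 1, 8, 46, 52 + i) for i in range(8)]
pl.DataFrame({"timestamp": timestamps, "value": [1, 10, 20, 30, 40, 50, 60, 105]}).write_csv("ts1.csv")
pl.DataFrame({"timestamp": timestamps, "value": [2, 20, 40, 60, 80, 100, 120, 206]}).write_csv("ts2.csv")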

DuckDB setup:

We need to create a class with a method query that takes a SQL string as its argument and returns a Polars DataFrame. In this class, we just hard-code the DuckDB setup in the constructor.

import duckdb
import polars as pl

class MyDuckDB():
    def __init__(self):
        con = duckdb.connect()
        con.execute("SET TIME ZONE 'UTC';")
        con.execute("""CREATE TABLE ts1 ("timestamp" TIMESTAMPTZ, "value" INTEGER)""")
        ts_1 = pl.read_csv("ts1.csv", try_parse_dates=True).with_columns(pl.col("timestamp").dt.replace_time_zone("UTC"))
        con.append("ts1", df=ts_1.to_pandas())
        con.execute("""CREATE TABLE ts2 ("timestamp" TIMESTAMPTZ, "value" INTEGER)""")
        ts_2 = pl.read_csv("ts2.csv", try_parse_dates=True).with_columns(pl.col("timestamp").dt.replace_time_zone("UTC"))
        con.append("ts2", df=ts_2.to_pandas())
        self.con = con


    def query(self, sql:str) -> pl.DataFrame:
        # We execute the query and return it as a Polars DataFrame.
        # Chrontext expects this method to exist in the provided class.
        df = self.con.execute(sql).pl()
        return df

my_db = MyDuckDB()
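
A quick sanity check that the class behaves as chrontext expects (plain SQL in, Polars DataFrame out):

print(my_db.query("SELECT * FROM ts1 ORDER BY timestamp LIMIT 3"))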

Defining the virtualized SQL query

We first define a sqlalchemy select query involving the two tables. Chrontext will modify this query when executing hybrid queries.

from sqlalchemy import MetaData, Table, Column, bindparam
metadata = MetaData()
ts1_table = Table(
    "ts1",
    metadata,
    Column("timestamp"),
    Column("value")
)
ts2_table = Table(
    "ts2",
    metadata,
    Column("timestamp"),
    Column("value")
)
ts1 = ts1_table.select().add_columns(
    bindparam("id1", "ts1").label("id"),
)
ts2 = ts2_table.select().add_columns(
    bindparam("id2", "ts2").label("id"),
)
sql = ts1.union(ts2)
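
If you are curious what this statement looks like as SQL before chrontext rewrites it, you can compile it with the bound parameters inlined. A small sketch, assuming the PostgreSQL dialect configured below:

from sqlalchemy.dialects import postgresql

# Render the union with the bound "id" literals inlined.
print(sql.compile(dialect=postgresql.dialect(), compile_kwargs={"literal_binds": True}))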

Now, we are ready to define the virtualized backend. We will annotate nodes of the graph with a resource data property. These data properties will be linked to virtualized RDF triples in the DuckDB backend. The resource_sql_map decides which SQL is used for each resource property.

from chrontext import VirtualizedPythonDatabase

vdb = VirtualizedPythonDatabase(
    database=my_db,
    resource_sql_map={"my_resource": sql},
    sql_dialect="postgres"
)

The triple below will link ex:myWidget1 to the triples defined by the SQL above.

ex:myWidget1 ct:hasResource "my_resource" . 

However, it will only be linked to those triples corresponding to rows where the identifier column equals the identifier associated with ex:myWidget1. Below, we define that ex:myWidget1 is only linked to those rows where the id column is ts1.

ex:myWidget1 ct:hasIdentifier "ts1" . 

In any such resource SQL, the id column is mandatory.
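
For the rest of the tutorial we also need a small graph on disk; it is loaded from my_graph.ttl further down. That file is not shown on this page, so the sketch below writes a minimal graph that is merely consistent with the query and results at the end of the tutorial: the prose above attaches ct:hasResource and ct:hasIdentifier directly to ex:myWidget1 for simplicity, while the query navigates widget → sensor → timeseries, so the sketch puts the resource and identifier on the timeseries nodes. The types: and case: IRIs are taken from that query; treat all of this as an illustrative assumption, not the exact graph used by the original tutorial.

# Illustrative sketch of my_graph.ttl (assumed structure, see the note above).
ttl = """
@prefix ct: <https://github.com/DataTreehouse/chrontext#> .
@prefix types: <http://example.org/types#> .
@prefix case: <http://example.org/case#> .

case:myWidget1 types:hasSensor case:sensor1 .
case:sensor1 a types:ThingCounter ;
    ct:hasTimeseries case:timeseries1 .
case:timeseries1 ct:hasResource "my_resource" ;
    ct:hasIdentifier "ts1" .

case:myWidget2 types:hasSensor case:sensor2 .
case:sensor2 a types:ThingCounter ;
    ct:hasTimeseries case:timeseries2 .
case:timeseries2 ct:hasResource "my_resource" ;
    ct:hasIdentifier "ts2" .
"""
with open("my_graph.ttl", "w") as f:
    f.write(ttl)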

Relating the Database to RDF Triples

Next, we want to relate the rows of this SQL, each containing id, timestamp and value, to RDF triples using a template.

from chrontext import Prefix, Variable, Template, Parameter, RDFType, Triple, XSD
ct = Prefix("ct", "https://github.com/DataTreehouse/chrontext#")
xsd = XSD()
id = Variable("id")
timestamp = Variable("timestamp")
value = Variable("value")
dp = Variable("dp")
resources = {
    "my_resource": Template(
        iri=ct.suf("my_resource"),
        parameters=[
            Parameter(id, rdf_type=RDFType.Literal(xsd.string)),
            Parameter(timestamp, rdf_type=RDFType.Literal(xsd.dateTime)),
            Parameter(value, rdf_type=RDFType.Literal(xsd.double)),
        ],
        instances=[
            Triple(id, ct.suf("hasDataPoint"), dp),
            Triple(dp, ct.suf("hasValue"), value),
            Triple(dp, ct.suf("hasTimestamp"), timestamp)
        ]
)}

This means that our instance ex:myWidget1 will be associated with a value and a timestamp (and a blank node as data point) for each row in ts1.csv. For instance, the first row gives us:

ex:myWidget1 ct:hasDataPoint _:b1 .
_:b1 ct:hasTimestamp "2022-06-01T08:46:52Z"^^xsd:dateTime .
_:b1 ct:hasValue 1 .

Chrontext is built for the cases where there are infeasibly many such triples, so rather than materializing them, we query them on demand.

Creating the engine and querying:

The context for our analytical data (e.g. a model of an industrial asset) has to be stored in a SPARQL endpoint. In this case, we use an embedded Oxigraph engine that comes with chrontext. Now we assemble the pieces and create the engine.

from chrontext import Engine, SparqlEmbeddedOxigraph
oxigraph_store = SparqlEmbeddedOxigraph(rdf_file="my_graph.ttl", path="oxigraph_db_tutorial")
engine = Engine(
    resources,
    virtualized_python_database=vdb,
    sparql_embedded_oxigraph=oxigraph_store)
engine.init()

Now we can use our context to query the dataset. The aggregations below are pushed down into DuckDB. The example is deliberately simple, but arbitrarily complex graph patterns can be used to identify ?w and ?s.

q = """
    PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
    PREFIX chrontext:<https://github.com/DataTreehouse/chrontext#>
    PREFIX types:<http://example.org/types#>
    SELECT ?w (SUM(?v) as ?sum_v) WHERE {
        ?w types:hasSensor ?s .
        ?s a types:ThingCounter .
        ?s chrontext:hasTimeseries ?ts .
        ?ts chrontext:hasDataPoint ?dp .
        ?dp chrontext:hasTimestamp ?t .
        ?dp chrontext:hasValue ?v .
        FILTER(?t > "2022-06-01T08:46:53Z"^^xsd:dateTime) .
    } GROUP BY ?w
    """
df = engine.query(q)
print(df)

This produces the following result:

w                                  sum_v
str                                decimal[38,0]
http://example.org/case#myWidget1  1215
http://example.org/case#myWidget2  1216
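
You can cross-check the hybrid result against the DuckDB tables directly, using the MyDuckDB class from above; a small sketch (the exact sums depend on the full contents of your CSV files):

check = my_db.query("""
    SELECT id, SUM(value) AS sum_v
    FROM (
        SELECT 'ts1' AS id, * FROM ts1
        UNION ALL
        SELECT 'ts2' AS id, * FROM ts2
    ) AS t
    WHERE "timestamp" > TIMESTAMPTZ '2022-06-01 08:46:53+00'
    GROUP BY id
    ORDER BY id
""")
print(check)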

Roadmap in brief

Let us know if you have suggestions!

Stabilization

Chrontext will be put into use in the energy industry in the coming period, and will be stabilized as part of this process. We are very interested in your bug reports!

Support for Azure Data Explorer / KustoQL

We are likely adding support for ADX/KustoQL. Let us know if this is something that would be useful for you.

Support for Databricks SQL

We are likely adding support for Databricks SQL as the virtualization backend.

Generalization to analytical data (not just time series!)

While chrontext is currently focused on time series data, we are incrementally adding support for contextualization of arbitrary analytical data.

Support for multiple databases

Currently, we only support one database backend at a given time. We plan to support hybrid queries across multiple virtualized databases.

References

Chrontext is joint work by Magnus Bakken and Professor Ahmet Soylu at OsloMet. To read more about Chrontext, see the article Chrontext: Portable SPARQL Queries Over Contextualised Time Series Data in Industrial Settings.

License

All code produced since August 1st, 2023 is copyrighted to Data Treehouse AS with an Apache 2.0 license unless otherwise noted.

All code produced before August 1st, 2023 is copyrighted to Prediktor AS with an Apache 2.0 license unless otherwise noted, and was financed by The Research Council of Norway (grant no. 316656) and Prediktor AS as part of a PhD degree. The code as of that date is archived in the repository at https://github.com/DataTreehouse/chrontext.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

chrontext-0.9.2.tar.gz (180.7 kB, Source)

Built Distributions

File                                                   Size     Uploaded for
chrontext-0.9.2-cp311-none-win_amd64.whl               25.6 MB  CPython 3.11, Windows x86-64
chrontext-0.9.2-cp311-cp311-manylinux_2_28_x86_64.whl  30.7 MB  CPython 3.11, manylinux: glibc 2.28+ x86-64
chrontext-0.9.2-cp311-cp311-macosx_12_0_arm64.whl      22.5 MB  CPython 3.11, macOS 12.0+ ARM64
chrontext-0.9.2-cp311-cp311-macosx_11_0_arm64.whl      22.5 MB  CPython 3.11, macOS 11.0+ ARM64
chrontext-0.9.2-cp310-none-win_amd64.whl               25.6 MB  CPython 3.10, Windows x86-64
chrontext-0.9.2-cp310-cp310-manylinux_2_28_x86_64.whl  30.7 MB  CPython 3.10, manylinux: glibc 2.28+ x86-64
chrontext-0.9.2-cp310-cp310-macosx_12_0_arm64.whl      22.5 MB  CPython 3.10, macOS 12.0+ ARM64
chrontext-0.9.2-cp310-cp310-macosx_11_0_arm64.whl      22.5 MB  CPython 3.10, macOS 11.0+ ARM64
chrontext-0.9.2-cp39-none-win_amd64.whl                25.6 MB  CPython 3.9, Windows x86-64
chrontext-0.9.2-cp39-cp39-manylinux_2_28_x86_64.whl    30.7 MB  CPython 3.9, manylinux: glibc 2.28+ x86-64
chrontext-0.9.2-cp39-cp39-macosx_12_0_arm64.whl        22.5 MB  CPython 3.9, macOS 12.0+ ARM64
chrontext-0.9.2-cp39-cp39-macosx_11_0_arm64.whl        22.5 MB  CPython 3.9, macOS 11.0+ ARM64
chrontext-0.9.2-cp38-none-win_amd64.whl                25.6 MB  CPython 3.8, Windows x86-64
chrontext-0.9.2-cp38-cp38-manylinux_2_28_x86_64.whl    30.7 MB  CPython 3.8, manylinux: glibc 2.28+ x86-64
chrontext-0.9.2-cp38-cp38-macosx_12_0_arm64.whl        22.5 MB  CPython 3.8, macOS 12.0+ ARM64
chrontext-0.9.2-cp38-cp38-macosx_11_0_arm64.whl        22.5 MB  CPython 3.8, macOS 11.0+ ARM64

