Hybrid SPARQL query engine for timeseries data

chrontext: High-performance hybrid query engine for knowledge graphs and analytical data (e.g. time-series)

Chrontext allows you to use your knowledge graph to access large amounts of time-series or other analytical data. It uses a commodity SPARQL triplestore and your existing data storage infrastructure. It currently supports time series stored in a PostgreSQL-compatible database such as DuckDB, in Google Cloud BigQuery (SQL), and in OPC UA HA servers, but can easily be extended to other APIs and databases.

[Figure: Chrontext architecture]

Chrontext forms a semantic layer that enables self-service data access, abstracting away the technical infrastructure. Users can create query-based inputs for data products that stay up to date as the knowledge graph is maintained, and that can be deployed across heterogeneous on-premise and cloud infrastructures with the same API.

Chrontext is a high-performance Python library built in Rust using Polars, and relies heavily on packages from the Oxigraph project. Chrontext works with Apache Arrow, prefers time-series transport using Apache Arrow Flight and delivers results as Polars DataFrames.

Please reach out to Data Treehouse if you would like help trying Chrontext, or require support for a different database backend.

Installing

Chrontext is available on PyPI; just use:

pip install chrontext

The API is documented HERE.

Example query in Python

The code assumes that we have a SPARQL endpoint and BigQuery set up with time series. The query uses a bit of syntactic sugar, but it is converted to pure SPARQL before execution.

... 

df = engine.query("""
    PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
    PREFIX ct:<https://github.com/DataTreehouse/chrontext#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> 
    PREFIX rds: <https://github.com/DataTreehouse/solar_demo/rds_power#> 
    SELECT ?inv_path WHERE {
        # We are navigating the Solar PV site "Metropolis", identifying every inverter.
        ?site a rds:Site .
        ?site rdfs:label "Metropolis" .
        ?site rds:functionalAspect+ ?inv .    
        ?inv a rds:TBB .                    # RDS code TBB: Inverter
        ?inv rds:path ?inv_path .
        
        # Find the timeseries associated with the inverter
        ?inv ct:hasTimeseries ?ts_pow .
        ?ts_pow rdfs:label "InvPDC_kW" .    
        DT {
            timestamp = ?t,
            timeseries = ?ts_pow, 
            interval = "10m",
            from = "2018-12-25T00:00:00Z",
            aggregation = "avg" }
        }
    ORDER BY ?inv_path ?t
""")

This produces the following DataFrame:

inv_path                   t                    ts_pow_value_avg
str                        datetime[ns]         f64
=<Metropolis>.A1.RG1.TBB1  2018-12-25 00:00:00  0.0
=<Metropolis>.A5.RG9.TBB1  2019-01-01 04:50:00  0.0

Not much power being produced at night in the middle of winter :-)


Tutorial using DuckDB

In the following tutorial, we assume that you have a couple of CSV files on disk that you want to query. We assume that you have DuckDB and chrontext installed; if not, run pip install chrontext duckdb. Installing chrontext will also install sqlalchemy, which we rely on to define the virtualized DuckDB tables.

CSV files

Our CSV files look like this.

ts1.csv :

timestamp,value
2022-06-01T08:46:52,1
2022-06-01T08:46:53,10
...
2022-06-01T08:46:59,105

ts2.csv:

timestamp,value
2022-06-01T08:46:52,2
2022-06-01T08:46:53,20
...
2022-06-01T08:46:59,206
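To follow along without existing data, you can generate files in this shape with a short stdlib script. Note that only the first, second, and last rows below match the listings above; the elided intermediate values are illustrative placeholders of our own.

```python
import csv

def write_ts(path: str, values: list[int]) -> None:
    # Write a timestamp,value CSV with one row per second,
    # starting at 2022-06-01T08:46:52 as in the listings above.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, lineterminator="\n")
        writer.writerow(["timestamp", "value"])
        for i, v in enumerate(values):
            writer.writerow([f"2022-06-01T08:46:{52 + i}", v])

# First, second and last values match the listings; the rest are placeholders.
write_ts("ts1.csv", [1, 10, 20, 30, 40, 50, 60, 105])
write_ts("ts2.csv", [2, 20, 40, 60, 80, 100, 120, 206])
```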

DuckDB setup:

We need to create a class with a method query that takes a SQL string as its argument and returns a Polars DataFrame. In this class, we simply hard-code the DuckDB setup in the constructor.

import duckdb
import polars as pl

class MyDuckDB():
    def __init__(self):
        con = duckdb.connect()
        con.execute("SET TIME ZONE 'UTC';")
        con.execute("""CREATE TABLE ts1 ("timestamp" TIMESTAMPTZ, "value" INTEGER)""")
        ts_1 = pl.read_csv("ts1.csv", try_parse_dates=True).with_columns(pl.col("timestamp").dt.replace_time_zone("UTC"))
        con.append("ts1", df=ts_1.to_pandas())
        con.execute("""CREATE TABLE ts2 ("timestamp" TIMESTAMPTZ, "value" INTEGER)""")
        ts_2 = pl.read_csv("ts2.csv", try_parse_dates=True).with_columns(pl.col("timestamp").dt.replace_time_zone("UTC"))
        con.append("ts2", df=ts_2.to_pandas())
        self.con = con


    def query(self, sql:str) -> pl.DataFrame:
        # We execute the query and return it as a Polars DataFrame.
        # Chrontext expects this method to exist in the provided class.
        df = self.con.execute(sql).pl()
        return df

my_db = MyDuckDB()

Defining the virtualized SQL

We first define a sqlalchemy select query involving the two tables. Chrontext will modify this query when executing hybrid queries.

from sqlalchemy import MetaData, Table, Column, bindparam
metadata = MetaData()
ts1_table = Table(
    "ts1",
    metadata,
    Column("timestamp"),
    Column("value")
)
ts2_table = Table(
    "ts2",
    metadata,
    Column("timestamp"),
    Column("value")
)
ts1 = ts1_table.select().add_columns(
    bindparam("id1", "ts1").label("id"),
)
ts2 = ts2_table.select().add_columns(
    bindparam("id2", "ts2").label("id"),
)
sql = ts1.union(ts2)
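The select above renders, roughly, to a UNION in which each branch tags its rows with a literal id column. As a minimal sketch of this pattern (the equivalent SQL run against stdlib sqlite3 rather than DuckDB, so it is illustrative only, not chrontext's generated SQL):

```python
import sqlite3

# In-memory stand-in for the DuckDB tables above.
con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE ts1 ("timestamp" TEXT, "value" INTEGER)')
con.execute('CREATE TABLE ts2 ("timestamp" TEXT, "value" INTEGER)')
con.execute("INSERT INTO ts1 VALUES ('2022-06-01T08:46:52', 1)")
con.execute("INSERT INTO ts2 VALUES ('2022-06-01T08:46:52', 2)")

# The shape of the union: each branch labels its rows with a literal id.
union_sql = """
SELECT "timestamp", "value", 'ts1' AS id FROM ts1
UNION
SELECT "timestamp", "value", 'ts2' AS id FROM ts2
"""
rows = con.execute(union_sql).fetchall()

# Conceptually, chrontext restricts such a query per instance via the id column.
filtered = con.execute(
    f"SELECT * FROM ({union_sql}) WHERE id = 'ts1'"
).fetchall()
```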

Now, we are ready to define the virtualized backend. We will annotate nodes of the graph with a resource data property. These data properties will be linked to virtualized RDF triples in the DuckDB backend. The resource_sql_map decides which SQL is used for each resource property.

from chrontext import VirtualizedPythonDatabase

vdb = VirtualizedPythonDatabase(
    database=my_db,
    resource_sql_map={"my_resource": sql},
    sql_dialect="postgres"
)

The triple below links ex:myWidget1 to the triples defined by the SQL above.

ex:myWidget1 ct:hasResource "my_resource" . 

However, it will only be linked to those triples corresponding to rows where the identifier column equals the identifier associated with ex:myWidget1. Below, we define that ex:myWidget1 is only linked to those rows where the id column is ts1.

ex:myWidget1 ct:hasIdentifier "ts1" . 

In any such resource SQL, the id column is mandatory.

Relating the Database to RDF Triples

Next, we want to relate the rows of this SQL query, each containing id, timestamp, and value, to RDF triples using a template.

from chrontext import Prefix, Variable, Template, Parameter, RDFType, Triple, XSD
ct = Prefix("ct", "https://github.com/DataTreehouse/chrontext#")
xsd = XSD()
id = Variable("id")
timestamp = Variable("timestamp")
value = Variable("value")
dp = Variable("dp")
resources = {
    "my_resource": Template(
        iri=ct.suf("my_resource"),
        parameters=[
            Parameter(id, rdf_type=RDFType.Literal(xsd.string)),
            Parameter(timestamp, rdf_type=RDFType.Literal(xsd.dateTime)),
            Parameter(value, rdf_type=RDFType.Literal(xsd.double)),
        ],
        instances=[
            Triple(id, ct.suf("hasDataPoint"), dp),
            Triple(dp, ct.suf("hasValue"), value),
            Triple(dp, ct.suf("hasTimestamp"), timestamp)
        ]
)}

This means that our instance ex:myWidget1 will be associated with a value and a timestamp (and a blank data point) for each row in ts1.csv. For instance, the first row gives us:

ex:myWidget1 ct:hasDataPoint _:b1 .
_:b1 ct:hasTimestamp "2022-06-01T08:46:52Z"^^xsd:dateTime .
_:b1 ct:hasValue 1 .

Chrontext is designed for cases where materializing all of these triples would be infeasible, so instead of materializing them, we query them virtually.
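Conceptually, the template expands each (id, timestamp, value) row into the three triples above. A plain-Python sketch of that expansion (the blank-node naming here is our own illustration, not chrontext's internal representation):

```python
CT = "https://github.com/DataTreehouse/chrontext#"

def expand_row(widget_iri: str, row_index: int, timestamp: str, value: float) -> list[tuple]:
    """Expand one row into the three virtual triples produced by the
    template above. Blank-node ids are illustrative."""
    dp = f"_:dp{row_index}"
    return [
        (widget_iri, CT + "hasDataPoint", dp),
        (dp, CT + "hasValue", value),
        (dp, CT + "hasTimestamp", timestamp),
    ]

triples = expand_row(
    "http://example.org/case#myWidget1", 0, "2022-06-01T08:46:52Z", 1
)
```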

Creating the engine and querying:

The context for our analytical data (e.g. a model of an industrial asset) has to be stored in a SPARQL endpoint. In this case, we use an embedded Oxigraph engine that comes with chrontext. Now we assemble the pieces and create the engine.

from chrontext import Engine, SparqlEmbeddedOxigraph
oxigraph_store = SparqlEmbeddedOxigraph(rdf_file="my_graph.ttl", path="oxigraph_db_tutorial")
engine = Engine(
    resources,
    virtualized_python_database=vdb,
    sparql_embedded_oxigraph=oxigraph_store)
engine.init()

Now we can use our context to query the dataset. The aggregations below are pushed down into DuckDB. The example is deliberately simple, but arbitrarily complex graph patterns can be used to identify ?w and ?s.

q = """
    PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
    PREFIX chrontext:<https://github.com/DataTreehouse/chrontext#>
    PREFIX types:<http://example.org/types#>
    SELECT ?w (SUM(?v) as ?sum_v) WHERE {
        ?w types:hasSensor ?s .
        ?s a types:ThingCounter .
        ?s chrontext:hasTimeseries ?ts .
        ?ts chrontext:hasDataPoint ?dp .
        ?dp chrontext:hasTimestamp ?t .
        ?dp chrontext:hasValue ?v .
        FILTER(?t > "2022-06-01T08:46:53Z"^^xsd:dateTime) .
    } GROUP BY ?w
    """
df = engine.query(q)
print(df)

This produces the following result:

w                                  sum_v
str                                decimal[38,0]
http://example.org/case#myWidget1  1215
http://example.org/case#myWidget2  1216
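The pushed-down part of such a hybrid query is, conceptually, a filtered aggregation over the virtualized union. A rough sketch of that shape, again demonstrated on stdlib sqlite3 with tiny made-up rows (this is not chrontext's actual generated SQL):

```python
import sqlite3

# Small made-up rows, just to demonstrate the filter + GROUP BY shape.
con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE ts1 ("timestamp" TEXT, "value" INTEGER)')
con.execute('CREATE TABLE ts2 ("timestamp" TEXT, "value" INTEGER)')
con.executemany("INSERT INTO ts1 VALUES (?, ?)",
                [("2022-06-01T08:46:52", 1), ("2022-06-01T08:46:54", 30)])
con.executemany("INSERT INTO ts2 VALUES (?, ?)",
                [("2022-06-01T08:46:52", 2), ("2022-06-01T08:46:54", 40)])

# Filter on timestamp, then SUM grouped by the series id: the pattern
# corresponding to the FILTER and GROUP BY in the SPARQL query above.
pushed = con.execute("""
    SELECT id, SUM("value") AS sum_v FROM (
        SELECT "timestamp", "value", 'ts1' AS id FROM ts1
        UNION ALL
        SELECT "timestamp", "value", 'ts2' AS id FROM ts2
    )
    WHERE "timestamp" > '2022-06-01T08:46:53'
    GROUP BY id
    ORDER BY id
""").fetchall()
```

Because the filtering and summation happen inside the database, only one small aggregate row per series crosses the wire, which is what makes the virtualized approach scale.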

Roadmap in brief

Let us know if you have suggestions!

Stabilization

Chrontext will be put into use in the energy industry in the coming period, and will be stabilized as part of this process. We are very interested in your bug reports!

Support for Azure Data Explorer / KustoQL

We are likely adding support for ADX/KustoQL. Let us know if this is something that would be useful for you.

Support for Databricks SQL

We are likely adding support for Databricks SQL as the virtualization backend.

Generalization to analytical data (not just time series!)

While chrontext is currently focused on time series data, we are incrementally adding support for contextualization of arbitrary analytical data.

Support for multiple databases

Currently, we only support one database backend at a given time. We plan to support hybrid queries across multiple virtualized databases.

References

Chrontext is joint work by Magnus Bakken and Professor Ahmet Soylu at OsloMet. To read more about Chrontext, read the article Chrontext: Portable SPARQL Queries Over Contextualised Time Series Data in Industrial Settings.

License

All code produced since August 1st, 2023 is copyright Data Treehouse AS, licensed under Apache 2.0 unless otherwise noted.

All code produced before August 1st, 2023 is copyright Prediktor AS, licensed under Apache 2.0 unless otherwise noted, and was financed by The Research Council of Norway (grant no. 316656) and Prediktor AS as part of a PhD degree. The code at that state is archived in the repository at https://github.com/DataTreehouse/chrontext.
