Hybrid SPARQL query engine for timeseries data

Project description

chrontext: High-performance hybrid query engine for knowledge graphs and analytical data (e.g. time-series)

Chrontext allows you to use your knowledge graph to access large amounts of time-series or other analytical data. It uses a commodity SPARQL triplestore together with your existing data storage infrastructure. It currently supports time-series stored in PostgreSQL-compatible databases such as DuckDB, in Google Cloud BigQuery (SQL) and in OPC UA HA, but can easily be extended to other APIs and databases.

(Figure: Chrontext architecture)

Chrontext forms a semantic layer that enables self-service data access, abstracting away the technical infrastructure. Users can create query-based inputs for data products that stay up to date as the knowledge graph is maintained, and that can be deployed across heterogeneous on-premise and cloud infrastructures with the same API.

Chrontext is a high-performance Python library built in Rust using Polars, and relies heavily on packages from the Oxigraph project. Chrontext works with Apache Arrow, prefers time-series transport using Apache Arrow Flight and delivers results as Polars DataFrames.

Please reach out to Data Treehouse if you would like help trying Chrontext, or require support for a different database backend.

Installing

Chrontext is available on PyPI, just use:

pip install chrontext

The API is documented HERE.

Example query in Python

The code assumes that we have a SPARQL endpoint and BigQuery set up with time-series data. The query uses a bit of syntactic sugar, but is converted to pure SPARQL before execution.

... 

df = engine.query("""
    PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
    PREFIX ct:<https://github.com/DataTreehouse/chrontext#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> 
    PREFIX rds: <https://github.com/DataTreehouse/solar_demo/rds_power#> 
    SELECT ?inv_path WHERE {
        # We are navigating the Solar PV site "Metropolis", identifying every inverter. 
        ?site a rds:Site .
        ?site rdfs:label "Metropolis" .
        ?site rds:functionalAspect+ ?inv .    
        ?inv a rds:TBB .                    # RDS code TBB: Inverter
        ?inv rds:path ?inv_path .
        
        # Find the timeseries associated with the inverter
        ?inv ct:hasTimeseries ?ts_pow .
        ?ts_pow rdfs:label "InvPDC_kW" .    
        DT {
            timestamp = ?t,
            timeseries = ?ts_pow, 
            interval = "10m",
            from = "2018-12-25T00:00:00Z",
            aggregation = "avg" }
        }
    ORDER BY ?inv_path ?t
""")

This produces the following DataFrame:

inv_path                   t                    ts_pow_value_avg
str                        datetime[ns]         f64
=<Metropolis>.A1.RG1.TBB1  2018-12-25 00:00:00  0.0
=<Metropolis>.A5.RG9.TBB1  2019-01-01 04:50:00  0.0

Not much power being produced at night in the middle of winter :-)

API

The API is documented HERE.

Tutorial using DuckDB

In the following tutorial, we assume that you have a couple of CSV files on disk that you want to query. We assume that you have DuckDB and chrontext installed; if not, run pip install chrontext duckdb. Installing chrontext will also install sqlalchemy, which we rely on to define the virtualized DuckDB tables.

CSV files

Our CSV files look like this.

ts1.csv :

timestamp,value
2022-06-01T08:46:52,1
2022-06-01T08:46:53,10
...
2022-06-01T08:46:59,105

ts2.csv:

timestamp,value
2022-06-01T08:46:52,2
2022-06-01T08:46:53,20
...
2022-06-01T08:46:59,206
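If you want to follow along, sample files in this layout can be generated with a few lines of standard-library Python. Note that only the first and last rows of each file appear in the tutorial text above; the intermediate timestamps and values here are placeholders of our own choosing.

```python
import csv

# Write two small sample files matching the layout above.
# Intermediate rows are invented placeholders; only the first and
# last row of each file are taken from the tutorial text.
rows_ts1 = [(f"2022-06-01T08:46:5{i}", v) for i, v in
            zip(range(2, 10), [1, 10, 20, 30, 40, 50, 60, 105])]
rows_ts2 = [(f"2022-06-01T08:46:5{i}", v) for i, v in
            zip(range(2, 10), [2, 20, 40, 60, 80, 100, 120, 206])]

for name, rows in [("ts1.csv", rows_ts1), ("ts2.csv", rows_ts2)]:
    with open(name, "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["timestamp", "value"])
        w.writerows(rows)
```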

DuckDB setup:

We need to create a class with a method query that takes a SQL string as its argument and returns a Polars DataFrame. In this class, we just hard-code the DuckDB setup in the constructor.

import duckdb
import polars as pl

class MyDuckDB():
    def __init__(self):
        con = duckdb.connect()
        con.execute("SET TIME ZONE 'UTC';")
        con.execute("""CREATE TABLE ts1 ("timestamp" TIMESTAMPTZ, "value" INTEGER)""")
        ts_1 = pl.read_csv("ts1.csv", try_parse_dates=True).with_columns(pl.col("timestamp").dt.replace_time_zone("UTC"))
        con.append("ts1", df=ts_1.to_pandas())
        con.execute("""CREATE TABLE ts2 ("timestamp" TIMESTAMPTZ, "value" INTEGER)""")
        ts_2 = pl.read_csv("ts2.csv", try_parse_dates=True).with_columns(pl.col("timestamp").dt.replace_time_zone("UTC"))
        con.append("ts2", df=ts_2.to_pandas())
        self.con = con


    def query(self, sql:str) -> pl.DataFrame:
        # We execute the query and return it as a Polars DataFrame.
        # Chrontext expects this method to exist in the provided class.
        df = self.con.execute(sql).pl()
        return df

my_db = MyDuckDB()

Defining a virtualized SQL

We first define a sqlalchemy select query involving the two tables. Chrontext will modify this query when executing hybrid queries.

from sqlalchemy import MetaData, Table, Column, bindparam
metadata = MetaData()
ts1_table = Table(
    "ts1",
    metadata,
    Column("timestamp"),
    Column("value")
)
ts2_table = Table(
    "ts2",
    metadata,
    Column("timestamp"),
    Column("value")
)
ts1 = ts1_table.select().add_columns(
    bindparam("id1", "ts1").label("id"),
)
ts2 = ts2_table.select().add_columns(
    bindparam("id2", "ts2").label("id"),
)
sql = ts1.union(ts2)
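To inspect the statement Chrontext starts from, you can render it as a plain SQL string. This is only a debugging aid of our own, not something chrontext requires; we compile with the postgres dialect to match the sql_dialect setting used in this tutorial.

```python
from sqlalchemy import MetaData, Table, Column, bindparam
from sqlalchemy.dialects import postgresql

# Same definitions as in the tutorial, repeated so this snippet is
# self-contained.
metadata = MetaData()
ts1_table = Table("ts1", metadata, Column("timestamp"), Column("value"))
ts2_table = Table("ts2", metadata, Column("timestamp"), Column("value"))
ts1 = ts1_table.select().add_columns(bindparam("id1", "ts1").label("id"))
ts2 = ts2_table.select().add_columns(bindparam("id2", "ts2").label("id"))
sql = ts1.union(ts2)

# Render the statement with the bound identifiers inlined as literals.
compiled = str(sql.compile(dialect=postgresql.dialect(),
                           compile_kwargs={"literal_binds": True}))
print(compiled)
```

The output is a UNION of two SELECTs, each tagged with a constant id column ('ts1' or 'ts2') that chrontext uses to tell the series apart.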

Now, we are ready to define the virtualized backend. We will annotate nodes of the graph with a resource data property. These data properties will be linked to virtualized RDF triples in the DuckDB backend. The resource_sql_map decides which SQL is used for each resource property.

from chrontext import VirtualizedPythonDatabase

vdb = VirtualizedPythonDatabase(
    database=my_db,
    resource_sql_map={"my_resource": sql},
    sql_dialect="postgres"
)

The triple below links ex:myWidget1 to the triples defined by the SQL above.

ex:myWidget1 ct:hasResource "my_resource" . 

However, it will only be linked to those triples corresponding to rows where the identifier column matches the identifier associated with ex:myWidget1. Below, we define that ex:myWidget1 is only linked to those rows where the id column is ts1.

ex:myWidget1 ct:hasIdentifier "ts1" . 

In any such resource SQL, the id column is mandatory.

Relating the Database to RDF Triples

Next, we want to relate the rows in this SQL, each containing id, timestamp and value, to RDF triples using a template.

from chrontext import Prefix, Variable, Template, Parameter, RDFType, Triple, XSD
ct = Prefix("ct", "https://github.com/DataTreehouse/chrontext#")
xsd = XSD()
id = Variable("id")
timestamp = Variable("timestamp")
value = Variable("value")
dp = Variable("dp")
resources = {
    "my_resource": Template(
        iri=ct.suf("my_resource"),
        parameters=[
            Parameter(id, rdf_type=RDFType.Literal(xsd.string)),
            Parameter(timestamp, rdf_type=RDFType.Literal(xsd.dateTime)),
            Parameter(value, rdf_type=RDFType.Literal(xsd.double)),
        ],
        instances=[
            Triple(id, ct.suf("hasDataPoint"), dp),
            Triple(dp, ct.suf("hasValue"), value),
            Triple(dp, ct.suf("hasTimestamp"), timestamp)
        ]
)}

This means that our instance ex:myWidget1 will be associated with a value and a timestamp (and a blank data point) for each row in ts1.csv. For instance, the first row gives us:

ex:myWidget1 ct:hasDataPoint _:b1 .
_:b1 ct:hasTimestamp "2022-06-01T08:46:52Z"^^xsd:dateTime .
_:b1 ct:hasValue 1 .
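To see how table rows balloon into triples, here is a purely illustrative expansion in plain Python. This is not chrontext's internal machinery, and the blank-node labels are our own invention; it only mirrors the three triples per row that the template above describes.

```python
# Expand (timestamp, value) rows for one widget into the three triples
# per row described by the template. Blank-node labels are invented here.
CT = "https://github.com/DataTreehouse/chrontext#"

def expand(widget_iri, rows):
    triples = []
    for i, (ts, value) in enumerate(rows, start=1):
        bnode = f"_:b{i}"
        triples.append((widget_iri, CT + "hasDataPoint", bnode))
        triples.append((bnode, CT + "hasTimestamp", ts))
        triples.append((bnode, CT + "hasValue", value))
    return triples

triples = expand("ex:myWidget1", [("2022-06-01T08:46:52Z", 1),
                                  ("2022-06-01T08:46:53Z", 10)])
for t in triples:
    print(t)
```

Two rows already produce six triples; a real time-series with millions of rows would produce millions, which is exactly why chrontext virtualizes them instead of materializing them.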

Chrontext is made for the cases where this would be infeasibly many triples, so we do not materialize them; we query them.

Creating the engine and querying:

The context for our analytical data (e.g. a model of an industrial asset) has to be stored in a SPARQL endpoint. In this case, we use an embedded Oxigraph engine that comes with chrontext. Now we assemble the pieces and create the engine.

from chrontext import Engine, SparqlEmbeddedOxigraph
oxigraph_store = SparqlEmbeddedOxigraph(rdf_file="my_graph.ttl", path="oxigraph_db_tutorial")
engine = Engine(
    resources,
    virtualized_python_database=vdb,
    sparql_embedded_oxigraph=oxigraph_store)
engine.init()

Now we can use our context to query the dataset. The aggregations below are pushed down into DuckDB. The example below is a bit simple, but complex conditions can identify the ?w and ?s.

q = """
    PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
    PREFIX chrontext:<https://github.com/DataTreehouse/chrontext#>
    PREFIX types:<http://example.org/types#>
    SELECT ?w (SUM(?v) as ?sum_v) WHERE {
        ?w types:hasSensor ?s .
        ?s a types:ThingCounter .
        ?s chrontext:hasTimeseries ?ts .
        ?ts chrontext:hasDataPoint ?dp .
        ?dp chrontext:hasTimestamp ?t .
        ?dp chrontext:hasValue ?v .
        FILTER(?t > "2022-06-01T08:46:53Z"^^xsd:dateTime) .
    } GROUP BY ?w
    """
df = engine.query(q)
print(df)

This produces the following result:

w                                  sum_v
str                                decimal[38,0]
http://example.org/case#myWidget1  1215
http://example.org/case#myWidget2  1216

Roadmap in brief

Let us know if you have suggestions!

Stabilization

Chrontext will be put into use in the energy industry in the coming period, and will be stabilized as part of this process. We are very interested in your bug reports!

Support for Azure Data Explorer / KustoQL

We are likely adding support for ADX/KustoQL. Let us know if this is something that would be useful for you.

Support for Databricks SQL

We are likely adding support for Databricks SQL as the virtualization backend.

Generalization to analytical data (not just time series!)

While chrontext is currently focused on time series data, we are incrementally adding support for contextualization of arbitrary analytical data.

Support for multiple databases

Currently, we only support one database backend at a given time. We plan to support hybrid queries across multiple virtualized databases.

References

Chrontext is joint work by Magnus Bakken and Professor Ahmet Soylu at OsloMet. To read more about Chrontext, read the article Chrontext: Portable Sparql Queries Over Contextualised Time Series Data in Industrial Settings.

License

All code produced since August 1st, 2023 is copyrighted to Data Treehouse AS with an Apache 2.0 license unless otherwise noted.

All code produced before August 1st, 2023 is copyrighted to Prediktor AS with an Apache 2.0 license unless otherwise noted, and was financed by The Research Council of Norway (grant no. 316656) and Prediktor AS as part of a PhD degree. The code at that state is archived in the repository at https://github.com/DataTreehouse/chrontext.
