Python SDK for Timeseer.AI

Timeseer.AI Client

The Timeseer.AI Client is a Python SDK to access the functionality of Timeseer.AI. Built on Apache Arrow, the SDK integrates natively with the Pandas or Polars ecosystems.

Installing

The Timeseer.AI Client is available on PyPI.

(venv) $ pip install timeseer

Connecting

The Timeseer.AI Client talks to the Timeseer REST API and uses Apache Arrow where possible to make data transfers efficient.

Communications are protected by an API key. An API key can be generated within Timeseer under Configure > API keys. Each API key has a name and a secret value that is shown only once.

The API key is used to create a connection to a Timeseer instance running at a specific host and port:

>>> from timeseer_client import *
>>> api_key=('<api-key-name>', '<api-key>')
>>> client = Client(api_key, host='localhost', port=8081)
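Because the secret value is shown only once, it is usually better kept out of source code. The sketch below reads the key name and secret from environment variables using only the standard library. The variable names `TIMESEER_API_KEY_NAME` and `TIMESEER_API_KEY` and the helper `load_api_key` are hypothetical and not part of the SDK.

```python
import os


def load_api_key(environ=os.environ):
    """Return an (api key name, api key secret) tuple read from the environment.

    The environment variable names are assumptions made for this sketch.
    """
    try:
        return (environ["TIMESEER_API_KEY_NAME"], environ["TIMESEER_API_KEY"])
    except KeyError as missing:
        raise RuntimeError(f"environment variable {missing} is not set") from None


# Stand-in environment for demonstration; in practice rely on the default os.environ.
example_env = {
    "TIMESEER_API_KEY_NAME": "<api-key-name>",
    "TIMESEER_API_KEY": "<api-key>",
}
api_key = load_api_key(example_env)
```

The resulting tuple can be passed as the first argument to Client, exactly like the literal tuple in the example above.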

Functionality Overview

In Timeseer, time series data is available through two concepts:

  • Sources contain a varying number of time series that are constantly updated with new data.
  • Data Sets contain a fixed number of time series in a specific time range.

Sources are typically used for continuous monitoring of data, while Data Sets are the starting point for a data science project.

Time series data from Sources and Data Sets is processed by Flows. Flows analyze data or create derived Data Sets.

Insights and data generated by Flows are made available through Data Services.

The Timeseer.AI Client represents each of these concepts as a separate class that exposes the functionality that is specific to that concept. Each concept class is created by passing the Client to the constructor.

Full documentation is available in the code by running:

>>> import timeseer_client
>>> help(timeseer_client)

Usage

This usage sample generates a sine wave using Pandas and numpy. Values of the sine wave below 0 are assumed to be the result of a faulty sensor reading. The sample shows how Timeseer can be used to analyze this data and how it automatically creates a derived data set.

First install Pandas:

(venv) $ pip install pandas

Generate the sine wave data:

>>> import numpy as np
>>> import pandas as pd
>>> ts = pd.date_range("2022-01-01T00:00:00Z", "2022-02-01T00:00:00Z", freq="h")
>>> values = np.round(10 * np.sin(2 * np.pi * ((ts.astype(np.int64) // 10**9) - ts[0].timestamp()) / (24*60*60)), decimals=2)
>>> df = pd.DataFrame(dict(ts=ts, value=values))
>>> df.head(20)
                          ts  value
0  2022-01-01 00:00:00+00:00   0.00
1  2022-01-01 01:00:00+00:00   2.59
2  2022-01-01 02:00:00+00:00   5.00
3  2022-01-01 03:00:00+00:00   7.07
4  2022-01-01 04:00:00+00:00   8.66
5  2022-01-01 05:00:00+00:00   9.66
6  2022-01-01 06:00:00+00:00  10.00
7  2022-01-01 07:00:00+00:00   9.66
8  2022-01-01 08:00:00+00:00   8.66
9  2022-01-01 09:00:00+00:00   7.07
10 2022-01-01 10:00:00+00:00   5.00
11 2022-01-01 11:00:00+00:00   2.59
12 2022-01-01 12:00:00+00:00  -0.00
13 2022-01-01 13:00:00+00:00  -2.59
14 2022-01-01 14:00:00+00:00  -5.00
15 2022-01-01 15:00:00+00:00  -7.07
16 2022-01-01 16:00:00+00:00  -8.66
17 2022-01-01 17:00:00+00:00  -9.66
18 2022-01-01 18:00:00+00:00 -10.00
19 2022-01-01 19:00:00+00:00  -9.66
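The expression that builds `values` evaluates 10·sin(2πt / 86400), where t is the number of seconds since the first timestamp, so the wave has an amplitude of 10 and a period of one day. As a cross-check that needs neither numpy nor Pandas, this sketch recomputes the first day with the standard library; `sine_value` is a name introduced here for illustration only.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def sine_value(seconds_since_start, amplitude=10):
    """Value of a sine wave with a one-day period, rounded to 2 decimals."""
    angle = 2 * math.pi * seconds_since_start / SECONDS_PER_DAY
    return round(amplitude * math.sin(angle), 2)


# One value per hour for the first day.
first_day = [sine_value(hour * 3600) for hour in range(24)]

# Hours of the day at which the value is strictly below 0.
below_zero_hours = [hour for hour, value in enumerate(first_day) if value < 0]
```

Hours 13 through 23 end up in `below_zero_hours`, which agrees with the negative values in the output above.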

Define a Timeseer API key in Configure > API keys and use it to create a Client:

>>> from timeseer_client import *
>>> client = Client(("<api key name>", "<api key>"), host='timeseer.example.org', port=8081)

Timeseer uses metadata to automatically profile a time series. In this case, only the physical lower limit of the sensor that measured the time series is known, which is 0.

>>> from timeseer_client.metadata import fields
>>> series = SeriesSelector("Sines", {"function": "sine", "amplitude": "10"})
>>> metadata = Metadata(series, {fields.LimitLowPhysical: 0})

Each time series in Timeseer is identified by a SeriesSelector. A SeriesSelector consists of a source ("Sines"), which will become the data set name, a set of tags, and a field. This time series has the "function" and "amplitude" tags and the (default) "value" field.

For time series where additional structure is not available, a SeriesSelector can also be created using a single "series name" tag:

>>> SeriesSelector("Sines", "sine-10") == SeriesSelector("Sines", {"series name": "sine-10"})
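One way to picture the shorthand: a bare string is treated as the value of a single "series name" tag. The dataclass below is a hypothetical stand-in written for illustration, not the SDK's implementation; only the tag name "series name" and the default "value" field come from the text above.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class SelectorSketch:
    """Hypothetical stand-in for a series selector: source, tags and field."""

    source: str
    tags: Union[str, Dict[str, str]]
    field: str = "value"

    def __post_init__(self):
        # A bare string is shorthand for a single "series name" tag.
        if isinstance(self.tags, str):
            object.__setattr__(self, "tags", {"series name": self.tags})


short = SelectorSketch("Sines", "sine-10")
explicit = SelectorSketch("Sines", {"series name": "sine-10"})

# Both spellings normalize to the same source, tags and field.
same = short == explicit
```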

Profiling this time series can be done using the profile convenience function:

>>> profile(client, "Sines", [(metadata, df)])
[{'type': 'flow', 'name': 'Sines'}, {'type': 'data service', 'name': 'Sines'}, {'type': 'data set', 'name': 'Sines'}]

The profile function creates a Data Set, a Data Service and a Flow with the given name, in this case "Sines". It also evaluates the flow.

Data should be provided as a pyarrow.Table or a Pandas DataFrame.

A Data Service summarizes the profiling results as Statistics and Event Frames.

Event Frames define a time range where something interesting has been detected.

>>> data_services = DataServices(client)
>>> data_service = DataServiceSelector('Sines', 'Sines')
>>> event_frames = data_services.get_event_frames(data_service)
>>> event_frames.to_pandas()['type'].value_counts()
compression - linear undercompression    61
Out of bounds (lower, physical)          31
Values below zero                        31
Upper limit is present                    1
Interpolation type is present             1
Compression - flat archival rate          1
Description is present                    1
Unit is present                           1
Name: type, dtype: int64

Not all profiling results are issues. In this case the 'linear undercompression' events can safely be ignored. The 'Out of bounds (lower, physical)' event frames cannot be ignored, though, because the physical lower limit of the sensor was defined as 0.
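The count of 31 is consistent with the generated data: January has 31 days, and on each day the sine wave dips below 0 in one contiguous stretch. The sketch below counts those stretches with the standard library. It only illustrates the data; how Timeseer itself delimits event frames is not described here, and `below_zero_runs` is a name made up for this example.

```python
import math
from itertools import groupby

# Hourly points from 2022-01-01T00:00Z to 2022-02-01T00:00Z, both inclusive.
HOURS = 31 * 24 + 1

# Same sine wave as the generated data, indexed by hour instead of by second.
values = [round(10 * math.sin(2 * math.pi * hour / 24), 2) for hour in range(HOURS)]


def below_zero_runs(series):
    """Return the length of every contiguous run of values below 0."""
    return [
        sum(1 for _ in group)
        for is_below, group in groupby(series, key=lambda value: value < 0)
        if is_below
    ]


runs = below_zero_runs(values)
```

Here `runs` holds one entry per day, each covering the 11 hourly samples from 13:00 to 23:00.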

Statistics can be used to gain high-level insight into the data and explain the Event Frames:

>>> data_services.get_statistics(data_service, series)
[... Statistic(name='Value statistics', data_type='table', result=[['Min', -10.0], ['Max', 10.0], ['Mean', 4.775152794086695e-18], ['Median', 0], ['Std', 7.073308943835715]]) ...]
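The `result` of a table statistic, as printed above, is a list of [label, value] pairs, which converts directly to a dictionary for programmatic checks. The pairs below are copied from that output; that every table statistic has this shape is an assumption.

```python
# Pairs copied from the 'Value statistics' output above.
result = [
    ["Min", -10.0],
    ["Max", 10.0],
    ["Mean", 4.775152794086695e-18],
    ["Median", 0],
    ["Std", 7.073308943835715],
]

# Each [label, value] pair becomes one dictionary entry.
value_statistics = dict(result)

# The physical lower limit from the metadata defined earlier.
limit_low_physical = 0
violates_lower_limit = value_statistics["Min"] < limit_low_physical
```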

It is clear (and expected based on the data generation) that the Out of bounds (lower, physical) Event Frames occur because the minimum value is -10.0.

Timeseer can automatically correct the data to be within bounds using various strategies. To create derived data for the periods where an Event Frame is detected, insert a "filter" Block that filters on that event frame type into a Flow.

The derived data can be stored in a few ways. It is possible to create another Data Set, for example. Storing it in a Data Service instead allows verifying that the problem has been resolved, as the data is stored there alongside quality indicators.

There is no shorthand for data cleaning, as each case will require different action. The most readable way to define the Flow that will create the derived data is in YAML.

Create sine-derive.yml:

---

- type: data service
  name: Derived sine results
  kpiSet: Data quality fundamentals
  range:
    start: "2022-01-01T00:00:00Z"
    end: "2022-02-01T00:00:00Z"

- type: flow
  name: Create derived sine
  dataSet: Sines
  blocks:

  - name: Analyze time series
    type: analysis

  - name: Hold last value when out of bounds
    type: filter
    augmentationStrategy: hold last value
    filters:
    - type: univariate
      filter: "Out of bounds (lower, physical)"
      series: ALL

  - name: Analyze derived time series
    type: analysis

  - name: Keep results for derived series in Derived sine results data service
    type: data_service_contribute
    dataServiceName: Derived sine results
    contributionBlockNames: [Analyze derived time series]

The Resources and Flows classes allow creating resources and evaluating flows respectively.

>>> resources = Resources(client)
>>> resources.create(path="sine-derive.yml")
>>> flows = Flows(client)
>>> flows.evaluate("Create derived sine")

The derived data has been profiled by the Flow. Profiling results are available in the "Derived sine results" Data Service:

>>> derived_data_service = DataServiceSelector('Derived sine results', 'Sines')
>>> event_frames = data_services.get_event_frames(derived_data_service)
>>> event_frames.to_pandas()['type'].value_counts()
compression - linear undercompression    31
Compression - flat archival rate          1
Interpolation type is present             1
Unit is present                           1
Description is present                    1
Upper limit is present                    1
Name: type, dtype: int64

The derived data no longer contains values below 0:

>>> derived_data = data_services.get_data(derived_data_service, series)
>>> derived_data.to_pandas().head(26)
                           value
ts
2022-01-01 00:00:00+00:00   0.00
2022-01-01 01:00:00+00:00   2.59
2022-01-01 02:00:00+00:00   5.00
2022-01-01 03:00:00+00:00   7.07
2022-01-01 04:00:00+00:00   8.66
2022-01-01 05:00:00+00:00   9.66
2022-01-01 06:00:00+00:00  10.00
2022-01-01 07:00:00+00:00   9.66
2022-01-01 08:00:00+00:00   8.66
2022-01-01 09:00:00+00:00   7.07
2022-01-01 10:00:00+00:00   5.00
2022-01-01 11:00:00+00:00   2.59
2022-01-01 12:00:00+00:00  -0.00
2022-01-01 13:00:00+00:00  -0.00
2022-01-01 14:00:00+00:00  -0.00
2022-01-01 15:00:00+00:00  -0.00
2022-01-01 16:00:00+00:00  -0.00
2022-01-01 17:00:00+00:00  -0.00
2022-01-01 18:00:00+00:00  -0.00
2022-01-01 19:00:00+00:00  -0.00
2022-01-01 20:00:00+00:00  -0.00
2022-01-01 21:00:00+00:00  -0.00
2022-01-01 22:00:00+00:00  -0.00
2022-01-01 23:00:00+00:00  -0.00
2022-01-02 00:00:00+00:00   0.00
2022-01-02 01:00:00+00:00   2.59
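The repeated -0.00 between 13:00 and 23:00 is consistent with the value from 12:00, the last sample before the out-of-bounds period, being carried forward. The following is a minimal sketch of that idea in plain Python, not Timeseer's implementation; the function name and the choice to leave leading out-of-bounds values untouched are assumptions.

```python
def hold_last_value(values, lower_limit=0):
    """Replace values below lower_limit with the last value that was within bounds."""
    held = []
    last_good = None
    for value in values:
        if value >= lower_limit:
            last_good = value
            held.append(value)
        elif last_good is not None:
            held.append(last_good)
        else:
            # No earlier in-bounds value to hold: keep the original (an assumption).
            held.append(value)
    return held


# A short excerpt shaped like the sine wave around its negative half.
raw = [5.00, 2.59, -0.00, -2.59, -5.00, -2.59, 0.00, 2.59]
derived = hold_last_value(raw)
```

Note that -0.00 compares equal to 0 and therefore counts as within bounds, which is why it is the value that gets held.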

This only scratches the surface of the functionality in Timeseer. Learn more in the Help menu in the user interface. All resources, blocks in Flows and event frame types are thoroughly documented.
