Python SDK for Timeseer.AI

Timeseer.AI Client

The Timeseer.AI Client is a Python SDK for accessing the functionality of Timeseer.AI. Built on Apache Arrow, the SDK integrates natively with the Pandas and Polars ecosystems.

Installing

The Timeseer.AI Client is available on PyPI.

(venv) $ pip install timeseer

Connecting

The Timeseer.AI Client relies on Apache Arrow Flight for highly efficient data transfers.

Communications are protected by an API key. An API key can be generated within Timeseer under Configure > API keys. Each API key has a name and a secret value that is shown only once.

The API key is used to create a connection to a Timeseer instance running at a specific host and port:

>>> from timeseer_client import *
>>> api_key = ('<api-key-name>', '<api-key>')
>>> client = Client(api_key, host='localhost', port=8081)
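Because the secret value is shown only once, it is often convenient to keep it out of source code. A minimal sketch, assuming the key name and secret are stored in two environment variables. The variable names `TIMESEER_API_KEY_NAME` and `TIMESEER_API_KEY` and the helper `load_api_key` are hypothetical and not part of the SDK:

```python
import os
from typing import Mapping, Optional, Tuple


def load_api_key(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return (name, secret) read from environment variables.

    The variable names are assumptions made for this sketch.
    """
    env = os.environ if env is None else env
    try:
        return env["TIMESEER_API_KEY_NAME"], env["TIMESEER_API_KEY"]
    except KeyError as missing:
        raise RuntimeError(f"missing environment variable: {missing.args[0]}") from None
```

The returned tuple has the same (name, secret) shape as the `api_key` tuple passed to `Client` above.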

Functionality Overview

In Timeseer, time series data is available through two concepts:

  • Sources contain a varying number of time series that are constantly updated with new data.
  • Data Sets contain a fixed number of time series in a specific time range.

Sources are typically used for continuous monitoring of data, while Data Sets are the starting point for a data science project.

Time series data from Sources and Data Sets is processed by Flows. Flows analyze data or create derived Data Sets.

Insights and data generated by Flows are made available through Data Services.

The Timeseer.AI Client represents each of these concepts as a separate class that exposes the functionality that is specific to that concept. Each concept class is created by passing the Client to the constructor.

Full documentation is available in the code by running:

>>> import timeseer_client
>>> help(timeseer_client)

Usage

This usage sample generates a sine wave using Pandas and NumPy. Values of the sine wave below 0 are assumed to be the result of a faulty sensor reading. The sample shows how Timeseer can be used to analyze the data and how it automatically creates a derived Data Set.

First install Pandas:

(venv) $ pip install pandas

Generate the sine wave data:

>>> import numpy as np
>>> import pandas as pd
>>> ts = pd.date_range("2022-01-01T00:00:00Z", "2022-02-01T00:00:00Z", freq="H")
>>> values = np.round(10 * np.sin(2 * np.pi * ((ts.astype(np.int64) // 10**9) - ts[0].timestamp()) / (24*60*60)), decimals=2)
>>> df = pd.DataFrame(dict(ts=ts, value=values))
>>> df.head(20)
                          ts  value
0  2022-01-01 00:00:00+00:00   0.00
1  2022-01-01 01:00:00+00:00   2.59
2  2022-01-01 02:00:00+00:00   5.00
3  2022-01-01 03:00:00+00:00   7.07
4  2022-01-01 04:00:00+00:00   8.66
5  2022-01-01 05:00:00+00:00   9.66
6  2022-01-01 06:00:00+00:00  10.00
7  2022-01-01 07:00:00+00:00   9.66
8  2022-01-01 08:00:00+00:00   8.66
9  2022-01-01 09:00:00+00:00   7.07
10 2022-01-01 10:00:00+00:00   5.00
11 2022-01-01 11:00:00+00:00   2.59
12 2022-01-01 12:00:00+00:00  -0.00
13 2022-01-01 13:00:00+00:00  -2.59
14 2022-01-01 14:00:00+00:00  -5.00
15 2022-01-01 15:00:00+00:00  -7.07
16 2022-01-01 16:00:00+00:00  -8.66
17 2022-01-01 17:00:00+00:00  -9.66
18 2022-01-01 18:00:00+00:00 -10.00
19 2022-01-01 19:00:00+00:00  -9.66
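The expression above gives the sine wave an amplitude of 10 and a period of one day. As a cross-check, the same values can be computed without Pandas from the number of hours since the first sample. `sine_value` is a helper name introduced for this sketch:

```python
import math


def sine_value(hours_since_start: int) -> float:
    """Sine with amplitude 10 and a 24-hour period, rounded to 2 decimals."""
    seconds = hours_since_start * 60 * 60
    return round(10 * math.sin(2 * math.pi * seconds / (24 * 60 * 60)), 2)


# First seven hourly samples, corresponding to rows 0-6 of the DataFrame above.
first_values = [sine_value(h) for h in range(7)]
```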

Define a Timeseer API key in Configure > API keys and use it to create a Client:

>>> from timeseer_client import *
>>> client = Client(("<api key name>", "<api key>"), host='timeseer.example.org', port=8081)

Timeseer uses metadata to automatically profile a time series. In this case, only the physical lower limit of the sensor that measured the time series is known, which is 0.

>>> from timeseer_client.metadata import fields
>>> series = SeriesSelector("Sines", {"function": "sine", "amplitude": "10"})
>>> metadata = Metadata(series, {fields.LimitLowPhysical: 0})

Each time series in Timeseer is identified by a SeriesSelector. A SeriesSelector consists of a source ("Sines"), which becomes the Data Set name, a set of tags, and a field. This time series has the "function" and "amplitude" tags and the (default) "value" field.

For time series where additional structure is not available, a SeriesSelector can also be created using a single "series name" tag:

>>> SeriesSelector("Sines", "sine-10") == SeriesSelector("Sines", {"series name": "sine-10"})
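The equivalence above can be pictured with a toy model. This is only an illustration of the idea that a bare name is shorthand for a single "series name" tag. `ToySelector` and `make_selector` are hypothetical names and do not reflect how the SDK implements `SeriesSelector`:

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class ToySelector:
    """Toy stand-in for a series selector: source, tags and field."""

    source: str
    tags: Tuple[Tuple[str, str], ...]
    field: str = "value"


def make_selector(
    source: str, tags: Union[str, Dict[str, str]], field: str = "value"
) -> ToySelector:
    # A bare string is treated as shorthand for a single "series name" tag.
    if isinstance(tags, str):
        tags = {"series name": tags}
    # Sort the tags so that equality does not depend on insertion order.
    return ToySelector(source, tuple(sorted(tags.items())), field)
```

In this model, `make_selector("Sines", "sine-10")` compares equal to `make_selector("Sines", {"series name": "sine-10"})`.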

Profiling this time series can be done using the profile convenience function:

>>> profile(client, "Sines", [(metadata, df)])
[{'type': 'flow', 'name': 'Sines'}, {'type': 'data service', 'name': 'Sines'}, {'type': 'data set', 'name': 'Sines'}]

The profile function creates a Data Set, a Data Service and a Flow with the given name, in this case "Sines". It also evaluates the flow.

Data should be provided as a pyarrow.Table or a Pandas DataFrame.

A Data Service summarizes the profiling results as Statistics and Event Frames.

Event Frames define a time range where something interesting has been detected.

>>> data_services = DataServices(client)
>>> data_service = DataServiceSelector('Sines', 'Sines')
>>> event_frames = data_services.get_event_frames(data_service)
>>> event_frames.to_pandas()['type'].value_counts()
compression - linear undercompression    61
Out of bounds (lower, physical)          31
Values below zero                        31
Upper limit is present                    1
Interpolation type is present             1
Compression - flat archival rate          1
Description is present                    1
Unit is present                           1
Name: type, dtype: int64

Not all profiling results are issues. In this case we can safely ignore the 'linear undercompression' events. The 'Out of bounds (lower, physical)' event frames cannot be ignored, though: as mentioned earlier, the physical lower limit of the sensor is 0.

Statistics can be used to gain high-level insight into the data and explain the Event Frames:

>>> data_services.get_statistics(data_service, series)
[... Statistic(name='Value statistics', data_type='table', result=[['Min', -10.0], ['Max', 10.0], ['Mean', 4.775152794086695e-18], ['Median', 0], ['Std', 7.073308943835715]]) ...]

It is clear (and expected based on the data generation) that the Out of bounds (lower, physical) Event Frames occur because the minimum value is -10.0.
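The count of 31 matches the number of days in the generated range. A toy reconstruction, assuming one event frame per contiguous run of samples below the limit (this is not Timeseer's detection logic, and `runs_below` is a name introduced for this sketch):

```python
import math
from typing import List, Tuple


def runs_below(values: List[float], lower: float) -> List[Tuple[int, int]]:
    """Return (first_index, last_index) for each contiguous run of values below `lower`."""
    runs = []
    start = None
    for i, v in enumerate(values):
        if v < lower and start is None:
            start = i  # a new run begins
        elif v >= lower and start is not None:
            runs.append((start, i - 1))  # the run ended at the previous sample
            start = None
    if start is not None:
        runs.append((start, len(values) - 1))
    return runs


# Hourly samples from 2022-01-01T00:00Z through 2022-02-01T00:00Z inclusive.
values = [round(10 * math.sin(2 * math.pi * h / 24), 2) for h in range(31 * 24 + 1)]
frames = runs_below(values, lower=0)
```

Under this assumption, `frames` holds one run per day, each covering the hours in which the sine is negative.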

Timeseer can automatically correct the data to be within bounds using various strategies. To create derived data in periods where an Event Frame is detected, insert a "filter" Block for that event frame type in a Flow.

The derived data can be stored in a few ways. It is possible to create another Data Set, for example. Storing it in a Data Service instead allows verifying that the problem has been resolved, as data is stored there alongside quality indicators.

There is no shorthand for data cleaning, as each case requires different actions. The most readable way to define the Flow that will create the derived data is in YAML.

Create sine-derive.yml:

---

- type: data service
  name: Derived sine results
  kpiSet: Data quality fundamentals
  range:
    start: "2022-01-01T00:00:00Z"
    end: "2022-02-01T00:00:00Z"

- type: flow
  name: Create derived sine
  dataSet: Sines
  blocks:

  - name: Analyze time series
    type: analysis

  - name: Hold last value when out of bounds
    type: filter
    augmentationStrategy: hold last value
    filters:
    - type: univariate
      filter: "Out of bounds (lower, physical)"
      series: ALL

  - name: Analyze derived time series
    type: analysis

  - name: Keep results for derived series in Derived sine results data service
    type: data_service_contribute
    dataServiceName: Derived sine results
    contributionBlockNames: [Analyze derived time series]
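Blocks in this file refer to each other and to the Data Service by name, so a typo in a name is easy to make. A minimal sketch of a sanity check before uploading, assuming references must match names exactly (an assumption of this sketch, not documented behavior). The dictionary mirrors only the fields being checked, and `unknown_references` is a hypothetical helper:

```python
from typing import Dict, List

# Python mirror of the flow defined in sine-derive.yml (only the fields checked here).
flow = {
    "name": "Create derived sine",
    "blocks": [
        {"name": "Analyze time series", "type": "analysis"},
        {"name": "Hold last value when out of bounds", "type": "filter"},
        {"name": "Analyze derived time series", "type": "analysis"},
        {
            "name": "Keep results for derived series in Derived sine results data service",
            "type": "data_service_contribute",
            "dataServiceName": "Derived sine results",
            "contributionBlockNames": ["Analyze derived time series"],
        },
    ],
}


def unknown_references(flow: Dict, data_services: List[str]) -> List[str]:
    """List names referenced by contribute blocks that are not defined anywhere."""
    block_names = {block["name"] for block in flow["blocks"]}
    problems = []
    for block in flow["blocks"]:
        if block["type"] != "data_service_contribute":
            continue
        if block["dataServiceName"] not in data_services:
            problems.append(block["dataServiceName"])
        problems.extend(
            name for name in block["contributionBlockNames"] if name not in block_names
        )
    return problems
```

An empty result from `unknown_references(flow, ["Derived sine results"])` means every name used by the contribute block is defined in the file.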

The Resources and Flows classes allow creating resources and evaluating flows respectively.

>>> resources = Resources(client)
>>> resources.create(path="sine-derive.yml")
>>> flows = Flows(client)
>>> flows.evaluate("Create derived sine")

The derived data has been profiled by the Flow. Profiling results are available in the "Derived sine results" Data Service:

>>> derived_data_service = DataServiceSelector('Derived sine results', 'Sines')
>>> event_frames = data_services.get_event_frames(derived_data_service)
>>> event_frames.to_pandas()['type'].value_counts()
compression - linear undercompression    31
Compression - flat archival rate          1
Interpolation type is present             1
Unit is present                           1
Description is present                    1
Upper limit is present                    1
Name: type, dtype: int64

The derived data no longer contains values below 0:

>>> derived_data = data_services.get_data(derived_data_service, series)
>>> derived_data.to_pandas().head(26)
                           value
ts
2022-01-01 00:00:00+00:00   0.00
2022-01-01 01:00:00+00:00   2.59
2022-01-01 02:00:00+00:00   5.00
2022-01-01 03:00:00+00:00   7.07
2022-01-01 04:00:00+00:00   8.66
2022-01-01 05:00:00+00:00   9.66
2022-01-01 06:00:00+00:00  10.00
2022-01-01 07:00:00+00:00   9.66
2022-01-01 08:00:00+00:00   8.66
2022-01-01 09:00:00+00:00   7.07
2022-01-01 10:00:00+00:00   5.00
2022-01-01 11:00:00+00:00   2.59
2022-01-01 12:00:00+00:00  -0.00
2022-01-01 13:00:00+00:00  -0.00
2022-01-01 14:00:00+00:00  -0.00
2022-01-01 15:00:00+00:00  -0.00
2022-01-01 16:00:00+00:00  -0.00
2022-01-01 17:00:00+00:00  -0.00
2022-01-01 18:00:00+00:00  -0.00
2022-01-01 19:00:00+00:00  -0.00
2022-01-01 20:00:00+00:00  -0.00
2022-01-01 21:00:00+00:00  -0.00
2022-01-01 22:00:00+00:00  -0.00
2022-01-01 23:00:00+00:00  -0.00
2022-01-02 00:00:00+00:00   0.00
2022-01-02 01:00:00+00:00   2.59

This only scratches the surface of the functionality in Timeseer. Learn more in the Help menu in the user interface. All resources, blocks in Flows and event frame types are thoroughly documented.

