
TimeSeriesQL

A Pythonic query language for time series data

About The Project

There are many time series databases, and each has its own query language. Learning the structure and keywords of each language takes time, and those skills often don't transfer to other platforms. The goal of this project is to provide a time-series-specific library that can be used across many different time series databases and is easy to learn because it uses Python syntax.

Getting Started

To get a local copy up and running follow these simple steps.

Prerequisites

The requirements are in the requirements.txt file.

Installation

pip

pip install timeseriesql

manual

  1. Clone the timeseriesql repository
git clone https://github.com/mbeale/timeseriesql.git
  2. Install the library
cd timeseriesql
python setup.py install

Usage

This project provides a general framework for querying a time series, with pluggable backends that communicate with specific time series databases. Queries are created using Python generators, a format familiar to Pythonistas.

data = Query(x for x in "metric.name" if x.some_label == "some_value").by("a_label")[start:end:resolution]

The return value is a TimeSeries object that uses a NumPy array as its backend. NumPy ufuncs and other NumPy functions can be applied against it. More examples to come.

There are defaults for start and resolution, controlled by environment variables. This helps avoid accidentally fetching every measurement since the beginning of time.

DEFAULT_START_OFFSET # defaults to 3600 seconds
DEFAULT_RESOLUTION   # defaults to 60 seconds
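The variable names above come straight from the docs; the values below are just illustrative overrides, set before any queries are built:

```python
import os

# Override the query defaults before constructing any Query objects.
# DEFAULT_START_OFFSET: how far back (in seconds) queries start by default.
# DEFAULT_RESOLUTION: the default step between points, in seconds.
os.environ["DEFAULT_START_OFFSET"] = "7200"  # last 2 hours instead of 1
os.environ["DEFAULT_RESOLUTION"] = "300"     # 5-minute resolution
```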

CSV Backend Usage

Time series data is often loaded from a CSV file. The backend expects the first column to be the time index, as either numeric timestamps or strings in ISO 8601 date or datetime format. Filters are applied to the CSV's column headers. If labels are supplied in the query but are not present in the CSV, the filters will not be applied.

If any columns are empty or don't contain a numeric value, the value becomes a np.nan.
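That coercion rule can be sketched as follows (a standalone illustration of the behavior, not the backend's actual code):

```python
import numpy as np

def to_cell(value):
    """Mimic the CSV backend's coercion: numeric strings become floats,
    empty or non-numeric cells become np.nan."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return np.nan

row = ["1.5", "", "oops", "42"]
print([to_cell(v) for v in row])  # [1.5, nan, nan, 42.0]
```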

Basic CSV Usage

from timeseriesql.backends.csv_backend import CSVBackend

data = CSVBackend(x for x in "path/to.csv")[:]

Basic CSV Filtering

For CSV files the labels are the column headers. If there are columns that are not needed, they can be filtered out.

from timeseriesql.backends.csv_backend import CSVBackend

data = CSVBackend(x for x in "path/to.csv" if x.label == "A")[:]
data = CSVBackend(x for x in "path/to.csv" if x.label != "B")[:]
data = CSVBackend(x for x in "path/to.csv" if x.label in ["B", "C", "G"])[:]
data = CSVBackend(x for x in "path/to.csv" if x.label not in ["B", "C", "G"])[:]

Set the Labels

from timeseriesql.backends.csv_backend import CSVBackend

data = CSVBackend(x for x in "path/to.csv").labels(
    [
        {"label": "one"},
        {"label": "two"},
        {"label": "three"},
        {"label": "four"},
        {"label": "five"},
        {"label": "six"},
        {"label": "seven"},
    ]
)[:]

TimeSeries Usage

The TimeSeries object allows manipulation of the time series data after it has been queried from the backend.

In the following examples, the variables starting with ts_ are assumed to be queried data from a backend.

TimeSeries Operations

# Basic mathematical operations (+, -, /, *)
ts_1 + 5 # will return a new series
ts_1 += 5 # will perform the operation in place
ts_1 += ts_2 # add two TimeSeries together

TimeSeries Time Index

The time index is an array of floats, but there is a built-in method to convert the floats into np.datetime64.

ts_1.time # array of floats
ts_1.time.dt #array of np.datetime64
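The conversion itself is plain NumPy. Assuming the index holds Unix timestamps in seconds (the timestamps below are made up), it can be reproduced like this:

```python
import numpy as np

# A float time index of Unix epoch seconds (hypothetical values).
time_index = np.array([1551063600.0, 1551063660.0, 1551063720.0])

# Cast seconds-since-epoch to np.datetime64 with second precision.
as_dates = time_index.astype("datetime64[s]")
print(as_dates[0])  # 2019-02-25T03:00:00
```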

TimeSeries Merging

TimeSeries objects can be combined but the ending time indexes must be the same. This may require empty values to be created where the indexes don't align.

new_t = ts_1.merge([ts_2, ts_3])
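A rough illustration of why alignment matters, in plain NumPy (not the library's merge implementation): wherever one series lacks a timestamp the other has, an empty value has to be inserted.

```python
import numpy as np

t1 = np.array([0.0, 60.0, 120.0])
t2 = np.array([60.0, 120.0, 180.0])
v1 = np.array([1.0, 2.0, 3.0])
v2 = np.array([10.0, 20.0, 30.0])

# The union of the two time indexes becomes the shared index.
union = np.union1d(t1, t2)

def align(times, values, index):
    """Place values on the shared index, filling gaps with np.nan."""
    out = np.full(index.shape, np.nan)
    out[np.searchsorted(index, times)] = values
    return out

print(align(t1, v1, union))  # [ 1.  2.  3. nan]
print(align(t2, v2, union))  # [nan 10. 20. 30.]
```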

TimeSeries Grouping/Reducing

If there are multiple streams, they can be grouped and merged by the labels.

reduced = ts_1.group(["hostname", "name"]).add()
reduced = ts_1.group("env").mean()
reduced = ts_1.group("env").mean(axis=None) # setting axis to None takes the mean of the entire object
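Conceptually, grouping collects the streams whose selected label values match and then reduces across them. A standalone sketch of that idea with hypothetical streams (not the library's internals):

```python
from collections import defaultdict
import numpy as np

# Three streams, each a (labels, values) pair (hypothetical data).
streams = [
    ({"env": "prod", "hostname": "host1"}, np.array([1.0, 2.0])),
    ({"env": "prod", "hostname": "host2"}, np.array([3.0, 4.0])),
    ({"env": "dev",  "hostname": "host3"}, np.array([5.0, 6.0])),
]

# Group by the "env" label, then reduce each group with a sum.
groups = defaultdict(list)
for labels, values in streams:
    groups[labels["env"]].append(values)

reduced = {env: np.sum(vals, axis=0) for env, vals in groups.items()}
print(reduced["prod"])  # [4. 6.]
```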

TimeSeries Special Indexing

import numpy as np

beg = np.datetime64('2019-02-25T03:00')
end = np.datetime64('2019-02-25T04:00')

ts_1[beg:end] # select a time range
ts_1[beg : np.timedelta64(3, "m")] # from beg to beg + 3 minutes
ts_1[np.timedelta64(3, "m") :] # from the start + 3 minutes onward
ts_1[: np.timedelta64(3, "m")] # up to 3 minutes before the end


ts_1[{"hostname": "host2"}] # by labels
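Under the hood a time-range selection is ordinary NumPy masking. A sketch of the range case (assuming a half-open [beg, end) interval; hypothetical data, not the library's indexing code):

```python
import numpy as np

# A fixed 30-minute-resolution time index (hypothetical data).
start = np.datetime64("2019-02-25T02:00")
times = start + np.arange(6) * np.timedelta64(30, "m")
values = np.arange(6, dtype=float)

beg = np.datetime64("2019-02-25T03:00")
end = np.datetime64("2019-02-25T04:00")

# Keep the points whose timestamps fall in [beg, end).
mask = (times >= beg) & (times < end)
print(values[mask])  # [2. 3.]
```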

TimeSeries Rolling Windows

The rolling_window method assumes that the data is filled and at a fixed resolution. The window size is a number of periods (an integer), not a time range.

rolling_cum_sum = ts_1.rolling_window(12).add() # rolling cumulative sum
rolling_mean = ts_1.rolling_window(12).mean() # rolling mean
rolling_median = ts_1.rolling_window(12).median() # rolling median
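The windowing itself can be reproduced with NumPy's sliding_window_view (NumPy ≥ 1.20). This sketch shows a 3-period rolling mean on fixed-resolution data:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# A window of 3 *periods* (not a time range): each output point
# aggregates one sample and its two neighbors in the window.
windows = sliding_window_view(values, 3)
rolling_mean = windows.mean(axis=1)
print(rolling_mean)  # [2. 3. 4.]
```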

TimeSeries Resample

The resample method allows a smaller period to be aggregated into a larger period.

resampled = ts_1.resample(300).mean() # resamples to 5 minutes and takes the mean
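On fixed 60-second data, resampling to 300 seconds amounts to aggregating every 5 consecutive points. A reshape-based sketch of that aggregation (hypothetical values, not the library's implementation):

```python
import numpy as np

# Ten points at 60-second resolution (hypothetical values).
values = np.arange(10, dtype=float)

factor = 300 // 60  # 5 source periods per target period
resampled = values.reshape(-1, factor).mean(axis=1)
print(resampled)  # [2. 7.]
```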

TimeSeries to Pandas

The conversion returns two pandas DataFrames: one for the labels and one for the data.

data, labels = ts_1.to_pandas()

Plotting Libs

Creating a custom plotting backend

Start by extending the Plot class.

from timeseriesql.plot import Plot
class NewPlottingLib(Plot):
  pass

There is a list of functions that can be extended for different plots. There are also functions that generate the title, xlabel, ylabel, and legend labels; use those to get the default information, or override them to provide more custom logic around those fields.

Roadmap

See the open issues for a list of proposed features (and known issues).

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Michael Beale - michael.beale@gmail.com

Project Link: https://github.com/mbeale/timeseriesql
