Toolkit for flexible operations on time-series data

tsflex

tsflex stands for: flexible time-series operations

It is a time-series-first toolkit for processing and feature extraction that makes few assumptions about the input data.

Installation

:WIP: - not yet published to pypi

pip install tsflex

Advantages of tsflex

tsflex has multiple selling points, for example:

todo: create links to example benchmarking notebooks

  • it is efficient:
    • execution time -> multiprocessing / vectorized operations
    • memory -> view-based operations
  • it is flexible for feature extraction:
    • multiple series, signal & stride combinations are possible
    • no frequency requirements, just a datetime index
  • it has logging capabilities that help to improve feature extraction speed
  • it is field & unit tested
  • it has comprehensive documentation
  • it is compatible with sklearn (w.i.p. for gridsearch integration), pandas and numpy
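
To make the "view-based operations" point concrete, here is a minimal NumPy sketch. It is not taken from tsflex's internals; it only illustrates, under the assumption of sample-based (not time-based) windows, how strided windows can be built as views on the original array without copying the data.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Illustration only: sample-based windows, not tsflex's own implementation.
values = np.arange(10, dtype=float)

# All windows of 4 samples, keeping every 2nd window (a stride of 2 samples).
windows = sliding_window_view(values, window_shape=4)[::2]

# The windows are views: they share memory with `values`, nothing is copied.
shares_memory = np.shares_memory(windows, values)

# A vectorized feature: one mean per window.
window_means = windows.mean(axis=1)
```

Because the windows are views, the memory cost stays close to that of the original array, even when the windows overlap heavily.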

Usage

Series processing

import pandas as pd
import scipy.stats
import numpy as np

from tsflex.processing import SeriesProcessor, SeriesPipeline

Feature extraction

The only data assumptions made by tsflex are:

  • the data has a pd.DatetimeIndex & this index is monotonically increasing
  • the data's series names must be unique

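
These two assumptions can be checked with plain pandas before handing data to tsflex. The helper below, `check_input_series`, is a hypothetical name introduced for illustration and is not part of the tsflex API; it is a small sketch of what such a check could look like.

```python
import pandas as pd


def check_input_series(series_list):
    """Hypothetical helper (not part of tsflex): validate the two data assumptions."""
    names = [s.name for s in series_list]
    if len(names) != len(set(names)):
        raise ValueError(f"series names must be unique, got {names}")
    for s in series_list:
        if not isinstance(s.index, pd.DatetimeIndex):
            raise TypeError(f"series '{s.name}' has no pd.DatetimeIndex")
        if not s.index.is_monotonic_increasing:
            raise ValueError(f"index of series '{s.name}' is not monotonically increasing")


idx = pd.date_range("2021-07-01", freq="1h", periods=5)
good = pd.Series(range(5), index=idx, name="myseries")
check_input_series([good])  # passes silently

# A reversed index violates the monotonicity assumption; sorting restores it.
reversed_series = good.iloc[::-1]
fixed = reversed_series.sort_index()
check_input_series([fixed])
```
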
import pandas as pd
import scipy.stats
import numpy as np

from tsflex.features import FeatureDescriptor, FeatureCollection

# 1. Construct the collection in which you add all your features
fc = FeatureCollection(
    feature_descriptors=[
        FeatureDescriptor(
            function=scipy.stats.skew,
            series_name="myseries",
            window="1day",
            stride="12hours"
        )
    ]
)
# -- 1.1 Add another feature to the feature collection
fc.add(FeatureDescriptor(np.min, 'myseries', '2days', '1day'))

# 2. Get your time-indexed data
data = pd.Series(
    data=np.random.random(10_000), 
    index=pd.date_range("2021-07-01", freq="1h", periods=10_000),
).rename('myseries')
# -- 2.1 drop some data, as we don't make frequency assumptions
data = data.drop(np.random.choice(data.index, 200, replace=False))

# 3. Calculate the feature on some data
fc.calculate(data=data, n_jobs=1, return_df=True)
# which outputs a pd.DataFrame with content:
index                 myseries__skew__w=1D_s=12h   myseries__amin__w=2D_s=1D
2021-07-02 00:00:00   -0.0607221                   nan
2021-07-02 12:00:00   -0.142407                    nan
2021-07-03 00:00:00   -0.283447                    0.042413
2021-07-03 12:00:00   -0.353314                    nan
2021-07-04 00:00:00   -0.188953                    0.0011865
2021-07-04 12:00:00   0.259685                     nan
2021-07-05 00:00:00   0.726858                     0.0011865
...                   ...                          ...
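
To show what a window/stride combination means on data without a fixed frequency, here is a rough pandas-only sketch. It is not tsflex's implementation: the function name `strided_feature` is hypothetical, and the choices to use left-closed, right-open windows and to label each value with the window's end time are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def strided_feature(series, func, window, stride):
    """Hypothetical sketch (not tsflex code): apply `func` on time-based strided windows."""
    window, stride = pd.Timedelta(window), pd.Timedelta(stride)
    # Assumption: the first window starts at the first timestamp,
    # and every output value is labeled with the end of its window.
    ends = pd.date_range(series.index[0] + window, series.index[-1], freq=stride)
    values = []
    for end in ends:
        # Assumption: windows are left-closed and right-open, [end - window, end).
        mask = (series.index >= end - window) & (series.index < end)
        chunk = series.loc[mask].to_numpy()
        # Gaps in the data can leave a window empty; return NaN in that case.
        values.append(func(chunk) if len(chunk) else np.nan)
    return pd.Series(values, index=ends, name=series.name)


# Three days of hourly data with the values 0..71.
hourly = pd.Series(
    data=np.arange(72),
    index=pd.date_range("2021-07-01", freq="1h", periods=72),
    name="myseries",
)
# Only timestamps are used, so missing samples do not matter.
gappy = hourly.drop(hourly.index[[5, 30, 50]])

min_feature = strided_feature(gappy, np.min, window="1D", stride="12h")
```

The selection relies only on timestamps, which is why dropping samples does not break the computation; a loop like this is far slower than a vectorized or view-based approach, but it makes the window semantics explicit.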

Documentation

:WIP:

To see the documentation locally, install pdoc3 and run the following command from this folder.

pdoc3 --template-dir docs/pdoc_template/ --http :8181 tsflex


👤 Jonas Van Der Donckt, Jeroen Van Der Donckt, Emiel Deprost

Download files

Source Distribution: tsflex-0.1.0.tar.gz (31.1 kB)

Built Distribution: tsflex-0.1.0-py3-none-any.whl (40.0 kB, Python 3)
