
Toolkit for flexible processing & feature extraction on time-series data


tsflex


tsflex is a toolkit for flexible time series processing & feature extraction that is efficient and makes few assumptions about sequence data.


Installation

        command
pip     pip install tsflex
conda   conda install -c conda-forge tsflex

Usage

tsflex is built to be intuitive, so we encourage you to copy-paste this code and toy with some parameters!

Feature extraction

import numpy as np
import pandas as pd
import scipy.stats as ss
from tsflex.features import MultipleFeatureDescriptors, FeatureCollection
from tsflex.utils.data import load_empatica_data

# 1. Load sequence-indexed data (in this case a time-index)
df_tmp, df_acc, df_ibi = load_empatica_data(['tmp', 'acc', 'ibi'])

# 2. Construct your feature extraction configuration
fc = FeatureCollection(
    MultipleFeatureDescriptors(
        functions=[np.min, np.mean, np.std, ss.skew, ss.kurtosis],
        series_names=["TMP", "ACC_x", "ACC_y", "IBI"],
        windows=["15min", "30min"],
        strides="15min",
    )
)

# 3. Extract features
fc.calculate(data=[df_tmp, df_acc, df_ibi], approve_sparsity=True)

Note that the feature extraction is performed on multivariate data with varying sample rates.

signal   columns                       sample rate
df_tmp   ["TMP"]                       4Hz
df_acc   ["ACC_x", "ACC_y", "ACC_z"]   32Hz
df_ibi   ["IBI"]                       irregularly sampled
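
To give an intuition for what time-based windows and strides mean on irregularly sampled data, here is a minimal standard-library sketch. It is not tsflex's implementation and does not use the tsflex API: the helper `strided_window_features`, its parameters, and the sample data are all hypothetical. It assumes half-open windows `[start, start + window)` and only emits windows that fit fully inside the data range; tsflex's own boundary handling may differ.

```python
from bisect import bisect_left
from datetime import datetime, timedelta
from statistics import mean


def strided_window_features(timestamps, values, window, stride, func):
    """Apply `func` to each [start, start + window) slice, stepping by `stride`.

    Hypothetical helper for illustration only. `timestamps` must be sorted;
    no fixed sample rate is assumed. Returns (window_start, feature) tuples
    and skips windows that contain no samples.
    """
    results = []
    if not timestamps:
        return results
    start, last = timestamps[0], timestamps[-1]
    while start + window <= last:
        # Locate the samples inside the window by time, not by position.
        lo = bisect_left(timestamps, start)
        hi = bisect_left(timestamps, start + window)
        if hi > lo:
            results.append((start, func(values[lo:hi])))
        start += stride
    return results


# Made-up, irregularly sampled series (offsets in minutes from t0).
t0 = datetime(2021, 1, 1)
timestamps = [t0 + timedelta(minutes=m) for m in (0, 1, 4, 5, 9, 12, 13, 20)]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

features = strided_window_features(
    timestamps, values,
    window=timedelta(minutes=10), stride=timedelta(minutes=5), func=mean,
)
for window_start, value in features:
    print(window_start.time(), value)
```

Because the windows are defined on the time index rather than on a sample count, the same call works for the 4Hz, 32Hz, and irregularly sampled signals alike.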

Processing

Working example in our docs

Why tsflex? ✨

  • Flexible:
  • Efficient:
  • Intuitive:
    • maintains the sequence-index of the data
    • feature extraction constructs interpretable output column names
    • intuitive API
  • Few assumptions about the sequence data:
    • no assumptions about sampling rate
    • able to deal with multivariate asynchronous data
      i.e. data with small time-offsets between the modalities
  • Advanced functionalities:

¹ These integrations are shown in integration-example notebooks.

Future work 🔨

  • scikit-learn integration for both processing and feature extraction
    note: this is under active development on the sklearn integration branch.
  • Support time series segmentation (exposing the under-the-hood strided-rolling functionality) - see this issue
  • Support for multi-indexed dataframes

=> Also see the enhancement issues

Contributing 👪

We are thrilled to see your contributions to further enhance tsflex.
See this guide for more instructions on how to contribute.

Referencing our package

If you use tsflex in a scientific publication, we would highly appreciate it if you cite us as:

@article{vanderdonckt2021tsflex,
    author = {Van Der Donckt, Jonas and Van Der Donckt, Jeroen and Deprost, Emiel and Van Hoecke, Sofie},
    title = {tsflex: flexible time series processing \& feature extraction},
    journal = {SoftwareX},
    year = {2021},
    url = {https://github.com/predict-idlab/tsflex},
    publisher={Elsevier}
}

Link to the paper: https://www.sciencedirect.com/science/article/pii/S2352711021001904


👤 Jonas Van Der Donckt, Jeroen Van Der Donckt, Emiel Deprost

