
Time series data sets for PyTorch

Ready-to-go PyTorch data sets for supervised time series prediction problems. torchtime currently supports:

  • All data sets in the UEA/UCR classification repository

  • PhysioNet Challenge 2019 (sepsis prediction) [3-5]

Installation

$ pip install torchtime

Example usage

torchtime.data contains a class for each of the data sets above, and each class shares a consistent API.

The torchtime.data.UEA class returns the UEA/UCR data set specified by the dataset argument (see the UEA/UCR repository for the full list of data sets). For example, to load training data for the ArrowHead data set with a 70/30% training/validation split and create a DataLoader:

from torch.utils.data import DataLoader
from torchtime.data import UEA

arrowhead = UEA(
    dataset="ArrowHead",
    split="train",
    train_prop=0.7,
    seed=123
)
dataloader = DataLoader(arrowhead, batch_size=32)

Batches are dictionaries of tensors X, y and length.

X holds the time series data. The package follows the batch-first convention, so X has shape (n, s, c), where n is the batch size, s is the (maximum) trajectory length and c is the number of channels. By default, a time stamp is appended to the time series data as the first channel.

y holds the one-hot encoded labels and has shape (n, l), where l is the number of classes. length holds the length of each trajectory (before padding, if series are of irregular length) and is a tensor of shape (n,).

ArrowHead is a univariate data set, so X has two channels: the time stamp followed by the series itself (c = 2). Each series has 251 observations (s = 251) and there are three classes (l = 3).

next_batch = next(iter(dataloader))

next_batch["X"].shape  # (32, 251, 2)
next_batch["y"].shape  # (32, 3)
next_batch["length"].shape  # (32)
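
These conventions can be illustrated with a toy batch of the same shapes. The tensors below are synthetic, built by hand rather than loaded by torchtime, and are only meant to show the (n, s, c) layout, the one-hot labels and the length tensor:

```python
import torch

# Toy batch with the same shapes as the ArrowHead example above.
# Synthetic tensors only -- an illustration of the conventions,
# not real data loaded by torchtime.
n, s, l = 32, 251, 3
time_stamp = torch.arange(s, dtype=torch.float).expand(n, s).unsqueeze(-1)
series = torch.randn(n, s, 1)                 # one data channel (univariate)
X = torch.cat([time_stamp, series], dim=-1)   # (n, s, c) with c = 2
y = torch.nn.functional.one_hot(torch.randint(l, (n,)), num_classes=l)
length = torch.full((n,), s)                  # regular sampling: all series are 251 long

batch = {"X": X, "y": y, "length": length}
print(batch["X"].shape)       # torch.Size([32, 251, 2])
print(batch["y"].shape)       # torch.Size([32, 3])
print(batch["length"].shape)  # torch.Size([32])
```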

Additional options

  • The split argument determines whether training, validation or test data are returned. The sizes of the splits are controlled with the train_prop and val_prop arguments.

  • Missing data can be imputed by setting impute to mean (replace missing values with the training data channel mean) or forward (replace with the previous observation). Alternatively, a custom imputation function can be passed.

  • A time stamp, a missing data mask and the time since the previous observation can be appended to the time series data with the boolean arguments time, mask and delta, respectively.

  • For reproducibility, an optional random seed can be specified.
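
The mask, forward imputation and delta concepts in the options above can be sketched on a single univariate trajectory. This is an illustration only, not torchtime's internal implementation; the delta recursion follows the time-since-last-observation idea of Che et al [2]:

```python
import torch

# One univariate trajectory with two missing observations.
times = torch.tensor([0.0, 1.0, 2.0, 3.0, 4.0])
values = torch.tensor([5.0, float("nan"), float("nan"), 7.0, 8.0])

# mask: 1 where a value was observed, 0 where it is missing
mask = (~torch.isnan(values)).float()
print(mask)  # tensor([1., 0., 0., 1., 1.])

# forward imputation: carry the last observed value forward
filled = values.clone()
for t in range(1, len(filled)):
    if torch.isnan(filled[t]):
        filled[t] = filled[t - 1]
print(filled)  # tensor([5., 5., 5., 7., 8.])

# delta: time elapsed since the previous observation (Che et al [2]);
# it resets after an observed step and accumulates across missing steps
delta = torch.zeros_like(times)
for t in range(1, len(times)):
    gap = times[t] - times[t - 1]
    delta[t] = gap + (1 - mask[t - 1]) * delta[t - 1]
print(delta)  # tensor([0., 1., 2., 3., 1.])
```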

Most UEA/UCR data sets are regularly sampled and fully observed. Missing data can be simulated with the missing argument, which drops observations at random from UEA/UCR data sets. See the tutorials and API for more information.
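
Simulated missingness of this kind amounts to dropping a proportion of observations at random. The snippet below is a hand-rolled sketch of the idea on a synthetic tensor (torchtime's own implementation may differ):

```python
import torch

torch.manual_seed(123)                # fixed seed for reproducibility
X = torch.randn(4, 251, 1)            # regularly sampled, fully observed
drop = torch.rand_like(X) < 0.3       # select roughly 30% of observations
X_missing = X.masked_fill(drop, float("nan"))
print(torch.isnan(X_missing).float().mean().item())  # roughly 0.3
```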

Acknowledgements

torchtime uses some of the data processing ideas in Kidger et al, 2020 [1] and Che et al, 2018 [2].

This work is supported by the Engineering and Physical Sciences Research Council, Centre for Doctoral Training in Cloud Computing for Big Data, Newcastle University (grant number EP/L015358/1).

References

  1. Kidger, P, Morrill, J, Foster, J, et al. Neural Controlled Differential Equations for Irregular Time Series. arXiv 2005.08926 (2020). [arXiv]

  2. Che, Z, Purushotham, S, Cho, K, et al. Recurrent Neural Networks for Multivariate Time Series with Missing Values. Sci Rep 8, 6085 (2018). [doi]

  3. Reyna, M, Josef, C, Jeter, R, et al. Early Prediction of Sepsis From Clinical Data: The PhysioNet/Computing in Cardiology Challenge 2019. Critical Care Medicine 48(2), 210-217 (2019). [doi]

  4. Reyna, M, Josef, C, Jeter, R, et al. Early Prediction of Sepsis from Clinical Data: The PhysioNet/Computing in Cardiology Challenge 2019 (version 1.0.0). PhysioNet (2019). [doi]

  5. Goldberger, A, Amaral, L, Glass, L, et al. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation 101(23), e215-e220 (2000). [doi]

  6. Löning, M, Bagnall, A, Ganesh, S, et al. sktime: A Unified Interface for Machine Learning with Time Series. Workshop on Systems for ML at NeurIPS 2019 (2019). [doi]

  7. Löning, M, Bagnall, A, Middlehurst, M, et al. alan-turing-institute/sktime: v0.10.1 (v0.10.1). Zenodo (2022). [doi]

License

Released under the MIT license.
