Time series data sets for PyTorch


Ready-to-go PyTorch data sets for supervised time series prediction problems. torchtime currently supports:

  • All data sets in the UEA/UCR classification repository

  • PhysioNet Challenge 2019 (sepsis prediction)

Installation

$ pip install torchtime

Example usage

torchtime.data contains a class for each data set above. Each class has a consistent API.

The torchtime.data.UEA class returns the UEA/UCR data set specified by the dataset argument (the supported data sets are listed in the torchtime documentation). For example, to load training data for the ArrowHead data set with a 70/30% training/validation split and create a DataLoader:

from torch.utils.data import DataLoader
from torchtime.data import UEA

# Training split (70% training / 30% validation), seeded for reproducibility
arrowhead = UEA(
    dataset="ArrowHead",
    split="train",
    train_prop=0.7,
    seed=123,
)
dataloader = DataLoader(arrowhead, batch_size=32)

Batches are dictionaries of tensors X, y and length.

X holds the time series data. The package follows the batch-first convention, so X has shape (n, s, c), where n is the batch size, s is the (maximum) trajectory length and c is the number of channels. By default, a time stamp is appended to the time series data as the first channel.

y holds one-hot encoded labels of shape (n, l), where l is the number of classes. length holds the length of each trajectory before padding (relevant when series are of irregular length), i.e. a tensor of shape (n).

ArrowHead is a univariate data set, so X has two channels: the time stamp followed by the series itself (c = 2). Each series has 251 observations (s = 251) and there are three classes (l = 3).

next_batch = next(iter(dataloader))

next_batch["X"].shape  # (32, 251, 2)
next_batch["y"].shape  # (32, 3)
next_batch["length"].shape  # (32)
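The padding, one-hot and length conventions above can be mimicked in plain Python without torchtime. A minimal sketch (the choice of NaN as the padding value is an assumption of this illustration):

```python
import math

# Two irregular univariate trajectories and their class labels (3 classes)
series = [[0.5, 0.1, 0.9], [0.3, 0.7]]
labels = [2, 0]
n_classes = 3

# Pad every series to the maximum trajectory length s
s = max(len(ts) for ts in series)
X = [ts + [math.nan] * (s - len(ts)) for ts in series]

# One-hot encode the labels: shape (n, l)
y = [[1.0 if i == lab else 0.0 for i in range(n_classes)] for lab in labels]

# Record each trajectory's pre-padding length: shape (n)
length = [len(ts) for ts in series]

print(length)  # [3, 2]
print(y[0])    # [0.0, 0.0, 1.0]
```

Keeping the pre-padding lengths alongside the padded tensor is what lets downstream models (e.g. via packed sequences) ignore the padded positions.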

Additional options

  • The split argument determines whether training, validation or test data are returned. The sizes of the splits are controlled with the train_prop and val_prop arguments.

  • Missing data can be imputed by setting impute to mean (replace with training data channel means) or forward (replace with the previous observation). Alternatively, a custom imputation function can be used.

  • A time stamp, missing data mask and the time since previous observation can be appended to the time series data with the boolean arguments time, mask and delta respectively.

  • For reproducibility, an optional random seed can be specified.

Most UEA/UCR data sets are regularly sampled and fully observed. Missing data can be simulated using the missing argument to drop data at random from UEA/UCR data sets. See the tutorials and API for more information.
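The forward imputation scheme described above can be illustrated in plain Python. This is a sketch, not torchtime's implementation (which operates on tensors); the helper name, the use of None for missing values and the fallback argument are assumptions of this illustration:

```python
def forward_fill(channel, fallback):
    """Replace missing values (None) with the previous observation.

    Leading missing values have no previous observation to carry
    forward; this sketch substitutes `fallback` (e.g. a training-data
    channel mean) in that case.
    """
    filled, last = [], None
    for value in channel:
        if value is None:
            value = last if last is not None else fallback
        filled.append(value)
        last = value
    return filled

print(forward_fill([None, 1.0, None, None, 4.0], fallback=0.5))
# [0.5, 1.0, 1.0, 1.0, 4.0]
```

Forward filling preserves the last observed value through gaps, which is why it is often paired with the mask and delta channels so a model can still tell observed values from imputed ones.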

Acknowledgements

torchtime uses some of the data processing ideas in Kidger et al, 2020 [1] and Che et al, 2018 [2].

This work is supported by the Engineering and Physical Sciences Research Council, Centre for Doctoral Training in Cloud Computing for Big Data, Newcastle University (grant number EP/L015358/1).

References

  1. Kidger, P, Morrill, J, Foster, J, et al. Neural Controlled Differential Equations for Irregular Time Series. arXiv 2005.08926 (2020). [arXiv]

  2. Che, Z, Purushotham, S, Cho, K, et al. Recurrent Neural Networks for Multivariate Time Series with Missing Values. Sci Rep 8, 6085 (2018). [doi]

  3. Reyna, M, Josef, C, Jeter, R, et al. Early Prediction of Sepsis From Clinical Data: The PhysioNet/Computing in Cardiology Challenge 2019. Critical Care Medicine 48(2): 210-217 (2019). [doi]

  4. Reyna, M, Josef, C, Jeter, R, et al. Early Prediction of Sepsis from Clinical Data: The PhysioNet/Computing in Cardiology Challenge 2019 (version 1.0.0). PhysioNet (2019). [doi]

  5. Goldberger, A, Amaral, L, Glass, L, et al. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation 101 (23), pp. e215–e220 (2000). [doi]

  6. Löning, M, Bagnall, A, Ganesh, S, et al. sktime: A Unified Interface for Machine Learning with Time Series. Workshop on Systems for ML at NeurIPS 2019 (2019). [doi]

  7. Löning, M, Bagnall, A, Middlehurst, M, et al. alan-turing-institute/sktime: v0.10.1 (v0.10.1). Zenodo (2022). [doi]

License

Released under the MIT license.
