Time series data sets for PyTorch

Ready-to-go PyTorch data sets for supervised time series prediction problems. torchtime currently supports:

  • All data sets in the UEA/UCR classification repository [link]

  • PhysioNet Challenge 2012 (in-hospital mortality) [link]

  • PhysioNet Challenge 2019 (sepsis prediction) [link]

Installation

$ pip install torchtime

Example usage

torchtime.data contains a class for each data set above. Each class has a consistent API.

The torchtime.data.UEA class returns the UEA/UCR data set specified by the dataset argument (see list of data sets here). For example, to load the training data for the ArrowHead data set with a 70%/30% training/validation split and wrap it in a DataLoader:

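from torch.utils.data import DataLoader
from torchtime.data import UEA

# Training portion of ArrowHead; the remaining 30% is held out for validation
arrowhead = UEA(
    dataset="ArrowHead",
    split="train",
    train_prop=0.7,
)
# Standard PyTorch DataLoader yielding batches of 32 series
dataloader = DataLoader(arrowhead, batch_size=32)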

Batches are dictionaries of tensors X, y and length.

X holds the time series data. The package follows the batch-first convention, so X has shape (n, s, c), where n is the batch size, s is the (maximum) trajectory length and c is the number of channels. By default, a time stamp is appended to the time series data as the first channel.

y holds the one-hot encoded labels and has shape (n, l), where l is the number of classes. length holds the length of each trajectory (before padding, if sequences are of irregular length), i.e. it is a tensor of shape (n).

ArrowHead is a univariate time series, so X has two channels: the time stamp followed by the time series (c = 2). Each series has 251 observations (s = 251) and there are three classes (l = 3).

next_batch = next(iter(dataloader))

next_batch["X"].shape       # torch.Size([32, 251, 2])
next_batch["y"].shape       # torch.Size([32, 3])
next_batch["length"].shape  # torch.Size([32])
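
The batch layout described above can be consumed as in the following sketch. It uses a synthetic batch with the same keys and shapes rather than downloading ArrowHead, and the names batch, targets and valid are illustrative only. It shows two common steps: converting the one-hot labels to class indices, and using length to build a mask that separates real observations from padding.

```python
import torch

# Synthetic stand-in for one batch (same keys and shapes as the ArrowHead example).
n, s, c, l = 32, 251, 2, 3
batch = {
    "X": torch.randn(n, s, c),
    "y": torch.nn.functional.one_hot(torch.randint(0, l, (n,)), num_classes=l),
    "length": torch.full((n,), s),  # every series has the full 251 observations
}

# One-hot labels -> integer class indices, e.g. for torch.nn.CrossEntropyLoss.
targets = batch["y"].argmax(dim=1)            # shape (n,)

# Boolean mask: True for real observations, False for padded time steps.
steps = torch.arange(s).unsqueeze(0)          # shape (1, s)
valid = steps < batch["length"].unsqueeze(1)  # shape (n, s)
```

For a regularly sampled data set such as ArrowHead every entry of valid is True; the mask only matters when trajectories have different lengths.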

Additional options

  • The split argument determines whether training, validation or test data are returned. The sizes of the splits are controlled with the train_prop and val_prop arguments.

  • Missing data can be imputed by setting impute to mean (replace with training data channel means) or forward (replace with previous observation). Alternatively, a custom imputation function can be used.

  • A time stamp, missing data mask and the time since previous observation can be appended with the boolean arguments time, mask and delta respectively.

  • For reproducibility, an optional random seed can be specified.
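
To make the imputation, mask and delta options concrete, here is a minimal pure-Python sketch for a single channel. It is not torchtime's implementation: the helper names are hypothetical, the mask convention (1 = observed, 0 = missing) is an assumption, and the time delta follows the definition in Che et al, 2018 [2].

```python
import math

def missing_mask(series):
    """Return 1 where a value is observed and 0 where it is missing (NaN)."""
    return [0 if math.isnan(x) else 1 for x in series]

def impute_mean(series, train_mean):
    """Replace missing values with a channel mean computed on the training data."""
    return [train_mean if math.isnan(x) else x for x in series]

def impute_forward(series):
    """Replace missing values with the previous observation (leading NaNs stay NaN)."""
    out, last = [], math.nan
    for x in series:
        if not math.isnan(x):
            last = x
        out.append(last)
    return out

def time_delta(times, mask):
    """Time since the previous observation; zero at the first time step."""
    delta = [0.0]
    for t in range(1, len(times)):
        step = times[t] - times[t - 1]
        # If the previous step was missing, the gap keeps accumulating.
        delta.append(step if mask[t - 1] == 1 else step + delta[t - 1])
    return delta

times = [0.0, 1.0, 2.0, 3.0, 4.0]
series = [1.0, math.nan, math.nan, 4.0, math.nan]

mask = missing_mask(series)                  # [1, 0, 0, 1, 0]
mean_filled = impute_mean(series, 2.5)       # [1.0, 2.5, 2.5, 4.0, 2.5]
forward_filled = impute_forward(series)      # [1.0, 1.0, 1.0, 4.0, 4.0]
delta = time_delta(times, mask)              # [0.0, 1.0, 2.0, 3.0, 1.0]
```

The value 2.5 passed to impute_mean is simply the mean of the two observed values here; in practice it would come from the training split so that no information leaks from validation or test data.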

Most UEA/UCR data sets are regularly sampled and fully observed. Missing data can be simulated using the missing argument to drop data at random from UEA/UCR data sets. See the tutorials and API for more information.
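
As a rough illustration of simulating missing data reproducibly, the sketch below drops values from a series at random with a fixed seed. The drop_at_random helper and its parameters are hypothetical and only mirror the idea of the missing argument and the random seed; torchtime's own procedure may differ.

```python
import math
import random

def drop_at_random(series, missing, seed=None):
    """Return a copy of series in which each value is set to NaN with probability missing."""
    rng = random.Random(seed)  # a private generator keeps the result reproducible
    return [math.nan if rng.random() < missing else x for x in series]

series = [float(i) for i in range(100)]

# The same seed always removes the same observations.
first = drop_at_random(series, missing=0.3, seed=123)
second = drop_at_random(series, missing=0.3, seed=123)
same_pattern = [math.isnan(a) == math.isnan(b) for a, b in zip(first, second)]
```

Because each value is dropped independently, the realised share of missing values is only approximately equal to missing for short series.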

Acknowledgements

torchtime uses some of the data processing ideas in Kidger et al, 2020 [1] and Che et al, 2018 [2].

This work is supported by the Engineering and Physical Sciences Research Council, Centre for Doctoral Training in Cloud Computing for Big Data, Newcastle University (grant number EP/L015358/1).

References

  1. Kidger, P, Morrill, J, Foster, J, et al. Neural Controlled Differential Equations for Irregular Time Series. arXiv 2005.08926 (2020). [arXiv]

  2. Che, Z, Purushotham, S, Cho, K, et al. Recurrent Neural Networks for Multivariate Time Series with Missing Values. Sci Rep 8, 6085 (2018). [doi]

  3. Silva, I, Moody, G, Scott, DJ, et al. Predicting In-Hospital Mortality of ICU Patients: The PhysioNet/Computing in Cardiology Challenge 2012. Comput Cardiol 39, 245-248 (2012). [hdl]

  4. Reyna, M, Josef, C, Jeter, R, et al. Early Prediction of Sepsis From Clinical Data: The PhysioNet/Computing in Cardiology Challenge. Critical Care Medicine 48 (2), 210-217 (2019). [doi]

  5. Reyna, M, Josef, C, Jeter, R, et al. Early Prediction of Sepsis from Clinical Data: The PhysioNet/Computing in Cardiology Challenge 2019 (version 1.0.0). PhysioNet (2019). [doi]

  6. Goldberger, A, Amaral, L, Glass, L, et al. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation 101 (23), pp. e215–e220 (2000). [doi]

  7. Löning, M, Bagnall, A, Ganesh, S, et al. sktime: A Unified Interface for Machine Learning with Time Series. Workshop on Systems for ML at NeurIPS 2019 (2019). [doi]

  8. Löning, M, Bagnall, A, Middlehurst, M, et al. alan-turing-institute/sktime: v0.10.1 (v0.10.1). Zenodo (2022). [doi]

License

Released under the MIT license.
