
Benchmark time series data sets for PyTorch


PyTorch data sets for supervised time series classification and prediction problems, including:

  • All UEA/UCR classification repository data sets
  • PhysioNet Challenge 2012 (in-hospital mortality)
  • PhysioNet Challenge 2019 (sepsis prediction)
  • A binary prediction variant of the 2019 PhysioNet Challenge

Why use torchtime?

  1. Saves time. You don't have to write your own PyTorch data classes.
  2. Better research. Use common, reproducible implementations of data sets for a level playing field when evaluating models.

Installation

$ pip install torchtime

Getting started

Data classes have a common API. The split argument determines whether training ("train"), validation ("val") or test ("test") data are returned. The sizes of the splits are controlled with the train_prop and (optional) val_prop arguments.
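The interaction of the two proportion arguments can be illustrated with a pure-Python sketch. This is an illustration of the idea only, not torchtime's internal code, and it assumes that train_prop (and the optional val_prop) are fractions of the whole data set, with the remainder forming the test split:

```python
def split_sizes(n, train_prop, val_prop=0.0):
    """Illustrative split of n examples into train/val/test counts.

    Assumes train_prop (and optional val_prop) are fractions of the
    full data set and that the remainder becomes the test split; the
    exact rounding behaviour here is a sketch, not torchtime's.
    """
    n_train = int(n * train_prop)
    n_val = int(n * val_prop)
    n_test = n - n_train - n_val
    return n_train, n_val, n_test

# A 70/30% training/validation split of 1,000 examples:
split_sizes(1000, train_prop=0.7, val_prop=0.3)  # (700, 300, 0)
```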

PhysioNet data sets

Three PhysioNet data sets are currently supported:

  • PhysioNet Challenge 2012 (in-hospital mortality)
  • PhysioNet Challenge 2019 (sepsis prediction)
  • A binary prediction variant of the 2019 PhysioNet Challenge

For example, to load training data for the 2012 challenge with a 70/30% training/validation split and create a DataLoader for model training:

from torch.utils.data import DataLoader
from torchtime.data import PhysioNet2012

physionet2012 = PhysioNet2012(
    split="train",
    train_prop=0.7,
)
dataloader = DataLoader(physionet2012, batch_size=32)

UEA/UCR repository data sets

The torchtime.data.UEA class returns the UEA/UCR repository data set specified by the dataset argument, for example:

from torch.utils.data import DataLoader
from torchtime.data import UEA

arrowhead = UEA(
    dataset="ArrowHead",
    split="train",
    train_prop=0.7,
)
dataloader = DataLoader(arrowhead, batch_size=32)

Using the DataLoader

Batches are dictionaries of tensors X, y and length:

  • X holds the time series data. The package follows the batch-first convention, so X has shape (n, s, c), where n is the batch size, s is the (longest) trajectory length and c is the number of channels. By default, the first channel is a time stamp.
  • y holds the one-hot encoded labels, a tensor of shape (n, l), where l is the number of classes.
  • length holds the length of each trajectory (before padding, where sequences are of irregular length), a tensor of shape (n).
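The padding convention above can be sketched in plain Python. This is an illustration of the X/length relationship only, not torchtime's implementation (which pads tensors, not lists):

```python
def pad_batch(sequences, pad_value=float("nan")):
    """Pad irregular-length sequences to the longest length in the
    batch and record each original length, mirroring the X/length
    convention described above. Illustrative sketch only."""
    lengths = [len(seq) for seq in sequences]
    s = max(lengths)  # longest trajectory in the batch
    padded = [seq + [pad_value] * (s - len(seq)) for seq in sequences]
    return padded, lengths

batch, lengths = pad_batch([[1.0, 2.0, 3.0], [4.0, 5.0]])
# lengths == [3, 2]; both rows are padded to 3 entries
```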

For example, ArrowHead is a univariate time series, so X has two channels: the time stamp followed by the series itself (c = 2). Each series has 251 observations (s = 251) and there are three classes (l = 3). For a batch size of 32:

next_batch = next(iter(dataloader))
next_batch["X"].shape       # torch.Size([32, 251, 2])
next_batch["y"].shape       # torch.Size([32, 3])
next_batch["length"].shape  # torch.Size([32])

See Using DataLoaders for more information.

Advanced options

  • Missing data can be imputed by setting impute to mean (replace with training data channel means) or forward (replace with previous observation). Alternatively a custom imputation function can be passed to the impute argument.
  • A time stamp (added by default), missing data mask and the time since previous observation can be appended with the boolean arguments time, mask and delta respectively.
  • Time series data are standardised using the standardise boolean argument.
  • The location of cached data can be changed with the path argument, for example to share a single cache location across projects.
  • For reproducibility, an optional random seed can be specified.
  • Missing data can be simulated using the missing argument to drop data at random from UEA/UCR data sets.
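To make the two built-in impute strategies concrete, forward imputation can be sketched as follows. This is a pure-Python illustration of the strategy, not torchtime's implementation; missing values are represented here as None, and the fallback for leading missing values (which have no previous observation) is a hypothetical choice for the sketch:

```python
def forward_impute(channel, fallback=0.0):
    """Replace each missing value (None) with the most recent observed
    value in the channel ("forward" imputation). Leading missing values
    have no previous observation, so they take the fallback value."""
    filled, last = [], fallback
    for value in channel:
        if value is None:
            filled.append(last)
        else:
            filled.append(value)
            last = value
    return filled

forward_impute([None, 1.5, None, None, 2.0])  # [0.0, 1.5, 1.5, 1.5, 2.0]
```

The "mean" strategy would instead replace every None with the channel mean computed on the training split, which keeps the test data free of information leakage.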

See the tutorials and API for more information.

Other resources

If you're looking for the TensorFlow equivalent for PhysioNet data sets, try medical_ts_datasets.

Acknowledgements

torchtime uses some of the data processing ideas in Kidger et al, 2020 [1] and Che et al, 2018 [2].

This work is supported by the Engineering and Physical Sciences Research Council, Centre for Doctoral Training in Cloud Computing for Big Data, Newcastle University (grant number EP/L015358/1).

Citing torchtime

If you use this software, please cite the paper:

@software{darke_torchtime_2022,
    author = {Darke, Philip and Missier, Paolo and Bacardit, Jaume},
    title = {Benchmark time series data sets for {PyTorch} - the torchtime package},
    month = jul,
    year = {2022},
    publisher = {arXiv},
    doi = {10.48550/arXiv.2207.12503},
    url = {https://doi.org/10.48550/arXiv.2207.12503},
}

DOIs are also available for each version of the package here.

References

  1. Kidger, P, Morrill, J, Foster, J, et al. Neural Controlled Differential Equations for Irregular Time Series. arXiv 2005.08926 (2020). [arXiv]

  2. Che, Z, Purushotham, S, Cho, K, et al. Recurrent Neural Networks for Multivariate Time Series with Missing Values. Sci Rep 8, 6085 (2018). [doi]

  3. Silva, I, Moody, G, Scott, DJ, et al. Predicting In-Hospital Mortality of ICU Patients: The PhysioNet/Computing in Cardiology Challenge 2012. Comput Cardiol 2012;39:245-248 (2012). [hdl]

  4. Reyna, M, Josef, C, Jeter, R, et al. Early Prediction of Sepsis From Clinical Data: The PhysioNet/Computing in Cardiology Challenge. Critical Care Medicine 48 2: 210-217 (2019). [doi]

  5. Reyna, M, Josef, C, Jeter, R, et al. Early Prediction of Sepsis from Clinical Data: The PhysioNet/Computing in Cardiology Challenge 2019 (version 1.0.0). PhysioNet (2019). [doi]

  6. Goldberger, A, Amaral, L, Glass, L, et al. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation 101 (23), pp. e215–e220 (2000). [doi]

  7. Löning, M, Bagnall, A, Ganesh, S, et al. sktime: A Unified Interface for Machine Learning with Time Series. Workshop on Systems for ML at NeurIPS 2019 (2019). [doi]

  8. Löning, M, Bagnall, A, Middlehurst, M, et al. alan-turing-institute/sktime: v0.10.1 (v0.10.1). Zenodo (2022). [doi]

License

Released under the MIT license.
