Time series data sets for PyTorch

torchtime provides ready-to-go time series data sets for use in PyTorch. The current list of supported data sets is:

  • All data sets in the UEA/UCR classification repository [link]
  • PhysioNet Challenge 2019 (early prediction of sepsis) [link]

Installation

$ pip install torchtime

Using torchtime

The example below uses the torchtime.data.UEA class. The data set is specified using the dataset argument, which accepts the name of any data set in the UEA/UCR classification repository. The split argument determines whether training, validation or test data are returned. The sizes of the splits are controlled with the train_split and val_split arguments. Reproducibility is achieved using the seed argument.

For example, to load training data for the ArrowHead data set with a 70/30% training/validation split:

from torch.utils.data import DataLoader
from torchtime.data import UEA

arrowhead = UEA(
    dataset="ArrowHead",
    split="train",
    train_split=0.7,
    seed=456789,
)
dataloader = DataLoader(arrowhead, batch_size=32)
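
To make the roles of train_split, val_split and seed more concrete, here is a minimal sketch of a seeded split over trajectory indices. It is not torchtime's implementation: split_indices is a hypothetical helper, and the rule that any remainder after the training and validation splits becomes the test split is an assumption made for illustration.

```python
import random


def split_indices(n, train_split, val_split=None, seed=None):
    """Partition range(n) into train/val/test index lists (illustrative only)."""
    indices = list(range(n))
    # A seeded shuffle gives the same partition on every run
    random.Random(seed).shuffle(indices)
    n_train = round(n * train_split)
    if val_split is None:
        # Two-way split: everything outside the training set is validation data
        return {"train": indices[:n_train], "val": indices[n_train:], "test": []}
    n_val = round(n * val_split)
    return {
        "train": indices[:n_train],
        "val": indices[n_train:n_train + n_val],
        "test": indices[n_train + n_val:],  # assumed: the remainder is test data
    }


splits = split_indices(100, train_split=0.7, seed=456789)
print(len(splits["train"]), len(splits["val"]))  # 70 30
```

Calling the helper twice with the same seed returns identical index lists, which is the property the seed argument is described as providing.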

The DataLoader returns each batch as a dictionary of tensors X, y and length. X holds the time series data. By default, a time stamp is appended to the data as the first channel. This package follows the batch-first convention, so X has shape (n, s, c) where n is the batch size, s is the trajectory length and c is the number of channels.

ArrowHead is a univariate time series with 251 observations in each trajectory. X therefore has two channels, the time stamp followed by the time series.

>> next(iter(dataloader))["X"]

tensor([[[  0.0000,  -1.8302],
         [  1.0000,  -1.8123],
         [  2.0000,  -1.8122],
         ...,
         [248.0000,  -1.7821],
         [249.0000,  -1.7971],
         [250.0000,  -1.8280]],

        ...,

        [[  0.0000,  -1.8392],
         [  1.0000,  -1.8314],
         [  2.0000,  -1.8125],
         ...,
         [248.0000,  -1.8359],
         [249.0000,  -1.8202],
         [250.0000,  -1.8387]]])
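
The layout of X can be reproduced with plain PyTorch. The sketch below assumes an integer time index starting at zero, as in the output above; add_time_channel is a hypothetical helper written for illustration and is not part of torchtime.

```python
import torch


def add_time_channel(series):
    """Prepend a 0, 1, 2, ... time stamp channel to a (n, s, c) tensor."""
    n, s, _ = series.shape
    time = torch.arange(s, dtype=series.dtype, device=series.device)
    # Reshape to (1, s, 1) and repeat across the batch dimension
    time = time.view(1, s, 1).expand(n, s, 1)
    # Time stamp first, followed by the original channels
    return torch.cat([time, series], dim=-1)


series = torch.randn(32, 251, 1)  # univariate data with 251 observations
X = add_time_channel(series)
print(X.shape)  # torch.Size([32, 251, 2])
```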

Labels y are one-hot encoded and have shape (n, l) where l is the number of classes.

>> next(iter(dataloader))["y"]

tensor([[0, 0, 1],
        [1, 0, 0],
        [1, 0, 0],

        ...,

        [0, 0, 1],
        [0, 1, 0],
        [1, 0, 0]])
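
Some loss functions, such as torch.nn.CrossEntropyLoss, accept integer class indices rather than one-hot rows. As a small sketch using standard PyTorch calls (the label values here are made up for illustration), the two encodings can be converted as follows:

```python
import torch
import torch.nn.functional as F

# Integer class labels for four trajectories (illustrative values)
class_indices = torch.tensor([2, 0, 0, 1])

# One-hot encode: shape (n, l) with l = 3 classes
y = F.one_hot(class_indices, num_classes=3)

# Recover the class indices from the one-hot rows
recovered = y.argmax(dim=1)
```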

The length of each trajectory (before padding if the data set is of irregular length) is provided as a tensor of shape (n).

>> next(iter(dataloader))["length"]

tensor([251, 251, 251, 251, 251, 251, 251, 251, 251, 251, 251, 251, 251, 251,
        251, 251, 251, 251, 251, 251, 251, 251, 251, 251, 251, 251, 251, 251,
        251, 251, 251, 251])
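
For data sets of irregular length, the length tensor can be used to ignore padded time steps. The sketch below builds a boolean mask from length and computes a per-trajectory mean over the real observations only. The toy data and the NaN padding value are assumptions made for illustration; the source does not state which padding value torchtime uses.

```python
import torch

nan = float("nan")
# Two univariate trajectories padded to s = 3; the second has only 2 observations
values = torch.tensor([[1.0, 2.0, 3.0],
                       [4.0, 6.0, nan]])
length = torch.tensor([3, 2])

# mask[i, t] is True where time step t is a real observation of trajectory i
steps = torch.arange(values.shape[1])
mask = steps.unsqueeze(0) < length.unsqueeze(1)

# Replace padded entries with zero before summing, then divide by the true length
masked = torch.where(mask, values, torch.zeros_like(values))
mean = masked.sum(dim=1) / length
```

Because the padded entries are replaced rather than multiplied by the mask, the same code works whether the padding value is zero, NaN or anything else.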

Learn more

Missing data can be simulated using the missing argument. In addition, missing data/observational masks and time delta channels can be appended using the mask and delta arguments. See the tutorial and API for more information.
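
To illustrate what these extra channels contain, the sketch below simulates missing values, derives an observational mask, and computes a time delta following the definition in Che et al, 2018 [2]: the time elapsed since the variable was last observed. The functions simulate_missing, observation_mask and time_delta are hypothetical helpers, and whether torchtime computes these channels in exactly this way is an assumption.

```python
import torch


def simulate_missing(x, rate, seed):
    """Replace a random fraction of entries with NaN, reproducibly."""
    generator = torch.Generator().manual_seed(seed)
    drop = torch.rand(x.shape, generator=generator) < rate
    return x.masked_fill(drop, float("nan"))


def observation_mask(x):
    """1 where a value is observed, 0 where it is missing (NaN)."""
    return (~torch.isnan(x)).to(x.dtype)


def time_delta(times, mask):
    """Time since each channel was last observed, for times (s,) and mask (s, c)."""
    delta = torch.zeros_like(mask)
    for t in range(1, mask.shape[0]):
        gap = times[t] - times[t - 1]
        # If the previous step was missing, the elapsed time keeps accumulating
        delta[t] = gap + (1 - mask[t - 1]) * delta[t - 1]
    return delta


times = torch.tensor([0.0, 1.0, 2.0, 3.0])
x = torch.tensor([[1.0], [float("nan")], [float("nan")], [4.0]])
mask = observation_mask(x)       # [[1], [0], [0], [1]]
delta = time_delta(times, mask)  # [[0], [1], [2], [3]]
```

In this toy trajectory the value at the final step is observed three time units after the last observation, so its delta is 3.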

This work is based on some of the data processing ideas in Kidger et al, 2020 [1] and Che et al, 2018 [2].

References

  1. Kidger, P, Morrill, J, Foster, J, et al. Neural Controlled Differential Equations for Irregular Time Series. arXiv 2005.08926 (2020). [arXiv]

  2. Che, Z, Purushotham, S, Cho, K, et al. Recurrent Neural Networks for Multivariate Time Series with Missing Values. Sci Rep 8, 6085 (2018). [doi]

  3. Reyna, M, Josef, C, Jeter, R, et al. Early Prediction of Sepsis From Clinical Data: The PhysioNet/Computing in Cardiology Challenge. Critical Care Medicine 48 (2), pp. 210–217 (2019). [doi]

  4. Reyna, M, Josef, C, Jeter, R, et al. Early Prediction of Sepsis from Clinical Data: The PhysioNet/Computing in Cardiology Challenge 2019 (version 1.0.0). PhysioNet (2019). [doi]

  5. Goldberger, A, Amaral, L, Glass, L, et al. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation 101 (23), pp. e215–e220 (2000). [doi]

Funding

This work was supported by the Engineering and Physical Sciences Research Council, Centre for Doctoral Training in Cloud Computing for Big Data, Newcastle University (grant number EP/L015358/1).

License

Released under the MIT license.
