
Time series data sets for PyTorch


torchtime provides ready-to-go time series data sets for use in PyTorch. The current list of supported data sets is:

  • All data sets in the UEA/UCR classification repository [link]
  • PhysioNet Challenge 2019 [link]

The package follows the batch-first convention: data tensors have shape (n, s, c), where n is the batch size, s is the trajectory length and c is the number of channels.
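Because tensors are batch first, they can be passed straight to PyTorch recurrent layers created with `batch_first=True`. The sketch below uses random data in place of a real torchtime batch. The sizes and the hidden size of 8 are arbitrary illustrative choices.

```python
import torch
import torch.nn as nn

n, s, c = 4, 251, 2       # batch size, trajectory length, channels (illustrative values)
X = torch.randn(n, s, c)  # random stand-in for a batch-first torchtime tensor

# batch_first=True makes the GRU expect input of shape (n, s, c)
gru = nn.GRU(input_size=c, hidden_size=8, batch_first=True)
output, h_n = gru(X)

print(output.shape)  # torch.Size([4, 251, 8]): one hidden state per time step
print(h_n.shape)     # torch.Size([1, 4, 8]): final hidden state, layer dimension first

# Layers that expect time-first input (s, n, c) need a transpose instead
X_time_first = X.transpose(0, 1)
print(X_time_first.shape)  # torch.Size([251, 4, 2])
```

Even with `batch_first=True`, the final hidden state `h_n` keeps the layer dimension first. Only the per-step `output` follows the batch-first layout.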

Installation

$ pip install torchtime

Using torchtime

The example below uses the torchtime.data.UEA class. The data set is specified with the dataset argument (see the list of supported data sets). The split argument determines whether training, validation or test data are returned. The sizes of the splits are controlled by the train_split and val_split arguments, and the test split receives the remainder.

For example, to load training data for the ArrowHead data set with a 70% training, 20% validation and 10% testing split:

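from torch.utils.data import DataLoader
from torchtime.data import UEA

# Training split of ArrowHead: 70% train, 20% validation, remaining 10% test
arrowhead = UEA(
    dataset="ArrowHead",
    split="train",
    train_split=0.7,
    val_split=0.2,
)
# Serve the training data in batches of 32 trajectories
dataloader = DataLoader(arrowhead, batch_size=32)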
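The test fraction is whatever remains after the training and validation fractions, here 1 - 0.7 - 0.2 = 0.1. The sketch below shows one way to compute such a three-way split from shuffled indices. The helper `split_indices`, its seed handling and its rounding are hypothetical and are not torchtime's internals, so torchtime's split sizes may differ slightly.

```python
import random

def split_indices(n, train_split, val_split, seed=0):
    """Hypothetical helper: partition range(n) into train/val/test index lists.

    The test fraction is the remainder, 1 - train_split - val_split.
    """
    if train_split + val_split > 1:
        raise ValueError("train_split + val_split must not exceed 1")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    n_train = round(n * train_split)
    n_val = round(n * val_split)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # everything left over goes to test
    return train, val, test

train, val, test = split_indices(100, train_split=0.7, val_split=0.2)
print(len(train), len(val), len(test))  # 70 20 10
```

Fixing the seed means the training, validation and test requests partition the same shuffled order. That keeps the three splits disjoint.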

Batches are dictionaries of tensors X, y and length. X holds the time series data with an additional time stamp in the first channel, y holds one-hot encoded labels and length holds the length of each trajectory.
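PyTorch's default collate function batches a dictionary key by key, so each field arrives as a stacked tensor. The sketch below uses a hypothetical `ToyDictDataset` that mimics the X, y and length layout described above. It generates random data, not the real ArrowHead values.

```python
import torch
from torch.utils.data import DataLoader, Dataset

class ToyDictDataset(Dataset):
    """Hypothetical stand-in returning one trajectory as a dict of X, y and length."""

    def __init__(self, n=10, s=251, n_classes=3):
        self.n, self.s, self.n_classes = n, s, n_classes

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        stamps = torch.arange(self.s, dtype=torch.float32).unsqueeze(-1)  # (s, 1) time stamps
        values = torch.randn(self.s, 1)                                   # (s, 1) univariate series
        label = torch.zeros(self.n_classes, dtype=torch.long)
        label[idx % self.n_classes] = 1                                   # one-hot label
        return {
            "X": torch.cat([stamps, values], dim=-1),  # (s, 2): time stamp first
            "y": label,                                # (n_classes,)
            "length": torch.tensor(self.s),            # scalar trajectory length
        }

loader = DataLoader(ToyDictDataset(), batch_size=4)
batch = next(iter(loader))  # default collate stacks each key separately
print(batch["X"].shape)       # torch.Size([4, 251, 2])
print(batch["y"].shape)       # torch.Size([4, 3])
print(batch["length"].shape)  # torch.Size([4])
```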

ArrowHead is a univariate data set with 251 observations in each trajectory, so X has two channels: the time stamp followed by the time series. With the batch size of 32 specified above, X has shape (32, 251, 2).

>>> next(iter(dataloader))["X"].shape
torch.Size([32, 251, 2])

>>> next(iter(dataloader))["X"]
tensor([[[  0.0000,  -1.8295],
         [  1.0000,  -1.8238],
         [  2.0000,  -1.8101],
         ...,
         [248.0000,  -1.7759],
         [249.0000,  -1.8088],
         [250.0000,  -1.8110]],

        ...,

        [[  0.0000,  -2.0147],
         [  1.0000,  -2.0311],
         [  2.0000,  -1.9471],
         ...,
         [248.0000,  -1.9901],
         [249.0000,  -1.9913],
         [250.0000,  -2.0109]]])
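As the output above shows, the time stamp channel is simply the observation index 0, 1, ..., s - 1. The sketch below prepends such a channel to a batch-first tensor, assuming regularly sampled data. The helper name `add_time_channel` is hypothetical.

```python
import torch

def add_time_channel(values):
    """Hypothetical helper: prepend time stamps 0..s-1 as channel 0 of an (n, s, c) tensor."""
    n, s, _ = values.shape
    stamps = torch.arange(s, dtype=values.dtype, device=values.device)  # (s,)
    stamps = stamps.expand(n, s).unsqueeze(-1)                          # (n, s, 1)
    return torch.cat([stamps, values], dim=-1)                          # (n, s, c + 1)

series = torch.randn(32, 251, 1)  # random univariate stand-in for ArrowHead
X = add_time_channel(series)
print(X.shape)      # torch.Size([32, 251, 2])
print(X[0, :3, 0])  # tensor([0., 1., 2.])
```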

ArrowHead has three classes, so y has shape (32, 3).

>>> next(iter(dataloader))["y"].shape
torch.Size([32, 3])

>>> next(iter(dataloader))["y"]
tensor([[0, 0, 1],
        ...,
        [1, 0, 0]])
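Losses such as `nn.CrossEntropyLoss` are commonly used with class indices rather than one-hot vectors. The sketch below converts between the two forms with `argmax` and `torch.nn.functional.one_hot`. It uses a small hand-written y and random logits as stand-ins for real labels and model output.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

y = torch.tensor([[0, 0, 1],
                  [1, 0, 0]])  # one-hot labels, shape (batch, n_classes)

labels = y.argmax(dim=1)  # class indices, shape (batch,)
print(labels)             # tensor([2, 0])

# Round trip back to one-hot
y_again = F.one_hot(labels, num_classes=3)
print(torch.equal(y_again, y))  # True

# Class indices can be used directly as cross-entropy targets
logits = torch.randn(2, 3)  # random stand-in for model output
loss = nn.CrossEntropyLoss()(logits, labels)
```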

Finally, length gives the length of each trajectory (before any padding, for data sets whose trajectories differ in length), so it has shape (32,).

>>> next(iter(dataloader))["length"].shape
torch.Size([32])

>>> next(iter(dataloader))["length"]
tensor([251, ..., 251])
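For data sets with trajectories of different lengths, length can be turned into a boolean mask that separates real observations from padding. This is a minimal sketch that assumes padding is appended at the end of each trajectory. Check the API for how torchtime actually pads. The helper `padding_mask` is hypothetical.

```python
import torch

def padding_mask(length, max_len):
    """Hypothetical helper: True where an observation is real, False where it is padding.

    Assumes padding is appended after the last real observation.
    """
    steps = torch.arange(max_len, device=length.device)  # (max_len,)
    return steps.unsqueeze(0) < length.unsqueeze(1)      # (batch, max_len)

length = torch.tensor([5, 3, 4])
mask = padding_mask(length, max_len=5)
print(mask.sum(dim=1))  # tensor([5, 3, 4])
```

A mask like this can zero out padded steps in a loss or select only real observations, for example `X[mask]` for a batch `X` of shape (batch, max_len, c).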

Learn more

Other features include missing data simulation for UEA data sets. See the API documentation for details.
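torchtime's own interface for missing data simulation is described in its API documentation and is not reproduced here. As a rough illustration of the idea only, the sketch below removes a fixed share of observations completely at random by setting them to NaN. It leaves the time stamp channel intact. The function `simulate_missing` and its `missing_rate` parameter are hypothetical.

```python
import torch

def simulate_missing(X, missing_rate, seed=0):
    """Hypothetical helper: set a `missing_rate` share of data values to NaN at random.

    X has shape (n, s, c) with time stamps in channel 0, which is never dropped.
    """
    n, s, c = X.shape
    n_values = n * s * (c - 1)  # data values, excluding the time channel
    n_missing = round(missing_rate * n_values)
    generator = torch.Generator().manual_seed(seed)
    drop = torch.randperm(n_values, generator=generator)[:n_missing]  # unique positions

    data_mask = torch.zeros(n_values, dtype=torch.bool)
    data_mask[drop] = True  # True means the value becomes missing
    full_mask = torch.zeros_like(X, dtype=torch.bool)
    full_mask[:, :, 1:] = data_mask.reshape(n, s, c - 1)
    return X.masked_fill(full_mask, float("nan"))

stamps = torch.arange(251.0).expand(32, 251).unsqueeze(-1)
X = torch.cat([stamps, torch.randn(32, 251, 1)], dim=-1)  # random stand-in batch
X_missing = simulate_missing(X, missing_rate=0.3)
print(torch.isnan(X_missing).sum().item())  # 2410, i.e. round(0.3 * 32 * 251)
```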

This work is based on some of the data processing ideas in Kidger et al., 2020 [link] and Che et al., 2018 [link].

License

Released under the MIT license.

Download files


Source distribution: torchtime-0.1.0.tar.gz (9.2 kB)

Built distribution: torchtime-0.1.0-py3-none-any.whl (8.4 kB, Python 3)
