Time series data sets for PyTorch
torchtime provides ready-to-go time series data sets for use in PyTorch. The current list of supported data sets is:
The package follows the batch-first convention. Data tensors therefore have shape (n, s, c), where n is the batch size, s is the trajectory length and c is the number of channels.
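As a minimal sketch of what the batch-first layout means in practice, the snippet below feeds a placeholder tensor of shape (n, s, c) to a standard PyTorch recurrent layer. The random data and the hidden size of 8 are assumptions chosen for illustration; nothing here comes from torchtime itself.

```python
import torch
from torch import nn

n, s, c = 32, 251, 2  # batch size, trajectory length, channels
x = torch.randn(n, s, c)  # placeholder data in batch-first layout

# batch_first=True tells the GRU to expect input of shape (n, s, c)
gru = nn.GRU(input_size=c, hidden_size=8, batch_first=True)
output, h_n = gru(x)

print(output.shape)  # one hidden state per time step: (n, s, hidden_size)
print(h_n.shape)     # final hidden state: (num_layers, n, hidden_size)
```

PyTorch recurrent layers default to batch_first=False, so the flag has to be set explicitly when consuming tensors in this layout.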
Installation
$ pip install torchtime
Using torchtime
The example below uses the torchtime.data.UEA class. The data set is specified using the dataset argument (see list here). The split argument determines whether training, validation or test data are returned. The size of the splits is controlled with the train_split and val_split arguments.
For example, to load training data for the ArrowHead data set with a 70% training, 20% validation and 10% testing split:
from torch.utils.data import DataLoader
from torchtime.data import UEA

arrowhead = UEA(
    dataset="ArrowHead",
    split="train",
    train_split=0.7,
    val_split=0.2,
)
dataloader = DataLoader(arrowhead, batch_size=32)
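To make the split arithmetic concrete, here is a rough sketch of how two proportions can define three partitions, with the test set taking whatever remains. The helper split_indices, the sample count of 200 and the seed are hypothetical; this illustrates the idea only and is not how torchtime implements its splits.

```python
import torch


def split_indices(n, train_split, val_split, seed=0):
    """Hypothetical helper: shuffle n indices and cut them into three parts."""
    generator = torch.Generator().manual_seed(seed)
    permutation = torch.randperm(n, generator=generator)
    n_train = int(round(n * train_split))
    n_val = int(round(n * val_split))
    train = permutation[:n_train]
    val = permutation[n_train:n_train + n_val]
    test = permutation[n_train + n_val:]  # the remainder becomes the test set
    return train, val, test


# 200 is an arbitrary sample count chosen for illustration
train_idx, val_idx, test_idx = split_indices(200, train_split=0.7, val_split=0.2)
print(len(train_idx), len(val_idx), len(test_idx))  # 140 40 20
```

Only the training and validation proportions need to be supplied, because the testing proportion follows from them (here 1 - 0.7 - 0.2 = 0.1).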
Batches are dictionaries of tensors X, y and length. X holds the time series data with an additional time stamp in the first channel, y holds the one-hot encoded labels and length holds the length of each trajectory.
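The dictionary batches can be reproduced with a toy data set. In the sketch below, ToyDictDataset is a hypothetical stand-in filled with random data (it is not part of torchtime); it shows that the default DataLoader collation stacks each dictionary entry along a new batch dimension.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyDictDataset(Dataset):
    """Hypothetical stand-in that returns X, y and length for each sample."""

    def __init__(self, n=100, s=251, c=2, n_classes=3):
        self.X = torch.randn(n, s, c)
        labels = torch.randint(0, n_classes, (n,))
        self.y = torch.nn.functional.one_hot(labels, n_classes)
        self.length = torch.full((n,), s)

    def __len__(self):
        return len(self.X)

    def __getitem__(self, i):
        # Each sample is a dictionary; the DataLoader collates it key by key
        return {"X": self.X[i], "y": self.y[i], "length": self.length[i]}


loader = DataLoader(ToyDictDataset(), batch_size=32)
batch = next(iter(loader))
for key, value in batch.items():
    print(key, tuple(value.shape))
```

A training loop would then unpack the entries by key, for example batch["X"] and batch["y"].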
ArrowHead is a univariate time series with 251 observations in each trajectory. X therefore has two channels: the time stamp followed by the time series. A batch size of 32 was specified above, so X has shape (32, 251, 2).
>> next(iter(dataloader))["X"].shape
torch.Size([32, 251, 2])
>> next(iter(dataloader))["X"]
tensor([[[ 0.0000, -1.8295],
         [ 1.0000, -1.8238],
         [ 2.0000, -1.8101],
         ...,
         [248.0000, -1.7759],
         [249.0000, -1.8088],
         [250.0000, -1.8110]],

        ...,

        [[ 0.0000, -2.0147],
         [ 1.0000, -2.0311],
         [ 2.0000, -1.9471],
         ...,
         [248.0000, -1.9901],
         [249.0000, -1.9913],
         [250.0000, -2.0109]]])
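The layout above, with a time stamp in channel 0 followed by the series, can be rebuilt by hand. The following is a sketch assuming integer time stamps 0, 1, ..., s - 1 (as in the printed output) and random placeholder values; it is not torchtime's own code.

```python
import torch

n, s = 32, 251
series = torch.randn(n, s, 1)  # placeholder univariate data, shape (n, s, 1)

# Time stamps 0, 1, ..., s - 1, repeated for every trajectory in the batch
time = torch.arange(s, dtype=series.dtype).reshape(1, s, 1).expand(n, s, 1)

# Concatenate along the channel axis so the time stamp becomes channel 0
X = torch.cat([time, series], dim=2)

print(X.shape)      # torch.Size([32, 251, 2])
print(X[0, :3, 0])  # first three time stamps of the first trajectory
```

A model that should not see the time stamp can drop it with X[..., 1:].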
There are three classes, so y has shape (32, 3).
>> next(iter(dataloader))["y"].shape
torch.Size([32, 3])
>> next(iter(dataloader))["y"]
tensor([[0, 0, 1],
        ...,
        [1, 0, 0]])
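One-hot labels often need converting to class indices before they are passed to a loss function. The sketch below uses a small hand-written label tensor and random logits, both of which are assumptions for illustration, to show the conversion with argmax.

```python
import torch
from torch import nn

# Hand-written one-hot labels for three samples and three classes
y = torch.tensor([[0, 0, 1],
                  [1, 0, 0],
                  [0, 1, 0]])

# The position of the 1 in each row is the class index
targets = y.argmax(dim=1)
print(targets)  # tensor([2, 0, 1])

# Placeholder model output: one row of logits per sample
logits = torch.randn(3, 3)
loss = nn.CrossEntropyLoss()(logits, targets)
print(loss.item())
```

nn.CrossEntropyLoss also accepts class probabilities as floating-point targets in recent PyTorch releases, in which case y.float() could be passed directly.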
Finally, length is the length of each trajectory (before any padding for data sets of irregular length) and therefore has shape (32,).
>> next(iter(dataloader))["length"].shape
torch.Size([32])
>> next(iter(dataloader))["length"]
tensor([251, ..., 251])
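For data sets of irregular length, the length tensor can be turned into a mask that separates real observations from padding. The sketch below uses a small hand-made batch; the NaN padding value is an assumption made for the example, since the text above does not say how padded positions are filled.

```python
import torch

s = 5
lengths = torch.tensor([5, 3, 4])  # true length of each trajectory
X = torch.randn(3, s, 2)           # placeholder batch, shape (n, s, c)

# Assumed padding: positions beyond each trajectory's length are NaN
X[1, 3:] = float("nan")
X[2, 4:] = float("nan")

# mask[i, t] is True while time step t lies inside trajectory i
mask = torch.arange(s).unsqueeze(0) < lengths.unsqueeze(1)

# Mean of channel 1 over the real (unpadded) time steps only
values = X[..., 1]
valid = torch.where(mask, values, torch.zeros_like(values))
masked_mean = valid.sum(dim=1) / lengths

print(mask)
print(masked_mean)
```

torch.where is used instead of multiplying by the mask because NaN multiplied by zero is still NaN.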
Learn more
Other features include missing data simulation for UEA data sets. See the API for more information.
This work is based on some of the data processing ideas in Kidger et al, 2020 [link] and Che et al, 2018 [link].
License
Released under the MIT license.