Time series data sets for PyTorch
Ready-to-go PyTorch data sets for supervised time series prediction problems. torchtime currently supports:

- All data sets in the UEA/UCR classification repository [link]
- PhysioNet Challenge 2012 (in-hospital mortality) [link]
- PhysioNet Challenge 2019 (sepsis prediction) [link]
Installation
$ pip install torchtime
Example usage
torchtime.data contains a class for each data set above. Each class has a consistent API. The torchtime.data.UEA class returns the UEA/UCR data set specified by the dataset argument (see the list of data sets here). For example, to load training data for the ArrowHead data set with a 70/30% training/validation split and create a DataLoader:
from torch.utils.data import DataLoader
from torchtime.data import UEA
arrowhead = UEA(
    dataset="ArrowHead",
    split="train",
    train_prop=0.7,
)
dataloader = DataLoader(arrowhead, batch_size=32)
Batches are dictionaries of tensors X, y and length.

X are the time series data. The package follows the batch-first convention, therefore X has shape (n, s, c), where n is the batch size, s is the (maximum) trajectory length and c is the number of channels. By default, a time stamp is appended to the time series data as the first channel.

y are one-hot encoded labels of shape (n, l), where l is the number of classes, and length are the lengths of each trajectory (before padding if sequences are of irregular length), i.e. a tensor of shape (n).
ArrowHead is a univariate time series, therefore X has two channels: the time stamp followed by the time series (c = 2). Each series has 251 observations (s = 251) and there are three classes (l = 3).
next_batch = next(iter(dataloader))
next_batch["X"].shape # torch.Size([32, 251, 2])
next_batch["y"].shape # torch.Size([32, 3])
next_batch["length"].shape # torch.Size([32])
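For data sets with irregular-length sequences, the length tensor can be used to pack the padded batch before feeding it to a recurrent layer. A minimal sketch in plain PyTorch, using a fabricated batch with the same structure and shapes as the ArrowHead example above (the values themselves are random, not real data):

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

# Fabricated batch mirroring a torchtime batch:
# X is (n, s, c), y is (n, l), length is (n).
batch = {
    "X": torch.randn(32, 251, 2),
    "y": torch.nn.functional.one_hot(torch.randint(3, (32,)), 3).float(),
    "length": torch.full((32,), 251),
}

# Pack the padded sequences so an RNN skips any padding.
packed = pack_padded_sequence(
    batch["X"],
    batch["length"].cpu(),  # lengths must live on the CPU
    batch_first=True,       # torchtime follows the batch-first convention
    enforce_sorted=False,
)

rnn = torch.nn.GRU(input_size=2, hidden_size=16, batch_first=True)
_, h_n = rnn(packed)
print(h_n.shape)  # torch.Size([1, 32, 16])
```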
Additional options

- The split argument determines whether training, validation or test data are returned. The sizes of the splits are controlled with the train_prop and val_prop arguments.
- Missing data can be imputed by setting impute to mean (replace with training data channel means) or forward (replace with the previous observation). Alternatively, a custom imputation function can be used.
- A time stamp, missing data mask and the time since the previous observation can be appended with the boolean arguments time, mask and delta respectively.
- For reproducibility, an optional random seed can be specified.
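To make the imputation options concrete, the snippet below illustrates what forward imputation does on a single channel: each NaN is replaced with the most recent preceding observation. This is an illustrative re-implementation in plain PyTorch, not torchtime's own code:

```python
import torch

def forward_fill(x):
    """Replace NaNs with the most recent preceding observation.

    x: tensor of shape (s,), one channel of one trajectory.
    Leading NaNs (with no previous observation) are left as NaN.
    """
    x = x.clone()
    for t in range(1, len(x)):
        if torch.isnan(x[t]):
            x[t] = x[t - 1]
    return x

series = torch.tensor([1.0, float("nan"), float("nan"), 4.0, float("nan")])
print(forward_fill(series))  # tensor([1., 1., 1., 4., 4.])
```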
Most UEA/UCR data sets are regularly sampled and fully observed. Missing data can be simulated using the missing argument, which drops data at random. See the tutorials and API for more information.
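Dropping observations at random, as the missing argument simulates, can be pictured as masking a fraction of values with NaN. A hand-rolled sketch of the idea, again not torchtime's implementation (the function name and signature are invented for illustration):

```python
import torch

def simulate_missing(X, missing=0.5, seed=42):
    """Set a random `missing` fraction of observations to NaN.

    X: tensor of shape (n, s, c). The mask is drawn independently
    per observation; a seed is taken for reproducibility.
    """
    generator = torch.Generator().manual_seed(seed)
    mask = torch.rand(X.shape, generator=generator) < missing
    X = X.clone()
    X[mask] = float("nan")
    return X

X = torch.randn(4, 10, 2)
X_missing = simulate_missing(X, missing=0.3)
print(torch.isnan(X_missing).float().mean())  # roughly 0.3
```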
Acknowledgements
torchtime uses some of the data processing ideas in Kidger et al, 2020 [1] and Che et al, 2018 [2].
This work is supported by the Engineering and Physical Sciences Research Council, Centre for Doctoral Training in Cloud Computing for Big Data, Newcastle University (grant number EP/L015358/1).
References

1. Kidger, P, Morrill, J, Foster, J, et al. Neural Controlled Differential Equations for Irregular Time Series. arXiv 2005.08926 (2020). [arXiv]
2. Che, Z, Purushotham, S, Cho, K, et al. Recurrent Neural Networks for Multivariate Time Series with Missing Values. Sci Rep 8, 6085 (2018). [doi]
3. Silva, I, Moody, G, Scott, DJ, et al. Predicting In-Hospital Mortality of ICU Patients: The PhysioNet/Computing in Cardiology Challenge 2012. Comput Cardiol 39:245-248 (2012). [hdl]
4. Reyna, M, Josef, C, Jeter, R, et al. Early Prediction of Sepsis From Clinical Data: The PhysioNet/Computing in Cardiology Challenge 2019. Critical Care Medicine 48(2):210-217 (2019). [doi]
5. Reyna, M, Josef, C, Jeter, R, et al. Early Prediction of Sepsis from Clinical Data: The PhysioNet/Computing in Cardiology Challenge 2019 (version 1.0.0). PhysioNet (2019). [doi]
6. Goldberger, A, Amaral, L, Glass, L, et al. PhysioBank, PhysioToolkit, and PhysioNet: Components of a New Research Resource for Complex Physiologic Signals. Circulation 101(23):e215-e220 (2000). [doi]
7. Löning, M, Bagnall, A, Ganesh, S, et al. sktime: A Unified Interface for Machine Learning with Time Series. Workshop on Systems for ML at NeurIPS 2019 (2019). [doi]
8. Löning, M, Bagnall, A, Middlehurst, M, et al. alan-turing-institute/sktime: v0.10.1. Zenodo (2022). [doi]
License
Released under the MIT license.