Time series data sets for PyTorch
Ready-to-go PyTorch data sets for supervised time series prediction problems. torchtime currently supports:
- All data sets in the UEA/UCR classification repository [link]
- PhysioNet Challenge 2012 (in-hospital mortality) [link]
- PhysioNet Challenge 2019 (sepsis prediction) [link]
Installation
$ pip install torchtime
Example usage
torchtime.data contains a class for each data set above. Each class has a consistent API.
The torchtime.data.UEA class returns the UEA/UCR data set specified by the dataset argument (see the list of data sets here). For example, to load training data for the ArrowHead data set with a 70/30% training/validation split and create a DataLoader:
from torch.utils.data import DataLoader
from torchtime.data import UEA
arrowhead = UEA(
dataset="ArrowHead",
split="train",
train_prop=0.7,
)
dataloader = DataLoader(arrowhead, batch_size=32)
Batches are dictionaries of tensors X, y and length.
X are the time series data. The package follows the batch-first convention, therefore X has shape (n, s, c), where n is the batch size, s is the (maximum) trajectory length and c is the number of channels. By default, a time stamp is appended to the time series data as the first channel.
y are one-hot encoded labels of shape (n, l), where l is the number of classes, and length are the lengths of each trajectory (before padding, if sequences are of irregular length), i.e. a tensor of shape (n).
ArrowHead is a univariate time series, therefore X has two channels: the time stamp followed by the time series (c = 2). Each series has 251 observations (s = 251) and there are three classes (l = 3).
next_batch = next(iter(dataloader))
next_batch["X"].shape # torch.Size([32, 251, 2])
next_batch["y"].shape # torch.Size([32, 3])
next_batch["length"].shape # torch.Size([32])
Additional options
- The split argument determines whether training, validation or test data are returned. The size of the splits is controlled with the train_prop and val_prop arguments (see the sketch after this list).
- Missing data can be imputed by setting impute to mean (replace with training data channel means) or forward (replace with the previous observation). Alternatively, a custom imputation function can be used.
- A time stamp, missing data mask and the time since the previous observation can be appended with the boolean arguments time, mask and delta respectively.
- For reproducibility, an optional random seed can be specified.
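These options can be combined when constructing a data set. The snippet below is a sketch only: the validation split label ("val") and the exact imputation strings are inferred from the descriptions above and should be checked against the API documentation.
from torchtime.data import UEA

arrowhead_val = UEA(
    dataset="ArrowHead",
    split="val",        # return validation data (split label assumed)
    train_prop=0.7,     # 70% training data
    val_prop=0.2,       # 20% validation data, leaving 10% for testing
    impute="mean",      # replace missing values with training channel means
    time=True,          # append a time stamp channel
    mask=True,          # append a missing data mask
    delta=True,         # append the time since the previous observation
    seed=123,           # for reproducibility
)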
Most UEA/UCR data sets are regularly sampled and fully observed. Missing data can be simulated using the missing argument to drop data at random from UEA/UCR data sets. See the tutorials and API for more information.
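As a sketch, assuming missing accepts the proportion of observations to drop (e.g. 0.5 for half of the values), simulated missingness can be combined with imputation:
from torchtime.data import UEA

arrowhead_missing = UEA(
    dataset="ArrowHead",
    split="train",
    train_prop=0.7,
    missing=0.5,        # drop 50% of observations at random (interface assumed)
    impute="forward",   # forward-fill the simulated gaps
    seed=456,
)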
Acknowledgements
torchtime uses some of the data processing ideas in Kidger et al, 2020 [1] and Che et al, 2018 [2].
This work is supported by the Engineering and Physical Sciences Research Council, Centre for Doctoral Training in Cloud Computing for Big Data, Newcastle University (grant number EP/L015358/1).
References
1. Kidger, P, Morrill, J, Foster, J, et al. Neural Controlled Differential Equations for Irregular Time Series. arXiv 2005.08926 (2020). [arXiv]
2. Che, Z, Purushotham, S, Cho, K, et al. Recurrent Neural Networks for Multivariate Time Series with Missing Values. Sci Rep 8, 6085 (2018). [doi]
3. Silva, I, Moody, G, Scott, DJ, et al. Predicting In-Hospital Mortality of ICU Patients: The PhysioNet/Computing in Cardiology Challenge 2012. Comput Cardiol 39:245-248 (2012). [hdl]
4. Reyna, M, Josef, C, Jeter, R, et al. Early Prediction of Sepsis From Clinical Data: The PhysioNet/Computing in Cardiology Challenge 2019. Critical Care Medicine 48(2):210-217 (2019). [doi]
5. Reyna, M, Josef, C, Jeter, R, et al. Early Prediction of Sepsis from Clinical Data: The PhysioNet/Computing in Cardiology Challenge 2019 (version 1.0.0). PhysioNet (2019). [doi]
6. Goldberger, A, Amaral, L, Glass, L, et al. PhysioBank, PhysioToolkit, and PhysioNet: Components of a New Research Resource for Complex Physiologic Signals. Circulation 101(23):e215-e220 (2000). [doi]
7. Löning, M, Bagnall, A, Ganesh, S, et al. sktime: A Unified Interface for Machine Learning with Time Series. Workshop on Systems for ML at NeurIPS 2019 (2019). [doi]
8. Löning, M, Bagnall, A, Middlehurst, M, et al. alan-turing-institute/sktime: v0.10.1 (v0.10.1). Zenodo (2022). [doi]
License
Released under the MIT license.