Collection of utility functions to download open-source data sets.

loadmydata

Utility functions for loading time series data sets (Python 3.7+).

The list of available data sets currently includes:

  • the UEA/UCR repository.

Install

This package relies on requests, tqdm, yarl (for the download), and numpy.

Use pip to install.

pip install loadmydata

Alternatively, you can use conda:

conda config --add channels conda-forge
conda install loadmydata

Data format

Consider a data set of N time series y(1), y(2), ..., y(N). Each y(n) has T(n) samples and d dimensions. The time series can have different lengths T(n), but they all share the same dimensionality d.

Such a data set is stored in a numpy array of shape (N, T, d), where T := max_n T(n) is the length of the longest series. Time series with fewer than T samples are padded at the end with numpy.nan. In addition, the padding is masked using numpy's MaskedArray.
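To make the layout concrete, here is a minimal numpy-only sketch that builds such an array from variable-length series. The helper name `pad_and_mask` is hypothetical and not part of loadmydata; it only illustrates the padding and masking described above. Note that `numpy.ma.masked_invalid` also masks any NaN already present in the data, which this sketch assumes does not occur.

```python
import numpy as np


def pad_and_mask(series_list):
    """Stack arrays of shape (T_n, d) into a masked array of shape (N, T, d).

    Hypothetical helper for illustration only; not part of loadmydata.
    """
    n_series = len(series_list)
    max_len = max(s.shape[0] for s in series_list)  # T := max_n T(n)
    n_dims = series_list[0].shape[1]  # d, assumed equal for all series
    padded = np.full((n_series, max_len, n_dims), np.nan)
    for i, s in enumerate(series_list):
        padded[i, : s.shape[0]] = s  # the tail stays NaN
    return np.ma.masked_invalid(padded)  # mask the NaN padding


series = [np.ones((3, 2)), np.ones((5, 2))]
X = pad_and_mask(series)
print(X.shape)  # (2, 5, 2)
print(X[0].mask[:, 0])  # [False False False  True  True]
```

Because the padding is masked, reductions such as `X.mean(axis=1)` ignore the padded samples.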

from loadmydata.padding import get_signal_shape

# Assume that X contains a time series data set of shape (N, T, d)
for signal in X:
    # signal is a masked array of shape (T, d).
    # The true number of samples of the signal (without extra padding)
    # can be accessed with `get_signal_shape`.
    n_samples, n_dims = get_signal_shape(signal)
    # To get the signal without the extra padding, do
    signal_without_padding = signal[:n_samples]
    # do something with signal_without_padding
    ...
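For intuition, the unpadded length can also be recovered from the mask alone. The following is a rough sketch, assuming the padding sits at the end of the signal and that a padded row is masked in every dimension. The helper `unpadded_length` is hypothetical; it is not how `get_signal_shape` is necessarily implemented.

```python
import numpy as np


def unpadded_length(signal):
    """Count rows of a (T, d) masked array that hold at least one unmasked value.

    Hypothetical helper for illustration; assumes the padding is at the end.
    """
    mask = np.ma.getmaskarray(signal)  # boolean array with the shape of signal
    valid_rows = ~mask.all(axis=1)  # a row is padding if all d entries are masked
    return int(valid_rows.sum())


raw = np.array([[1.0, 2.0], [3.0, 4.0], [np.nan, np.nan]])
signal = np.ma.masked_invalid(raw)
n_samples = unpadded_length(signal)
print(n_samples)  # 2
print(signal[:n_samples].shape)  # (2, 2)
```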

UEA/UCR time series classification repository

The UEA/UCR repository focuses on time series classification. As a result, each signal is associated with a label to predict.

The list of available data sets from the UEA/UCR repository is available here.

Usage example

from loadmydata.load_uea_ucr import load_uea_ucr_data

dataset_name = "ArrowHead"  # "AbnormalHeartbeat", "ACSF1", etc.
data = load_uea_ucr_data(dataset_name)

print(data.description)
print(data.X_train.shape)
print(data.X_test.shape)
print(data.y_train.shape)
print(data.y_test.shape)
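Since each signal comes with a label, a common first step is to check how balanced the classes are. The sketch below uses a small synthetic label array as a stand-in for `data.y_train`, so it runs without downloading anything; the helper `class_counts` is hypothetical, and the label values shown are assumptions rather than the actual labels of any data set.

```python
import numpy as np


def class_counts(labels):
    """Return a {label: count} dictionary for a 1-D array of labels."""
    values, counts = np.unique(labels, return_counts=True)
    return dict(zip(values.tolist(), counts.tolist()))


# Synthetic stand-in for data.y_train (assumed to be a 1-D array of labels).
y_train = np.array(["0", "1", "2", "0", "1", "0"])
print(class_counts(y_train))  # {'0': 3, '1': 2, '2': 1}
```

With a loaded data set, the same call would be `class_counts(data.y_train)`.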

Source Distribution

loadmydata-0.0.7rc1.tar.gz (12.2 kB)
