The Convolutional Shapelet Transform algorithm

This repository contains the implementation of the Convolutional Shapelet Transform (CST), a state-of-the-art shapelet algorithm.

It computes a set of convolutional shapelets that match small parts of the input space to highly discriminative points in multiple convolutional spaces.

Installation

The package supports Python 3.7 & 3.8. You can install the package and its dependencies via pip using pip install convst. To install the package from source, you can download the latest release on GitHub and run python setup.py install. This will install the package and automatically look for the dependencies using pip.

We recommend doing this in a new virtual environment using Anaconda to avoid any conflict with an existing installation. If you wish to install dependencies individually, you can find the exact dependencies used in the requirements.txt file.

An optional dependency that can speed up numba, which is used in our implementation, is the Intel vector math library (SVML). When using conda, it can be installed by running conda install -c numba icc_rt

The required packages do not include packages unrelated to CST; the following packages may be useful if you want to run some of the other scripts in the archive:

  1. wildboar, used for ShapeletForestClassifier

  2. networkx, used to generate critical difference diagrams

Tutorial

Here is a minimal example running the CST algorithm on a univariate dataset from the UCR archive:

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import RidgeClassifierCV
from sklearn.metrics import accuracy_score
from convst.shapelet_transforms.convolutional_ST import ConvolutionalShapeletTransformer
from convst.utils.dataset_utils import load_sktime_dataset_split

# Load a dataset by name. Any name from the univariate UCR archive works.
X_train, X_test, y_train, y_test, _ = load_sktime_dataset_split(
    'GunPoint', normalize=True)

# The first run will be slow due to numba compilation on the first call.
# Run a small dataset like GunPoint the first time!
# Set verbose=1 to see the progression of the algorithm.
cst = make_pipeline(
    ConvolutionalShapeletTransformer(verbose=0),
    RidgeClassifierCV(alphas=np.logspace(-6, 6, 20),
                      normalize=True, class_weight='balanced')
)

cst.fit(X_train, y_train)
pred = cst.predict(X_test)

print("Accuracy Score for CST : {}".format(accuracy_score(y_test, pred)))

We use the standard scikit-learn interface and expect as input a 3D matrix of shape (n_samples, n_features, n_timestamps). Note that since only univariate series are supported for now, CST will only process the first feature.
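If your data is stored as a plain 2D numpy array of univariate series, a small sketch of how to add the missing feature axis (with hypothetical toy data, not taken from the archive) could look like this:

import numpy as np

# Toy data: 50 univariate series of length 150, stored as (n_samples, n_timestamps).
X_2d = np.random.rand(50, 150)

# Insert a feature axis to obtain the expected (n_samples, n_features, n_timestamps) layout.
X_3d = X_2d[:, np.newaxis, :]
print(X_3d.shape)  # (50, 1, 150)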

In the Example folder, you can find some other scripts to help you get started and show you how to plot some results. The UCR_example.py script allows you to run CST on any UCR dataset and plot interpretations of the results.

Additional experiments mentioned in the paper are also found in this folder.

Current Work in Progress

The package currently has some limitations that are being worked on, the main ones being:

  1. Adaptation to the multivariate context. While you can feed a multivariate time series to CST, it will only look at the first feature for now.

  2. Adaptation to irregular frequencies. This one will take a bit more time to think through, but is on the list.

  3. Adaptation to unsupervised context. Ideally, this would be a clustering version of the algorithm following scikit-learn standards.

  4. Possibility to change the model used to extract partitions of the data in CST.

  5. Parallel implementation of the remaining sequential parts of CST and global optimizations to speed up CST.

  6. Memory consumption optimization relative to input time series characteristics.

  7. Use of a more diverse set of features extracted from the convolutions, notably those from Catch-22.

  8. Redesign of the interpretability tool to be more resilient to the context (supervised or not) and to a high number of classes; currently, the graphs are very messy when there are many classes.

  9. Special-case testing shows a potential issue when classes can only be distinguished by the value at a particular time point (with noise); a fix is in progress.

Reproducing the paper results

Multiple scripts are available under the PaperScripts folder. It contains the exact same scripts used to generate our results.

To obtain the same data resamples as the UCR archive, you must use the tsml Java repository: from the class DataHandling in the package examples, use the function resamplingData and change where the data is read from and written to. The function assumes the input is in ARFF format; the files can be obtained from the time series classification website.

Contributing, Citing and Contact

If you are experiencing bugs in the CST implementation, or would like to contribute in any way, please create an issue or pull request in this repository.

For other questions or to get in touch with me, you can email me at XXXX (my institutional email might change soon, so I provide this as a temporary address).

If you use our algorithm or publication in any work, please cite the following paper:

@article{CST,
  title={Convolutional Shapelet Transform: A new approach for time series shapelets},
  author={Guillaume Antoine, Vrain Christel, Elloumi Wael},
  journal={},
  volume={},
  number={},
  pages={},
  year={2021},
  publisher={}
}

Citations

Here are the code-related citations that were not made in the paper:

[1]: Löning, Markus and Bagnall, Anthony and Ganesh, Sajaysurya and Kazakov, Viktor and Lines, Jason and Király, Franz J, "sktime: A Unified Interface for Machine Learning with Time Series", Workshop on Systems for ML at NeurIPS 2019

[2]: The Scikit-learn development team, "Scikit-learn: Machine Learning in Python", Journal of Machine Learning Research 2011

[3]: The Numpy development team, "Array programming with NumPy", Nature 2020
