Multi-Output Gaussian Process ToolKit

Paper - API Documentation - Tutorials & Examples

The Multi-Output Gaussian Process Toolkit (MOGPTK) is a Python toolkit for multichannel time series analysis. MOGPTK implements multi-output Gaussian process models with different covariance architectures, alongside pre-processing stages based on spectral analysis and visualisation tools. It supports GPU acceleration through PyTorch to make model training computationally efficient. The authors of the toolkit are Taco de Wolff, Alejandro Cuevas, and Felipe Tobar, with contributions from the community of users.

The MOGPTK project started in 2020, hosted by the Center for Mathematical Modelling at the University of Chile. Since October 2024, the project has been hosted at Imperial College London.

Throughout its development, the project has been funded by:

  • Center for Mathematical Modelling, Universidad de Chile (2020-2024)
  • Fondecyt grants from the National Agency for Research and Development, Chile (2020-2024)
  • Different awards from Google Research (2020-2024)

Installation

With Anaconda installed on your system, open a command prompt and create a virtual environment:

conda create -n myenv python=3.7
conda activate myenv

where myenv is the name of your environment and the Python version can be 3.6 or above. Next, install the toolkit; pip automatically installs the necessary dependencies such as PyTorch:

pip install mogptk

In order to upgrade to a new version of MOGPTK or any of its dependencies, use --upgrade as follows:

pip install --upgrade mogptk

For developers of the library or for users who need the latest changes, we recommend cloning the master or develop branch of the git repository and running the following command inside the repository folder:

pip install --upgrade -e .
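
After any of the installation commands above, it can be useful to confirm that the package is visible to the active environment. The following is a minimal sketch that uses only the Python standard library (importlib.metadata, available from Python 3.8); the helper name installed_version is hypothetical and not part of MOGPTK.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Hypothetical helper, not part of MOGPTK: report whether the toolkit is installed.
version = installed_version("mogptk")
if version is None:
    print("mogptk is not installed in this environment")
else:
    print(f"mogptk {version} is installed")
```

If the package is reported as missing, check that the intended conda environment is activated before running pip.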

See Tutorials & Examples to get started.

Introduction

This repository provides a toolkit to perform multi-output GP regression with kernels that are designed to utilize correlation information among channels in order to better model signals. The toolkit is mainly targeted to time-series, and includes plotting functions for the case of single input with multiple outputs (time series with several channels).

The main kernel is the Multi-Output Spectral Mixture (MOSM) kernel, which correlates every pair of data points (irrespective of their channel of origin) to model the signals. The kernel is specified in detail in the following publication: G. Parra, F. Tobar, Spectral Mixture Kernels for Multi-Output Gaussian Processes, Advances in Neural Information Processing Systems, 2017. Proceedings link: https://papers.nips.cc/paper/7245-spectral-mixture-kernels-for-multi-output-gaussian-processes

The kernel learns the cross-channel correlations of the data, so it is particularly well-suited for the task of signal reconstruction in the event of sporadic data loss. All other included kernels can be derived from the Multi Output Spectral Mixture kernel by restricting some parameters or applying some transformations.
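
To give a feel for what a spectral mixture kernel computes, the sketch below evaluates the standard single-output spectral mixture kernel in one input dimension, k(tau) = sum_q w_q exp(-2 pi^2 tau^2 v_q) cos(2 pi mu_q tau). It is an illustration only: it is not MOGPTK's implementation, it omits the cross-channel terms (delays, phases, cross-magnitudes) that the MOSM kernel adds, and the function and parameter names are assumptions chosen for this example.

```python
import math


def spectral_mixture_kernel(tau, weights, means, variances):
    """Single-output spectral mixture kernel for a scalar lag `tau`.

    Each component q is a Gaussian in the frequency domain with weight w_q,
    mean frequency mu_q and variance v_q. Illustrative only; the parameter
    names are assumptions and MOGPTK's parameterization may differ.
    """
    return sum(
        w * math.exp(-2.0 * math.pi ** 2 * tau ** 2 * v) * math.cos(2.0 * math.pi * mu * tau)
        for w, mu, v in zip(weights, means, variances)
    )


# Two components: a slow oscillation and a faster, weaker one.
weights = [1.0, 0.5]
means = [0.1, 0.3]
variances = [0.01, 0.02]

for lag in (0.0, 1.0, 5.0):
    print(lag, spectral_mixture_kernel(lag, weights, means, variances))
```

At zero lag the kernel equals the sum of the weights, and it is symmetric in the lag, as expected of a stationary covariance function.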

One of the main advantages of the toolkit is its GPU support, which lets the user train models through PyTorch and speeds up computations significantly. It also includes sparse variational GP regression to decrease computation time even further.

See MOGPTK: The Multi-Output Gaussian Process Toolkit for our publication in Neurocomputing.

Implementation

Implemented models:

  • Exact
  • Snelson (E. Snelson, Z. Ghahramani, "Sparse Gaussian Processes using Pseudo-inputs", 2005)
  • OpperArchambeau (M. Opper, C. Archambeau, "The Variational Gaussian Approximation Revisited", 2009)
  • Titsias (M. Titsias, "Variational Learning of Inducing Variables in Sparse Gaussian Processes", 2009)
  • Hensman (J. Hensman, et al., "Scalable Variational Gaussian Process Classification", 2015)

Implemented likelihoods:

  • Gaussian
  • Student-T
  • Exponential
  • Laplace
  • Bernoulli
  • Beta
  • Gamma
  • Poisson
  • Weibull
  • Log-Logistic
  • Log-Gaussian
  • Chi
  • Chi-Squared

Tutorials

00 - Quick Start: Short notebook showing the basic use of the toolkit.

01 - Data Loading: Functionality to load CSVs and DataFrames while using formatters for dates.

02 - Data Preparation: Handle data, remove observations to simulate sensor failure, and apply transformations to the data.

03 - Parameter Initialization: Parameter initialization using different methods, for single-output regression using the spectral mixture kernel and for the multi-output case using the MOSM kernel.

04 - Model Training: Training of models while keeping certain parameters fixed.

05 - Error Metrics: Obtain different metrics in order to compare models.

06 - Custom Kernels and Mean Functions: Use or create custom kernels and train custom mean functions.

07 - Sparse Multi Input: Use 8 input dimensions to train on the Abalone data set using sparse GPs.

08 - Multi Likelihood Classification: Use a different likelihood for each channel, one Bernoulli for classification and one Student-T for regression.
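
Tutorial 02 mentions removing observations to simulate sensor failure. The idea can be sketched independently of the toolkit with plain Python lists: drop a contiguous time range to mimic an outage, or drop a random fraction to mimic sporadic loss. The helpers drop_range and drop_random_fraction below are hypothetical names for this illustration and are not MOGPTK functions; see the tutorial for the toolkit's own data handling.

```python
import random


def drop_range(times, values, start, end):
    """Remove observations with start <= t <= end, mimicking a sensor outage."""
    kept = [(t, v) for t, v in zip(times, values) if not (start <= t <= end)]
    return [t for t, _ in kept], [v for _, v in kept]


def drop_random_fraction(times, values, fraction, seed=0):
    """Remove a random fraction of observations, mimicking sporadic data loss."""
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    n_remove = int(round(fraction * len(times)))
    removed = set(rng.sample(range(len(times)), n_remove))
    kept = [(t, v) for i, (t, v) in enumerate(zip(times, values)) if i not in removed]
    return [t for t, _ in kept], [v for _, v in kept]


# One channel with ten observations at integer times.
times = list(range(10))
values = [float(t) ** 2 for t in times]

outage_t, outage_v = drop_range(times, values, start=3, end=5)
sparse_t, sparse_v = drop_random_fraction(times, values, fraction=0.3)
print(outage_t)
print(len(sparse_t))
```

A multi-output model can then be trained on the reduced channel together with intact channels, and its predictions compared against the held-out observations.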

Examples

Airline passengers: Regression using a single output spectral mixture on the yearly number of passengers of an airline.

Seasonal CO2 of Mauna-Loa: Regression using a single output spectral mixture on the CO2 concentration at Mauna-Loa throughout many years.

Currency Exchange: Model training, interpretation and comparison on a dataset of 11 currency exchange rates (against the dollar) from 2017 and 2018. These 11 channels are fitted with the MOSM, SM-LMC, CSM, and CONV kernels and their results are compared and interpreted.

Gold, Oil, NASDAQ, USD-index: Using the commodity indices for gold and oil, together with the indices for the NASDAQ and the USD against a basket of other currencies, we train multiple models to find correlations between the macroeconomic indicators.

Human Activity Recognition: Accelerometer, gyroscope and magnetometer 3D data were recorded with the Inertial Measurement Unit (IMU) of an Apple iPhone 4 for different activities, resulting in nine channels.

Bramblemet tidal waves: Tidal wave data set of four locations in the south of England. We model the tidal wave periods of approximately 12.5 hours using different multi-output Gaussian processes.
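
Spectral mixture kernels describe periodic structure through frequencies rather than periods, so a period such as the roughly 12.5 hours of the tidal example is usually converted before it is compared with a spectral mean. The conversion is simple arithmetic; the sketch below uses a hypothetical helper, period_to_frequency, and assumes time is measured in hours.

```python
def period_to_frequency(period):
    """Convert a period to a frequency in cycles per unit of time."""
    if period <= 0:
        raise ValueError("period must be positive")
    return 1.0 / period


# Approximate tidal period quoted in the Bramblemet example, in hours.
tidal_period_hours = 12.5

cycles_per_hour = period_to_frequency(tidal_period_hours)
cycles_per_day = 24.0 * cycles_per_hour

print(f"{cycles_per_hour:.3f} cycles per hour")
print(f"{cycles_per_day:.2f} cycles per day")
```

A period of 12.5 hours therefore corresponds to 0.08 cycles per hour, or 1.92 cycles per day.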

Documentation

See the API documentation for usage and examples of the toolkit's functions and classes.

Authors

  • Taco de Wolff
  • Alejandro Cuevas
  • Felipe Tobar

Users

This is a list of users of this toolkit. Feel free to add your project!

Contributing

We accept and encourage contributions to the toolkit in the form of pull requests (PRs), bug reports and discussions (GitHub issues). It is advised to start an open discussion before proposing large PRs. Small PRs should address only one issue or add only one new feature. All PRs should keep documentation and notebooks up to date.

Citing

Please use our publication at arXiv to cite our toolkit: MOGPTK: The Multi-Output Gaussian Process Toolkit. We recommend the following BibTeX entry:

@article{mogptk,
    author = {T. {de Wolff} and A. {Cuevas} and F. {Tobar}},
    title = {{MOGPTK: The Multi-Output Gaussian Process Toolkit}},
    journal = "Neurocomputing",
    year = "2020",
    issn = "0925-2312",
    doi = "https://doi.org/10.1016/j.neucom.2020.09.085",
    url = "https://github.com/GAMES-UChile/mogptk"
}

Citations

Used in code

License

Released under the MIT license.
