
Gaussian processes (GPs) are flexible distributions to model functional data. Whilst theoretically appealing, they are computationally cumbersome except for small datasets. This package implements two methods for scaling GP inference in Stan:

  1. a sparse approximation of the likelihood that is generally applicable.

  2. an exact method for regularly spaced data modeled by stationary kernels using fast Fourier methods.

The implementation follows Stan’s design and exposes performant inference through a familiar interface.
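To illustrate the idea behind the second method, the following is a minimal NumPy sketch (not the package's Stan implementation) of drawing GP realizations on a regular grid with a stationary, periodic kernel. It relies on the fact that the covariance matrix is then circulant, so its eigenvalues are the Fourier transform of its first row. The function names, the squared-exponential kernel, and the parameter values are assumptions made for illustration only.

```python
import numpy as np


def periodic_sq_exp_kernel(n, sigma=1.0, length_scale=5.0, jitter=1e-6):
    """First row of a circulant squared-exponential covariance on n grid points."""
    lag = np.arange(n)
    # Distance measured on a ring, so the kernel is stationary and periodic.
    dist = np.minimum(lag, n - lag)
    cov = sigma ** 2 * np.exp(-dist ** 2 / (2 * length_scale ** 2))
    cov[0] += jitter  # small diagonal term for numerical stability
    return cov


def apply_sqrt_cov(cov, white):
    """Map white noise (last axis of length n) to draws with circulant covariance."""
    n = cov.size
    # The eigenvalues of a symmetric circulant matrix are the real FFT of its
    # first row; clipping guards against tiny negative values from round-off.
    eigvals = np.clip(np.fft.rfft(cov).real, 0.0, None)
    scaled = np.sqrt(eigvals) * np.fft.rfft(white, axis=-1)
    return np.fft.irfft(scaled, n=n, axis=-1)


def sample_gp_fft(cov, rng, size=1):
    """Draw `size` GP realizations in O(n log n) time per draw."""
    white = rng.standard_normal((size, cov.size))
    return apply_sqrt_cov(cov, white)


rng = np.random.default_rng(0)
cov = periodic_sq_exp_kernel(64)
samples = sample_gp_fft(cov, rng, size=3)
print(samples.shape)
```

Each draw costs two fast Fourier transforms instead of a Cholesky decomposition of the full covariance matrix, which is where the scaling benefit for regularly spaced data comes from.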

Getting Started

The library is loaded with Stan’s #include statement, and methods to evaluate or approximate the likelihood of a GP use the declarative ~ sampling syntax. The following brief example uses Fourier methods to sample GP realizations.

You can learn more by working through the examples or consulting the interface documentation. The background section of the documentation offers a deeper explanation of the methods used to evaluate likelihoods and the pros and cons of different parameterizations. See the accompanying publication “Scalable Gaussian process inference with Stan” for further details.

Installation

If you have a recent Python installation, the library can be installed by running

pip install gptools-stan

from the command line. The library exposes a function gptools.stan.compile_model for compiling cmdstanpy.CmdStanModels with the correct include paths. For example, the example above can be compiled using the following snippet.

>>> from gptools.stan import compile_model
>>>
>>> # stan_file = path/to/getting_started.stan
>>> model = compile_model(stan_file=stan_file)
>>> model.name
'getting_started'

If you use cmdstanr or another Stan interface, you can download the library files from GitHub. Then add the library location to the compiler include_paths as described in the manual; separate instructions are available for cmdstanr.
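As a rough illustration of what setting the include path amounts to, the sketch below passes a directory to cmdstanpy through its stanc_options argument. This is an assumption about how one might do it by hand, not a description of what compile_model does internally; the helper names and the directory ~/gptools-stan are hypothetical placeholders for wherever you saved the library files.

```python
from pathlib import Path


def build_stanc_options(include_dir):
    """Return stanc options that add `include_dir` to the include search path."""
    include_dir = Path(include_dir).expanduser().resolve()
    return {"include-paths": [str(include_dir)]}


def compile_with_includes(stan_file, include_dir):
    """Compile `stan_file` so that #include statements resolve against `include_dir`."""
    # Imported lazily so the helper above can be used without cmdstanpy installed.
    from cmdstanpy import CmdStanModel

    return CmdStanModel(
        stan_file=str(stan_file),
        stanc_options=build_stanc_options(include_dir),
    )


# Hypothetical location of the downloaded library files.
options = build_stanc_options("~/gptools-stan")
print(options)
```

Calling compile_with_includes requires a working cmdstan installation; other interfaces expose the same compiler option under their own argument names.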

Reproducing results from the accompanying publication

The accompanying publication “Scalable Gaussian process inference with Stan” provides theoretical background and a technical description of the methods. All results and figures can be reproduced using one of the approaches below.

Docker runtime

Docker can run software in isolated containers. If you have Docker installed, you can reproduce the results by running

docker run --rm -v /path/to/output/directory:/workspace tillahoffmann/gptools doit --db-file=/workspace/.doit.db results:stan

This command will download a prebuilt Docker image and execute the steps required to generate all figures in the publication. Results will be placed in the specified output directory; make sure the directory exists before executing the command and that the specified path is absolute, e.g., /path/to/... instead of ../path/to/.... You do not need to install any other software or download the source code. Intermediate results are cached, so if the process is interrupted, it picks up where it left off when invoked again using the same command. Your timing results are likely to differ from the results reported in the publication because runtimes vary substantially between machines. All results reported in the manuscript were obtained on a 2020 MacBook Pro with an Apple Silicon M1 chip and 16 GB of memory. Cross-architecture images are built following this guide.

Reproducing the results can be time-intensive, especially generating the data for the profiling figure. To speed up the process, add the -e CI=1 flag to the above command, alongside the other docker run options and before the image name. This reduces the number of samples drawn per run and decreases timeouts, at the cost of noisier and possibly incomplete figures.
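The requirements above (an existing output directory, an absolute path, and the optional CI flag) can be checked before launching the container. The following sketch assembles the command as a list of arguments; the helper name build_docker_command is hypothetical, and a temporary directory stands in for the real output directory. Docker options such as -v and -e are placed before the image name because everything after the image name is passed to the container as its command.

```python
import tempfile
from pathlib import Path


def build_docker_command(output_dir, image="tillahoffmann/gptools", ci=False):
    """Assemble the docker invocation after validating the output directory."""
    output_dir = Path(output_dir)
    if not output_dir.is_absolute():
        raise ValueError(f"output directory must be an absolute path: {output_dir}")
    if not output_dir.is_dir():
        raise ValueError(f"output directory does not exist: {output_dir}")
    # Options for `docker run` come first, followed by the image and its command.
    command = ["docker", "run", "--rm", "-v", f"{output_dir}:/workspace"]
    if ci:
        # Fewer samples and shorter timeouts for a quicker, noisier run.
        command += ["-e", "CI=1"]
    command += [image, "doit", "--db-file=/workspace/.doit.db", "results:stan"]
    return command


# A temporary directory stands in for the real output directory.
with tempfile.TemporaryDirectory() as tmp:
    command = build_docker_command(tmp, ci=True)
print(" ".join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) to start the run.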

If you would rather build the Docker image from scratch, run docker build -t my-image-name . from the root directory of this repository. You can then reproduce the results using the command above, replacing tillahoffmann/gptools with my-image-name. Optionally, run docker run --rm my-image-name doit tests:stan to ensure the image runs as expected; this takes about ten to fifteen minutes on a MacBook.

Local runtime

You can reproduce the results using your local computing environment (rather than an isolated container runtime) as follows.

  1. Ensure a recent Python version (3.8 or later) is installed. The code was tested with Python 3.8-3.11 on Ubuntu 22.04.2 and Python 3.10 on macOS 13.2.1.

  2. Install all dependencies by running pip install -r dev_requirements.txt.

  3. Install cmdstan by running install_cmdstan --version=2.31.0.

  4. Optionally, run doit tests:stan to test the installation.

  5. Optionally, launch a Jupyter notebook and open the docs/logistic_regression/logistic_regression notebook to get familiar with the package. The notebook illustrates fitting a univariate latent Gaussian process to binary observations. The MyST markdown notebook is located at docs/logistic_regression/logistic_regression.md in this repository.

  6. Run the command doit results:stan to reproduce the results.
