Gaussian processes (GPs) are flexible distributions to model functional data. Whilst theoretically appealing, they are computationally cumbersome except for small datasets. This package implements two methods for scaling GP inference in Stan:
- a sparse approximation of the likelihood that is generally applicable.
- an exact method for regularly spaced data modeled by stationary kernels using fast Fourier methods.
The implementation follows Stan’s design and exposes performant inference through a familiar interface.
Getting Started
The library is loaded with Stan’s #include statement, and methods to evaluate or approximate the likelihood of a GP use the declarative ~ sampling syntax. The following brief example uses Fourier methods to sample GP realizations.
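The idea behind the Fourier approach can be illustrated outside Stan. The sketch below is plain Python and is not the package's implementation: it assumes a periodic kernel on a regular grid, so the covariance matrix is circulant and its eigenvalues are the discrete Fourier transform of its first row. The function names, the kernel choice, and the small nugget term are all assumptions made for illustration, and a naive transform stands in for a fast Fourier transform to keep the example dependency-free.

```python
import cmath
import math
import random


def periodic_kernel_row(n, sigma=1.0, length_scale=0.5, nugget=1e-6):
    """First row of the covariance matrix for n equally spaced points on a ring."""
    row = [
        sigma ** 2 * math.exp(-2 * math.sin(math.pi * j / n) ** 2 / length_scale ** 2)
        for j in range(n)
    ]
    row[0] += nugget  # small diagonal term keeps every eigenvalue strictly positive
    return row


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform; an FFT would be used in practice."""
    n = len(values)
    sign = 1 if inverse else -1
    out = [
        sum(v * cmath.exp(sign * 2j * math.pi * j * k / n) for j, v in enumerate(values))
        for k in range(n)
    ]
    return [x / n for x in out] if inverse else out


def sample_gp(cov_row, rng):
    """Draw one zero-mean realization whose covariance is circulant with row cov_row."""
    n = len(cov_row)
    # Eigenvalues of a circulant matrix are the DFT of its first row; they are
    # real here because the kernel is symmetric.
    eigenvalues = [x.real for x in dft(cov_row)]
    white = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Scale white noise by sqrt(eigenvalue) in the Fourier domain, then transform back.
    scaled = [math.sqrt(lam) * z for lam, z in zip(eigenvalues, dft(white))]
    return [x.real for x in dft(scaled, inverse=True)], eigenvalues


rng = random.Random(0)
cov_row = periodic_kernel_row(32)
f, eigenvalues = sample_gp(cov_row, rng)
```

Because the matrix never has to be formed or factorized, the cost is dominated by the transforms, which is what makes the approach attractive for large regular grids once a fast transform is substituted.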
You can learn more by following the docs/examples or delving into the docs/interface. The docs/background section offers a deeper explanation of the methods used to evaluate likelihoods and the pros and cons of different parameterizations. See the accompanying publication “Scalable Gaussian process inference with Stan” for further details.
Installation
If you have a recent Python installation, the library can be installed by running
pip install gptools-stan
from the command line. The library exposes a function gptools.stan.compile_model for compiling cmdstanpy.CmdStanModel objects with the correct include paths. For instance, the example above can be compiled using the following snippet.
>>> from gptools.stan import compile_model
>>>
>>> stan_file = "path/to/getting_started.stan"
>>> model = compile_model(stan_file=stan_file)
>>> model.name
'getting_started'
If you use cmdstanr or another Stan interface, you can download the library files from GitHub. Then add the library location to the compiler include_paths as described in the manual (see here for cmdstanr instructions).
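Passing the include path by hand might look as follows. This is a minimal sketch shown with cmdstanpy because the rest of this page uses Python; the helper name include_path_options and the placeholder paths are hypothetical, and the only library feature relied upon is that cmdstanpy.CmdStanModel accepts a stanc_options dictionary with an "include-paths" entry.

```python
from pathlib import Path


def include_path_options(library_dir):
    """Build a stanc options dictionary pointing at a directory of Stan library files."""
    library_dir = Path(library_dir).expanduser().resolve()
    if not library_dir.is_dir():
        raise FileNotFoundError(f"{library_dir} is not a directory")
    # Guard against pointing at the wrong folder: expect at least one .stan file.
    if not any(library_dir.rglob("*.stan")):
        raise ValueError(f"no .stan files found below {library_dir}")
    return {"include-paths": [str(library_dir)]}


# Usage with cmdstanpy (not executed here because it requires a CmdStan install):
#   import cmdstanpy
#   options = include_path_options("path/to/downloaded/library")
#   model = cmdstanpy.CmdStanModel(
#       stan_file="path/to/getting_started.stan", stanc_options=options
#   )
```

Checking the directory before compiling turns a confusing "could not find include file" compiler error into an immediate and more descriptive Python exception.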
Reproducing results from the accompanying publication
The accompanying publication “Scalable Gaussian process inference with Stan” provides theoretical background and a technical description of the methods. All results and figures can be reproduced using one of the approaches below.
Docker runtime
Docker can run software in isolated containers. If you have Docker installed, you can reproduce the results by running
docker run --rm -v /path/to/output/directory:/workspace tillahoffmann/gptools doit --db-file=/workspace/.doit.db results:stan
This command will download a prebuilt Docker image and execute the steps required to generate all figures in the publication. Results will be placed in the specified output directory; make sure the directory exists before executing the command and that the specified path is absolute (e.g., /path/to/... rather than ../path/to/...). You do not need to install any other software or download the source code. Intermediate results are cached, so if the process is interrupted it can pick up where it left off when invoked using the same command. Your timing results are likely to differ from those reported in the publication because runtimes vary substantially between machines. All results reported in the manuscript were obtained on a 2020 MacBook Pro with an Apple Silicon M1 chip and 16 GB of memory. Cross-architecture images are built following this guide.
Reproducing the results can be time intensive, especially generating the data for the profiling figure. To speed things up, add the -e CI=1 flag to the above command, before the image name. This reduces the number of samples drawn per run and shortens timeouts, at the cost of noisier and possibly incomplete figures.
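The path requirements and flag placement can be checked programmatically before launching a long run. The following is a sketch only: the helper name docker_command is hypothetical, and it assumes Docker's usual argument order (options, then image, then the command run inside the container), so the volume mount and the optional CI flag are placed before the image name.

```python
from pathlib import Path


def docker_command(output_dir, image="tillahoffmann/gptools", ci=False):
    """Assemble the docker invocation as an argument list, validating the output path."""
    output_dir = Path(output_dir)
    if not output_dir.is_absolute():
        raise ValueError(f"output directory must be an absolute path: {output_dir}")
    if not output_dir.is_dir():
        raise FileNotFoundError(f"create the output directory first: {output_dir}")
    args = ["docker", "run", "--rm", "-v", f"{output_dir}:/workspace"]
    if ci:
        args += ["-e", "CI=1"]  # fewer samples per run and shorter timeouts
    # Everything after the image name is the command executed inside the container.
    args += [image, "doit", "--db-file=/workspace/.doit.db", "results:stan"]
    return args


# To execute, pass the list to subprocess.run, for example:
#   import subprocess
#   subprocess.run(docker_command("/path/to/output/directory", ci=True), check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems when the output path contains spaces.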
If you would rather build the Docker image from scratch, run
docker build -t my-image-name .
from the root directory of this repository. You can then reproduce the results using the command above, replacing tillahoffmann/gptools with my-image-name. Optionally, run
docker run --rm my-image-name doit tests:stan
to ensure the image runs as expected; this takes about ten to fifteen minutes on a MacBook.
Local runtime
You can reproduce the results using your local computing environment (rather than an isolated container runtime) as follows.
1. Ensure a recent Python version (3.8 or later) is installed. The code was tested with Python 3.8-3.11 on Ubuntu 22.04.2 and Python 3.10 on macOS 13.2.1.
2. Install all dependencies by running pip install -r dev_requirements.txt.
3. Install cmdstan by running install_cmdstan --version=2.31.0.
4. Optionally, run doit tests:stan to test the installation.
5. Optionally, launch a Jupyter notebook and open the docs/logistic_regression/logistic_regression notebook to get familiar with the package. The notebook illustrates fitting a univariate latent Gaussian process to binary observations. The MyST markdown notebook is located at docs/logistic_regression/logistic_regression.md in this repository.
6. Run the command doit results:stan to reproduce the results.