GPie
Gaussian Process tiny explorer
- simple: an intuitive syntax inspired by scikit-learn
- powerful: a compact core of expressive abstractions
- extensible: a modular design for effortless composition
- lightweight: a minimal set of dependencies {standard library, numpy, scipy}
This is an ongoing research project with many parts still under construction, so please expect bugs and sharp edges.
Features
- several "avant-garde" kernels, such as the spectral kernel and the neural kernel, allow for exploration of new ideas
- every kernel implements an anisotropic variant in addition to the isotropic one to support automatic relevance determination (see the sketch after this list)
- a full-fledged toolkit of kernel operators enables all sorts of "kernel engineering", for example, handcrafting composite kernels based on expert knowledge or exploiting the special structure of a dataset
- core computations such as likelihood and gradient are carefully formulated for speed and stability
- sampling inference embraces a probabilistic perspective in learning and prediction to promote robustness
- the Bayesian optimizer offers a principled strategy for globally optimizing expensive, black-box objectives
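For instance, automatic relevance determination combines naturally with the kernel operators. A minimal sketch, assuming the anisotropic variant accepts a vector of length scales through the same `l` parameter used in the examples below (the import path is also an assumption):

```python
from gpie import RBFKernel, WhiteKernel  # import path assumed

# isotropic RBF: one length scale shared by all input dimensions
k_iso = RBFKernel(l=1.0)

# anisotropic RBF (assumed to take a vector of length scales):
# one length scale per dimension, so automatic relevance determination
# can stretch the scales of irrelevant dimensions
k_ard = RBFKernel(l=[1.0, 10.0, 0.1])

# kernel operators compose kernels algebraically
kernel = 2.0**2 * k_ard + 0.1**2 * WhiteKernel()
```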
Functionality
- kernel functions
    - white kernel
    - constant kernel
    - radial basis function kernel
    - rational quadratic kernel
    - Matérn kernel
    - Ornstein-Uhlenbeck kernel
    - periodic kernel
    - spectral kernel
    - neural kernel
- kernel operators
    - Hadamard (element-wise)
        - sum
        - product
        - exponentiation
    - Kronecker
        - sum
        - product
        - Hadamard (element-wise)
- Gaussian process
    - regression
    - classification
- t process
    - regression
    - classification
- Bayesian optimizer
    - surrogate: Gaussian process, t process
    - acquisition: PI (probability of improvement), EI (expected improvement), LCB (lower confidence bound), ES (entropy search), KG (knowledge gradient); see the sketch after this list
- sampling inference
    - Markov chain Monte Carlo
        - Metropolis-Hastings
        - Hamiltonian + no-U-turn
    - simulated annealing
- variational inference
Note: some of the parts listed above are still under construction.
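Acquisition functions are what steer the Bayesian optimizer between exploration and exploitation. As a standalone illustration (not GPie's internal implementation), expected improvement for minimization can be computed from a surrogate's predictive mean and standard deviation:

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, y_best, xi=0.01):
    """EI for minimization, from the surrogate's predictive mean `mu`
    and standard deviation `sigma`; `y_best` is the incumbent optimum
    and `xi` trades exploration off against exploitation."""
    sigma = np.maximum(sigma, 1e-12)  # guard against zero variance
    z = (y_best - mu - xi) / sigma
    return (y_best - mu - xi) * norm.cdf(z) + sigma * norm.pdf(z)
```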
Examples
Gaussian process regression on Mauna Loa CO2
In this example, we use a Gaussian process to model the concentration of CO2 at Mauna Loa as a function of time.
# imports (top-level import path assumed)
from gpie import (GaussianProcessRegressor, RBFKernel, PeriodicKernel,
                  RationalQuadraticKernel, WhiteKernel)

# handcraft a composite kernel based on expert knowledge
# long-term trend
k1 = 30.0**2 * RBFKernel(l=200.0)
# seasonal variations
k2 = 3.0**2 * RBFKernel(l=200.0) * PeriodicKernel(p=1.0, l=1.0)
# medium-term irregularities
k3 = 0.5**2 * RationalQuadraticKernel(m=0.8, l=1.0)
# noise
k4 = 0.1**2 * RBFKernel(l=0.1) + 0.2**2 * WhiteKernel()
# composite kernel
kernel = k1 + k2 + k3 + k4
# train GPR on the data (X: dates, y: CO2 concentrations)
gpr = GaussianProcessRegressor(kernel=kernel)
gpr.fit(X, y)
In the plot, the scattered dots represent historical observations, and the shaded area shows the predictive interval (μ - σ, μ + σ) prophesied by a Gaussian process regressor trained on the historical data.
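The interval itself can be formed from posterior predictive samples (the `posterior_predictive` method is introduced in the next section). A minimal sketch, assuming the samples are stacked along the first axis and `X_test` is a grid of test dates:

```python
import numpy as np

# draw posterior predictive samples on the test grid
y_samples = gpr.posterior_predictive(X_test, n_samples=100)
mu = y_samples.mean(axis=0)    # predictive mean
sigma = y_samples.std(axis=0)  # predictive standard deviation
lower, upper = mu - sigma, mu + sigma
```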
Sampling inference for Gaussian process regression
Here we use a synthesized dataset for ease of illustration and investigate sampling inference techniques such as Markov chain Monte Carlo. Since a Gaussian process defines the predictive distribution, we can get a sense of it by sampling from its prior distribution (before seeing the training set) and its posterior distribution (after seeing the training set).
# with the current hyperparameter configuration,
# ... what is the prior distribution p(y_test)
y_prior = gpr.prior_predictive(X, n_samples=6)
# ... what is the posterior distribution p(y_test|y_train)
y_posterior = gpr.posterior_predictive(X, n_samples=4)
We can also sample from the posterior distribution of a hyperparameter, which characterizes its uncertainty beyond a single point estimate such as MLE or MAP.
# invoke MCMC sampler to sample hyper values from its posterior distribution
hyper_posterior = gpr.hyper_posterior(n_samples=10000)
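The returned samples can then be summarized with plain numpy. A minimal sketch, assuming `hyper_posterior` returns an array with one row per sample and one column per hyperparameter:

```python
import numpy as np

# posterior mean and a 95% credible interval per hyperparameter
mean = hyper_posterior.mean(axis=0)
lower, upper = np.percentile(hyper_posterior, [2.5, 97.5], axis=0)
```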
Bayesian optimization
We demonstrate a simple example of Bayesian optimization. It starts by exploring the objective function globally and shifts to exploiting "promising areas" as more observations are made.
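A minimal sketch of the loop, assuming a `BayesianOptimizer` class that wraps a surrogate and an acquisition function; the constructor arguments and method names here are illustrative assumptions, not GPie's exact signature:

```python
from gpie import BayesianOptimizer, GaussianProcessRegressor, RBFKernel  # import path assumed

# hypothetical usage: argument and method names are assumptions
surrogate = GaussianProcessRegressor(kernel=RBFKernel(l=1.0))
optimizer = BayesianOptimizer(fun=objective, bounds=bounds,
                              surrogate=surrogate, acquisition='ei')
x_best = optimizer.minimize()  # returns the best input found (assumed)
```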
Backend
GPie makes extensive use of de facto standard scientific computing packages in Python:
- numpy: linear algebra, stochastic sampling
- scipy: gradient-based optimization, stochastic sampling
Installation
GPie requires Python 3.6 or greater. The easiest way to install GPie is from a prebuilt wheel using pip:
pip install --upgrade gpie
You can also install from source to try out the latest features (requires pep517>=0.8.0 and setuptools>=40.9.0):
pip install --upgrade git+https://github.com/zackxzhang/gpie
What's Next
- implement Hamiltonian Monte Carlo and the no-U-turn sampler for more efficient sampling (working)
- a brief guide on the varying characteristics of different kernels and how to compose them (queued)
- a demo of the quantified Occam's razor encoded by Bayesian inference and its implications for model selection (queued)
- implement Kronecker operators for scalable learning on grid data (researching)
- replace Cholesky decomposition-based exact inference with Krylov subspace methods such as conjugate gradient and Lanczos tridiagonalization for greater speed (researching); see the sketch after this list
- ...
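To illustrate that last direction (a standalone sketch, not GPie code): solving the kernel system K α = y can go through an exact Cholesky factorization or, matrix-free, through conjugate gradient, which only ever touches K via matrix-vector products:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 1))
K = np.exp(-0.5 * (X - X.T)**2) + 0.1**2 * np.eye(200)  # RBF Gram + noise
y = rng.standard_normal(200)

# exact inference: O(n^3) Cholesky factorization and solve
alpha_chol = cho_solve(cho_factor(K), y)

# Krylov alternative: conjugate gradient needs only matvecs with K,
# so K never has to be factorized (or, with structure, even formed)
K_op = LinearOperator((200, 200), matvec=lambda v: K @ v)
alpha_cg, info = cg(K_op, y, atol=1e-8)  # info == 0 on convergence
```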