
GPie: Gaussian Process tiny explorer

Project description

GPie


Gaussian Process tiny explorer

  • simple: an intuitive syntax inspired by scikit-learn
  • powerful: a compact core of expressive abstractions
  • extensible: a modular design for effortless composition
  • lightweight: a minimal set of dependencies {standard library, numpy, scipy}

This is an ongoing research project with many parts currently under construction - please expect bugs and sharp edges.

Features

  • several "avant-garde" kernels, such as the spectral kernel and the neural kernel, allow for exploration of new ideas
  • each kernel implements an anisotropic variant, in addition to the isotropic one, to support automatic relevance determination (see the sketch after this list)
  • a full-fledged toolkit of kernel operators enables all sorts of "kernel engineering", for example, handcrafting composite kernels based on expert knowledge or exploiting special structure of datasets
  • core computations such as likelihood and gradient are carefully formulated for speed and stability
  • sampling inference embraces a probabilistic perspective in learning and prediction to promote robustness
  • the Bayesian optimizer offers a principled strategy for globally optimizing expensive, black-box objectives
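
For example, the anisotropic variant of a kernel is selected by supplying one length scale per input dimension, which is what enables automatic relevance determination. A minimal sketch, assuming RBFKernel accepts a vector-valued l (the import path and parameter form are assumptions; consult GPie's docs):

import numpy as np
from gpie.kernel import RBFKernel  # import path assumed

# one length scale per input dimension: dimensions with larger length
# scales are effectively less relevant to the prediction
kernel = 1.0**2 * RBFKernel(l=np.array([1.0, 10.0, 0.1]))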

Functionality

  • kernel functions
    • white kernel
    • constant kernel
    • radial basis function kernel
    • rational quadratic kernel
    • Matérn kernel
      • Ornstein-Uhlenbeck kernel
    • periodic kernel
    • spectral kernel
    • neural kernel
  • kernel operators
    • Hadamard (element-wise)
      • sum
      • product
      • exponentiation
    • Kronecker
      • sum
      • product
  • Gaussian process
    • regression
    • classification
  • t process
    • regression
    • classification
  • Bayesian optimizer
    • surrogate: Gaussian process, t process
    • acquisition: PI (probability of improvement), EI (expected improvement), LCB (lower confidence bound), ES (entropy search), KG (knowledge gradient)
  • sampling inference
    • Markov chain Monte Carlo
      • Metropolis-Hastings
      • Hamiltonian + no-U-turn
    • simulated annealing
  • variational inference

Note: some parts listed above, such as the Kronecker operators and the Hamiltonian + no-U-turn sampler, are still under construction (see What's Next below). A short sketch of the Hadamard operators follows.
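
The sketch below composes kernels with the Hadamard sum, product, and exponentiation operators. The + and * overloads appear in the Mauna Loa example below; the ** form is an assumption inferred from the operator list, and the import paths are also assumed:

from gpie.kernel import RBFKernel, PeriodicKernel, WhiteKernel  # paths assumed

k_sum = RBFKernel(l=1.0) + WhiteKernel()                  # Hadamard sum
k_prod = RBFKernel(l=1.0) * PeriodicKernel(p=1.0, l=1.0)  # Hadamard product
k_pow = RBFKernel(l=1.0) ** 2                             # Hadamard exponentiation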

Examples

Gaussian process regression on Mauna Loa CO2

In this example, we use a Gaussian process to model the concentration of CO2 at Mauna Loa as a function of time.

# imports assume GPie's public module layout (an assumption; adjust as needed)
from gpie.kernel import (WhiteKernel, RBFKernel, PeriodicKernel,
                         RationalQuadraticKernel)
from gpie.infer import GaussianProcessRegressor

# handcraft a composite kernel based on expert knowledge
# long-term trend
k1 = 30.0**2 * RBFKernel(l=200.0)
# seasonal variations
k2 = 3.0**2 * RBFKernel(l=200.0) * PeriodicKernel(p=1.0, l=1.0)
# medium-term irregularities
k3 = 0.5**2 * RationalQuadraticKernel(m=0.8, l=1.0)
# noise
k4 = 0.1**2 * RBFKernel(l=0.1) + 0.2**2 * WhiteKernel()
# composite kernel
kernel = k1 + k2 + k3 + k4
# train GPR on data (X: observation times, y: CO2 concentrations)
gpr = GaussianProcessRegressor(kernel=kernel)
gpr.fit(X, y)
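
Once fitted, predictions at new inputs follow the scikit-learn-inspired pattern. A sketch, assuming a predict method that can also return the predictive standard deviation (the exact signature is an assumption):

# X_test: query inputs, e.g. future time points
mu, std = gpr.predict(X_test, return_std=True)  # signature assumed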

(figure: forecast of CO2 concentration at Mauna Loa)

In the plot, scattered dots represent historical observations, and the shaded area shows the predictive interval (μ - σ, μ + σ) prophesied by a Gaussian process regressor trained on the historical data.

Sampling inference for Gaussian process regression

Here we use a synthesized dataset for ease of illustration and investigate sampling inference techniques such as Markov chain Monte Carlo. As a Gaussian process defines the predictive distribution, we can get a sense of it by sampling from its prior distribution (before seeing the training set) and posterior distribution (after seeing the training set).

# with the current hyperparameter configuration,
# ... what is the prior distribution p(y_test)
y_prior = gpr.prior_predictive(X, n_samples=6)
# ... what is the posterior distribution p(y_test|y_train)
y_posterior = gpr.posterior_predictive(X, n_samples=4)
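
Each call returns a batch of sampled curves that can be plotted directly. A sketch, assuming the samples are stacked row-wise (matplotlib is used for illustration only; it is not a GPie dependency):

import matplotlib.pyplot as plt

for sample in y_prior:               # one sampled curve per row (assumed)
    plt.plot(X.ravel(), sample, alpha=0.5)
plt.show()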

(figures: sample curves drawn from the prior predictive distribution and from the posterior predictive distribution)

We can also sample from the posterior distribution of a hyperparameter, which characterizes its uncertainty beyond a single point estimate such as MLE or MAP.

# invoke MCMC sampler to sample hyper values from its posterior distribution
hyper_posterior = gpr.hyper_posterior(n_samples=10000)
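
The returned samples can be summarized with standard NumPy routines. A sketch, assuming the samples are stacked row-wise with one column per hyperparameter:

import numpy as np

mean = hyper_posterior.mean(axis=0)                              # posterior mean
low, high = np.percentile(hyper_posterior, [2.5, 97.5], axis=0)  # 95% credible interval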

(figure: posterior distribution of a hyperparameter sampled via MCMC)

Bayesian optimization

We demonstrate a simple example of Bayesian optimization. It starts by exploring the objective function globally and shifts to exploiting "promising areas" as more observations are made.

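A minimal sketch of driving the optimizer on a toy objective is given below. The BayesianOptimizer name and its constructor arguments are assumptions based on the Functionality list above, not GPie's documented API:

import numpy as np
from gpie import BayesianOptimizer  # import path assumed

def f(x):                            # toy expensive black-box objective
    return np.sin(3.0 * x) + x**2 - 0.7 * x

bounds = np.array([[-1.0, 2.0]])     # search box for the single input
opt = BayesianOptimizer(fun=f, bounds=bounds,
                        acquisition='lcb',  # lower confidence bound
                        n_evals=20)         # evaluation budget
opt.minimize()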

(figure: Bayesian optimization shifting from global exploration to exploitation of promising areas)

Backend

GPie makes extensive use of de facto standard scientific computing packages in Python:

  • numpy: linear algebra, stochastic sampling
  • scipy: gradient-based optimization, stochastic sampling

Installation

GPie requires Python 3.6 or greater. The easiest way to install GPie is from a prebuilt wheel using pip:

pip install --upgrade gpie

You can also install from source to try out the latest features (requires pep517>=0.8.0 and setuptools>=40.9.0):

pip install --upgrade git+https://github.com/zackxzhang/gpie
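
To confirm the installation, a quick import check (the __version__ attribute is an assumption; a bare import also suffices):

import gpie
print(gpie.__version__)  # attribute assumed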

What's Next

  • implement Hamiltonian Monte Carlo and no-U-turn for more efficient sampling (working)
  • a brief guide on varying characteristics of different kernels and how to compose them (queued)
  • a demo of quantified Occam's razor encoded by Bayesian inference and its implication for model selection (queued)
  • implement Kronecker operators for scalable learning on grid data (researching)
  • replace Cholesky decomposition-based exact inference with Krylov subspace methods like conjugate gradient and Lanczos tridiagonalization for greater speed (researching; see the sketch after this list)
  • ...
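
To illustrate the last research item: exact inference solves the linear system (K + σ²I) α = y, which a Cholesky factorization handles in O(n³) time, whereas Krylov methods such as conjugate gradient only touch the kernel matrix through matrix-vector products. A generic sketch using SciPy, not GPie code:

import numpy as np
from scipy.sparse.linalg import cg, LinearOperator

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 1))
K = np.exp(-0.5 * (X - X.T) ** 2)        # RBF Gram matrix
A = K + 1e-2 * np.eye(len(X))            # add noise variance on the diagonal
y = rng.standard_normal(len(X))

# conjugate gradient sees A only through matrix-vector products, which is
# what makes structured or matrix-free kernel representations scalable
alpha, info = cg(LinearOperator(A.shape, matvec=A.dot), y)
assert info == 0                         # 0 signals convergence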

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gpie-0.2.1.tar.gz (27.5 kB)

Built Distribution

gpie-0.2.1-py3-none-any.whl (30.2 kB)

File details

Details for the file gpie-0.2.1.tar.gz.

File metadata

  • Download URL: gpie-0.2.1.tar.gz
  • Upload date:
  • Size: 27.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.0 setuptools/51.0.0.post20201207 requests-toolbelt/0.9.1 tqdm/4.54.1 CPython/3.8.3

File hashes

Hashes for gpie-0.2.1.tar.gz

  • SHA256: 8e01fabe64f8ab9b3cea30ea7b4d087784b8ea00bf189dfbfb4cdf967036aae5
  • MD5: d0c4bf4185bee9915fad54f577a12cc0
  • BLAKE2b-256: 3b7c49fe02071470e837ef48975dbbbe0afe29b0b9d976726f36d22ade096a80

See PyPI's documentation for more details on using hashes.
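
For example, a downloaded file can be checked against the SHA256 digest above with the standard library (a generic sketch, not PyPI tooling):

import hashlib

with open('gpie-0.2.1.tar.gz', 'rb') as fh:
    digest = hashlib.sha256(fh.read()).hexdigest()
print(digest)  # compare with the SHA256 value listed above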

File details

Details for the file gpie-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: gpie-0.2.1-py3-none-any.whl
  • Upload date:
  • Size: 30.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.0 setuptools/51.0.0.post20201207 requests-toolbelt/0.9.1 tqdm/4.54.1 CPython/3.8.3

File hashes

Hashes for gpie-0.2.1-py3-none-any.whl

  • SHA256: a52fff0464eb3fe867ef9544f4f650bec7c51e30a686173ca0ddc13699e673a3
  • MD5: 8e355060a93f511e82c7e6256d4ee713
  • BLAKE2b-256: c751040a0077a97b186f6b158553a986c5e4ffe43ce9d91a165a635ddc1a56cf

See PyPI's documentation for more details on using hashes.
