
A practitioner's toolbox for estimating large-scale Gaussian Process models with PyMC and PyTensor

PTGP

A Gaussian process library for building GP models that solve real-world problems.

Who this is for

PTGP is for practitioners who need flexible, well-supported GP modeling. The goal of PTGP is to be fully batteries-included and ready to work on real-world problems:

  • Practical GP algorithms: exact GP, VFE with collapsed bound, SVGP with minibatch training, VFF (Variational Fourier Features)
  • Full kernel library: ExpQuad, Matern52/32/12, RandomWalk, Gibbs, WarpedInput, categorical kernels for multi-class or categorical input variables, composition via + and *, active_dims for dimension selection
  • Non-Gaussian likelihoods: Bernoulli, Poisson, NegativeBinomial, StudentT
  • PyMC priors: set priors on any hyperparameter; use PyMC distributions for mean functions and noise models; MAP training by default
  • Training tools: L-BFGS-B and Adam optimizers, per-parameter learning rates, staged optimization, frozen variables, inducing point initialization strategies, diagnostic-guided workflows; more are being added, such as carefully monitored training to help diagnose issues early
  • Agent-readable docs: docs/agents/ ships LLM-readable guides covering folk wisdom and recipes for debugging training issues (VFE training is covered so far). See the Working with AI coding assistants section below.
  • More coming: see the issues
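As a rough illustration of what kernel composition via + and * means numerically, here is a minimal NumPy sketch. It does not use PTGP: the functions matern52 and expquad are hypothetical stand-ins written for this example, and they only mirror the standard textbook definitions of those kernels.

```python
import numpy as np

def matern52(X1, X2, ls=1.0):
    """Matern-5/2 correlation between the rows of X1 (n, d) and X2 (m, d)."""
    # Pairwise Euclidean distances, scaled by the lengthscale.
    diff = X1[:, None, :] - X2[None, :, :]
    r = np.sqrt(np.sum(diff**2, axis=-1)) / ls
    s = np.sqrt(5.0) * r
    return (1.0 + s + s**2 / 3.0) * np.exp(-s)

def expquad(X1, X2, ls=1.0):
    """Exponentiated-quadratic correlation between the rows of X1 and X2."""
    diff = X1[:, None, :] - X2[None, :, :]
    sq = np.sum(diff**2, axis=-1) / ls**2
    return np.exp(-0.5 * sq)

X = np.linspace(-2.0, 2.0, 5)[:, None]
eta = 1.5

# Scaling by eta**2 sets the marginal variance (the diagonal of K).
K_scaled = eta**2 * matern52(X, X, ls=0.7)

# Sums and elementwise products of valid kernels are again valid kernels.
K_sum = matern52(X, X, ls=0.7) + expquad(X, X, ls=2.0)
K_prod = matern52(X, X, ls=0.7) * expquad(X, X, ls=2.0)
```

In PTGP the same composition is written on kernel objects rather than on matrices, so the library can build the symbolic graph for you.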

Researchers benefit from the underlying design: PTGP is built on PyTensor's symbolic graph and rewrite system, so you write GP math directly (pt.linalg.inv(K), pt.linalg.slogdet(K)) and the compiler chooses efficient algorithms based on declared matrix structure. This makes it straightforward to implement new GP approximations and create custom models, and will eventually allow matrix structure like Kronecker, Toeplitz, and sparse to be taken advantage of automatically.

Models

Model               Scale               Best for
gp.Unapproximated   N < ~2,000          Exact inference, model comparison
gp.VFE              N < ~50,000         Medium-scale data with inducing points
gp.SVGP             N up to ~500,000    Large data, non-Gaussian likelihoods, minibatch training
FourierFeatures1D   1D Matern kernels   Structured Kuu via Fourier basis; no inducing point placement
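The scale guidance above tracks the standard asymptotic cost of each method: O(N^3) for an exact GP, O(N M^2) for VFE with M inducing points, and O(B M^2 + M^3) per step for SVGP with minibatch size B. The sketch below turns those textbook formulas into rough operation counts. The helper approx_cost is hypothetical and is not part of PTGP, and the numbers are orders of magnitude, not measured timings.

```python
def approx_cost(model, n, m=None, batch=None):
    """Rough operation count for one objective evaluation (hypothetical helper)."""
    if model == "exact":
        # Cholesky factorization of the full N x N kernel matrix.
        return n**3
    if model == "vfe":
        # Dominated by products between the M x N and N x M cross-covariances.
        return n * m**2 + m**3
    if model == "svgp":
        # Only a minibatch of B points touches the M inducing points per step.
        return batch * m**2 + m**3
    raise ValueError(f"unknown model: {model}")

exact = approx_cost("exact", n=2_000)
vfe = approx_cost("vfe", n=50_000, m=500)
svgp = approx_cost("svgp", n=500_000, m=500, batch=256)
```

With these illustrative settings, exact inference at N = 2,000 and VFE at N = 50,000 land in the same order of magnitude, while an SVGP step does not depend on N at all.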

Quick start

import numpy as np
import pymc as pm
import pytensor.tensor as pt
import ptgp as pg

X = np.random.randn(200, 1)
y = np.sin(X.ravel()) + 0.1 * np.random.randn(200)
Z_init = np.linspace(-2, 2, 20)[:, None]
Z_var = pt.matrix("Z", shape=(20, 1))

with pm.Model() as model:
    ls = pm.InverseGamma("ls", alpha=2.0, beta=1.0)
    eta = pm.Exponential("eta", lam=1.0)
    kernel = eta**2 * pg.kernels.Matern52(input_dim=1, ls=ls)

    svgp = pg.gp.SVGP(
        kernel=kernel,
        likelihood=pg.likelihoods.Gaussian(sigma=0.1),
        inducing_variable=pg.inducing.Points(Z_var, Z_init=Z_init),
        variational_params=pg.gp.init_variational_params(M=20),
    )
    fit = pg.fit(svgp, X, y, method="L-BFGS-B")

mean, var = pg.predict(svgp, np.linspace(-3, 3, 100)[:, None], fit)

pg.fit picks a default objective from the gp type (Unapproximated → marginal_log_likelihood, VFE → collapsed_elbo, SVGP → elbo) and returns a FitResult that pg.predict consumes. For stochastic mini-batch training, staged VFE, or per-group learning rates, drop down to pg.optim.compile_training_step / pg.optim.compile_scipy_objective (see notebooks/demo.ipynb):

X_var = pt.matrix("X")
y_var = pt.vector("y")

step, shared_params, shared_extras = pg.optim.compile_training_step(
    pg.objectives.elbo, svgp, X_var, y_var, model, learning_rate=1e-2
)

for i in range(500):
    loss = step(X, y)

predict_fn = pg.optim.compile_predict(
    svgp, pt.matrix("X_new"), model, shared_params, shared_extras=shared_extras
)
mean, var = predict_fn(np.linspace(-3, 3, 100)[:, None])
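The loop above passes the full dataset to step on every iteration. For stochastic minibatch training, the data would be fed in shuffled chunks instead. The sketch below shows one way to do that in plain NumPy. The minibatches generator is a hypothetical helper, and the step function here is a dummy stand-in for the compiled training step so that the example runs on its own. Whether the compiled step rescales the minibatch likelihood term to the full data size is not stated here, so check that before relying on this pattern.

```python
import numpy as np

def minibatches(X, y, batch_size, rng):
    """Yield shuffled (X_batch, y_batch) pairs that cover the data once."""
    idx = rng.permutation(X.shape[0])
    for start in range(0, len(idx), batch_size):
        batch = idx[start:start + batch_size]
        yield X[batch], y[batch]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = np.sin(X.ravel()) + 0.1 * rng.normal(size=200)

def step(X_batch, y_batch):
    # Dummy stand-in for the compiled training step; returns a placeholder loss.
    return float(np.mean(y_batch**2))

losses = []
for epoch in range(3):
    # Reshuffle every epoch so each pass sees different batches.
    for X_b, y_b in minibatches(X, y, batch_size=64, rng=rng):
        losses.append(step(X_b, y_b))
```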

Training uses MAP by default: the PyMC log-prior is added to the objective. Pass include_prior=False for pure ELBO. For exact GPs and VFE, use compile_scipy_objective with L-BFGS-B instead. See notebooks/demo.ipynb for end-to-end examples covering all three models.
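The MAP default described above can be written out as a short formula. This is a sketch in generic notation, not PTGP's internal definition: \mathcal{L} stands for whichever objective is in use (marginal log likelihood, collapsed ELBO, or ELBO), and the sign convention of the compiled loss and the handling of transformed variables are not specified here.

```latex
\hat{\theta}_{\mathrm{MAP}}
  = \arg\max_{\theta} \big[ \mathcal{L}(\theta) + \log p(\theta) \big],
\qquad
\log p(\theta)
  = \log \mathrm{InvGamma}(\ell \mid \alpha = 2, \beta = 1)
  + \log \mathrm{Exponential}(\eta \mid \lambda = 1)
```

The second expression uses the two priors from the quick start (ls and eta). With include_prior=False, the \log p(\theta) term is dropped.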

How it works

PTGP is built on PyTensor's symbolic graph. Kernels, likelihoods, and GP models return symbolic tensors written with naive linear algebra, such as pt.linalg.inv(K). PyTensor's rewrite system then lowers these expressions to efficient Cholesky-based code using declared matrix properties. All models compile their full forward+gradient step down to the minimum number of cubic factorizations.
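To make the naive-versus-Cholesky distinction concrete, the NumPy sketch below computes the same Gaussian log marginal likelihood both ways. It is only an illustration of the two forms and does not use PyTensor or reproduce its rewrites. The kernel, noise level, and data are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 1))
y = np.sin(X.ravel()) + 0.1 * rng.normal(size=50)

# Exponentiated-quadratic kernel plus observation noise on the diagonal.
sq_dist = (X - X.T) ** 2
K = np.exp(-0.5 * sq_dist) + 0.1**2 * np.eye(50)

n = y.shape[0]
const = n * np.log(2.0 * np.pi)

# Naive form: explicit inverse and a separate log-determinant.
sign, logdet = np.linalg.slogdet(K)
mll_naive = -0.5 * (y @ np.linalg.inv(K) @ y + logdet + const)

# Cholesky form: one factorization K = L L^T serves both terms.
L = np.linalg.cholesky(K)
# General solver used for brevity; a triangular solve
# (e.g. scipy.linalg.solve_triangular) would exploit the structure of L.
alpha = np.linalg.solve(L, y)  # alpha = L^{-1} y, so alpha @ alpha = y^T K^{-1} y
logdet_chol = 2.0 * np.sum(np.log(np.diag(L)))
mll_chol = -0.5 * (alpha @ alpha + logdet_chol + const)
```

Both forms give the same value up to floating-point error, but the Cholesky form needs a single cubic factorization and is numerically better behaved.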

PTGP tries to distill approaches from existing GP libraries, mainly GPJax, GPflow, and GPyTorch, and make them more accessible.

Working with AI coding assistants

PTGP is set up to work nicely with AI coding assistants:

  • AGENTS.md — project-level instructions for AI coding assistants (architecture, conventions, where things live, how to run tests). Follows the AGENTS.md cross-tool convention used by Codex, Cursor, Aider, and others.
  • docs/agents/ — backend-agnostic agent-skill docs covering folk wisdom and training-debug recipes. Currently includes ptgp-vfe (VFE diagnostic skill: pitfalls, escalation workflow, interpretation of VFEDiagnostics and GreedyVarianceDiagnostics).

Claude Code users

Claude Code reads CLAUDE.md, not AGENTS.md. Symlink so they stay in sync:

ln -s AGENTS.md CLAUDE.md

To install the VFE skill into a Claude Code skill directory (so Claude auto-discovers it when you mention VFE), run:

python scripts/install_claude_skills.py --project .   # ./.claude/skills/
python scripts/install_claude_skills.py --user        # ~/.claude/skills/

Install

pip install git+https://github.com/pymc-devs/ptgp.git

To hack on PTGP itself, clone and install in editable mode:

git clone https://github.com/pymc-devs/ptgp.git
cd ptgp
pip install -e .

Contributing

See the issues for what's being worked on. Feel free to propose issues, feature requests, or use cases you've been hoping could be made easier. PRs always welcome.
