Lightweight Gaussian Process inference in C++17 with Python bindings — Apple Metal / Accelerate on macOS and NVIDIA CUDA / OpenBLAS on Linux. A complement to GPyTorch for projects that need GP regression with a small dependency footprint or direct C++ embedding.

LightGP

Lightweight Gaussian Process inference in C++ with Python bindings. Apple Metal + Accelerate (AMX) on macOS; NVIDIA CUDA + OpenBLAS on Linux. NumPy-first Python API with no deep-learning framework dependency.


Install

git clone https://github.com/Fangop/lightgp.git
cd lightgp/python
pip install -e ".[test]"

LightGP builds from source via scikit-build-core. On macOS-arm64 the build auto-detects Apple Accelerate and Metal; on Linux it auto-detects OpenBLAS / LAPACK and, when LIGHTGP_ENABLE_CUDA=1 is set, CUDA. A C++17 compiler is required.

Quick start

import numpy as np
import lightgp as gp

X = np.linspace(-3, 3, 100, dtype=np.float32).reshape(-1, 1)
y = np.sin(X[:, 0]).astype(np.float32) + 0.1 * np.random.randn(100).astype(np.float32)

model = gp.GPExact(gp.RBF())
model.fit(X, y)
model.optimize(steps=50)
pred = model.predict(X)             # → {'mean': (100,) float32, 'var': (100,) float32}
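
For readers who want to see what an exact-GP fit and predict computes, here is a minimal NumPy sketch of the textbook Cholesky route. It is not LightGP's implementation: the helper names `rbf` and `exact_gp_predict` are hypothetical, the hyperparameters are fixed rather than optimised, float64 is used for simplicity, and the variance returned is the latent (noise-free) variance, which may or may not match what `predict` reports.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between the rows of A and the rows of B."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def exact_gp_predict(X, y, Xs, noise_var=0.01):
    """Posterior mean and latent variance of a zero-mean GP via one Cholesky factorisation."""
    K = rbf(X, X) + noise_var * np.eye(len(X))
    L = np.linalg.cholesky(K)                            # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # alpha = K^{-1} y
    Ks = rbf(X, Xs)                                      # cross-covariance, shape (N, N*)
    mean = Ks.T @ alpha
    V = np.linalg.solve(L, Ks)                           # L^{-1} K_*
    var = rbf(Xs, Xs).diagonal() - (V**2).sum(0)         # k(x*, x*) - k_*^T K^{-1} k_*
    return mean, var

rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 100).reshape(-1, 1)
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(100)
mean, var = exact_gp_predict(X, y, X)
```

The factorisation costs O(N³) time and O(N²) memory, which is why the sparse and matrix-free paths exist for larger N.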

Kernel composition with Python operators:

kernel = gp.Scale(gp.RBF()) + gp.Scale(gp.Periodic(period=1.0))
model = gp.GPExact(kernel, mean=gp.LinearMean(input_dim=1), noise_var=0.01)
model.fit(X, y)
model.optimize(steps=200)
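
To illustrate why `+` and `Scale` compose into a valid kernel, the sketch below builds the same kind of sum in plain NumPy and lets you check that the resulting matrix is symmetric and positive semi-definite. The function names, the output scales `s_rbf` and `s_per`, and the assumption that `Scale` multiplies a kernel by a positive output variance are illustrative choices, not LightGP's documented parameterisation.

```python
import numpy as np

def rbf(x1, x2, lengthscale=1.0):
    """1-D squared-exponential kernel."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def periodic(x1, x2, lengthscale=1.0, period=1.0):
    """1-D exp-sine-squared (periodic) kernel."""
    d = np.abs(x1[:, None] - x2[None, :])
    return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / lengthscale**2)

x = np.linspace(-3, 3, 50)
s_rbf, s_per = 1.5, 0.5                      # hypothetical output scales
K = s_rbf * rbf(x, x) + s_per * periodic(x, x)

# A non-negative combination of PSD kernels is PSD; check numerically.
min_eig = np.linalg.eigvalsh(K).min()
```

Because each scale is an ordinary parameter of the sum, all hyperparameters of the tree can be optimised jointly against one marginal likelihood.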

Sparse GP for large datasets:

model = gp.GPSparse(noise_var=0.1)
model.fit(X_big, y_big, num_inducing=200)  # scales to N=50000 in ~100 ms
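
As a rough picture of where the speed comes from, the following NumPy sketch implements the standard Titsias VFE predictive equations with M inducing points, so that only M×M systems are factorised. It is a sketch under stated assumptions, not LightGP's code: `vfe_predict` is a hypothetical helper, the inducing points sit on an evenly spaced grid (how `num_inducing` points are actually chosen is not stated here), and hyperparameters are fixed.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0):
    """Unit-variance squared-exponential kernel matrix."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def vfe_predict(X, y, Z, Xs, noise_var=0.1, jitter=1e-6):
    """Predictive mean/variance of a sparse GP with inducing inputs Z."""
    M = len(Z)
    Kuu = rbf(Z, Z) + jitter * np.eye(M)
    Kuf = rbf(Z, X)                           # (M, N): O(NM) memory, never N x N
    Kus = rbf(Z, Xs)
    A = Kuu + Kuf @ Kuf.T / noise_var         # O(N M^2) to form
    LA = np.linalg.cholesky(A)                # O(M^3)
    Lu = np.linalg.cholesky(Kuu)
    b = np.linalg.solve(LA.T, np.linalg.solve(LA, Kuf @ y)) / noise_var
    mean = Kus.T @ b                          # sigma^-2 K_su A^-1 K_uf y
    Vu = np.linalg.solve(Lu, Kus)
    VA = np.linalg.solve(LA, Kus)
    var = 1.0 - (Vu**2).sum(0) + (VA**2).sum(0)   # k_ss - Q_ss + K_su A^-1 K_us
    return mean, var

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(5000, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(5000)
Z = np.linspace(-3, 3, 20).reshape(-1, 1)     # assumed inducing-point layout
Xs = np.linspace(-2.5, 2.5, 50).reshape(-1, 1)
mean, var = vfe_predict(X, y, Z, Xs)
```

With M fixed, cost grows linearly in N, which is consistent with the sparse rows in the benchmark tables scaling far better than the exact ones.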

Documentation

Full docs at https://fangop.github.io/lightgp/ — getting started, six tutorials, complete API reference, benchmarks gallery, theory pages, and a developer guide.

Features

  • Four inference paths — exact Cholesky, matrix-free conjugate gradients, sparse Titsias VFE, and SKI/KISS-GP with FFT.
  • Composable kernels — RBF, Matérn-{½, 3/2, 5/2}, Periodic, Linear, plus +, *, and Scale operators that build kernel trees with jointly optimisable hyperparameters.
  • Mean functions — Zero, Constant, Linear.
  • Apple Metal backend — native Metal Shading Language compute shaders, including a fused matrix-free $K\mathbf v$ kernel that keeps CG memory at O(N).
  • NVIDIA CUDA backend — cuBLAS sgemm, cuSOLVER spotrf, cuFFT-driven SKI, and custom kernels for kernel-matrix construction and matrix-free matvecs.
  • Tuned CPU paths — Apple Accelerate / AMX on macOS, OpenBLAS / LAPACK on Linux, auto-detected by the build script.
  • Backend::Auto picks CPU vs Metal vs CUDA based on N, D, and the requested solver — users don't have to think about hardware crossover points.
  • Pure-C++17 core — embeddable in iOS apps, robotics stacks, simulators, and game engines without bringing in a deep-learning framework.
  • Python bindings via pybind11: scikit-build-core builds the right backend per platform from source (Metal on macOS-arm64, CUDA on Linux when LIGHTGP_ENABLE_CUDA=1) and exposes the full API with NumPy interop.

Benchmarks

End-to-end GP fit + predict against GPyTorch on identical hardware (fp32, D=4, median of 5 runs).

Apple M4 (10 CPU cores, 8 GPU cores, 16 GB unified memory)

| Config | LightGP CPU | LightGP Metal | GPyTorch CPU | GPyTorch MPS | LightGP best vs GPyTorch best |
|---|---|---|---|---|---|
| Exact RBF, N=2048 | 23.6 ms | 195 ms | 89 ms | (gap*) | 3.8× faster |
| Exact Matérn-5/2, N=2048 | 42 ms | 191 ms | 106 ms | (gap*) | 2.5× faster |
| Sparse RBF, N=10k, M=200 | 18.5 ms | 42 ms | 42 ms | 69 ms | 2.3× faster |
| Sparse RBF, N=50k, M=200 | 97.4 ms | 156 ms | 196 ms | 98 ms | 2.0× faster vs CPU; on par with MPS |
| Matrix-free $K\mathbf v$, N=20k | n/a | 22 ms | n/a | (no equiv) | 32× over explicit |

*GPyTorch MPS falls back to CPU for exact-GP variance because aten::_linalg_eigh.eigenvalues is not yet implemented on PyTorch's MPS backend — the gap is in PyTorch itself, not in GPyTorch.

NVIDIA RTX 3060 (12 GB VRAM, CUDA 12.0)

| Config | LightGP CUDA | GPyTorch CUDA | LightGP advantage |
|---|---|---|---|
| Exact RBF, N=512 | 2.0 ms | 10.3 ms | 5.2× |
| Exact RBF, N=1024 | 5.3 ms | 35.4 ms | 6.7× |
| Exact RBF, N=2048 | 28.0 ms | 63.0 ms | 2.3× |
| Exact RBF, N=4096 | 152 ms | 111 ms | 0.7× (GPyTorch wins) |
| Sparse RBF, N=1000, M=100 (warm) | 0.9 ms | 13.6 ms | 15.3× |
| Sparse RBF, N=10k, M=200 (warm) | 13.7 ms | 23.9 ms | 1.7× |
| Sparse RBF, N=50k, M=200 (warm) | 75 ms | 55 ms | 0.7× (GPyTorch wins) |
| Matrix-free $K\mathbf v$, N=20k | 9.8 ms | (no equiv) | unique to LightGP |
| Matrix-free $K\mathbf v$, N=100k | 204 ms | (no equiv) | unique to LightGP |
| Cholesky, N=4096 (component) | 37 ms | n/a (not exposed) | 136× over OpenBLAS |

LightGP wins on 11 of 13 head-to-head Exact and Sparse configurations across both platforms. The gap comes from a direct C++ → BLAS call path, versus the Python interpreter + PyTorch dispatcher + ATen operator registry that GPyTorch traverses on every kernel call; both libraries call the same underlying BLAS. On the RTX 3060, GPyTorch keeps the edge at large exact-GP sizes (N=4096) and large sparse VFE (N=50k), where its persistent device tensors and compiled autograd amortise the per-call overhead.

The matrix-free $K\mathbf v$ kernel is unique to LightGP on both Apple Silicon and NVIDIA: PyTorch doesn't yet expose user-defined Metal compute shaders, and the CUDA fusion would require building a custom op outside GPyTorch. It enables CG-based GP inference at N=100k+ with O(N) memory instead of O(N²) for the explicit kernel matrix.
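
The idea behind the O(N) memory claim can be sketched on the CPU with NumPy: compute $K\mathbf v$ one row-block at a time so that only a block × N slice of the kernel ever exists, and hand that matvec to conjugate gradients. This is an illustrative sketch, not the fused Metal or CUDA kernel; `rbf_matvec`, `cg_solve`, and the block size are hypothetical names and choices.

```python
import numpy as np

def rbf_matvec(X, v, lengthscale=1.0, block=256):
    """Compute K @ v for the RBF kernel without materialising the N x N matrix."""
    out = np.empty(len(X))
    sq_norms = (X**2).sum(1)
    for start in range(0, len(X), block):
        stop = min(start + block, len(X))
        # Only a (block, N) slice of K is alive at any time.
        sq = sq_norms[start:stop, None] + sq_norms[None, :] - 2.0 * X[start:stop] @ X.T
        out[start:stop] = np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2) @ v
    return out

def cg_solve(matvec, b, tol=1e-10, max_iter=1000):
    """Plain conjugate gradients for a symmetric positive-definite operator."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    b_norm = np.sqrt(b @ b)
    for _ in range(max_iter):
        Ap = matvec(p)
        step = rs / (p @ Ap)
        x += step * p
        r -= step * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) <= tol * b_norm:   # relative residual small enough
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

rng = np.random.default_rng(0)
N, noise_var = 400, 0.1
X = rng.standard_normal((N, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(N)

# Solve (K + noise_var * I) alpha = y using only matvecs.
alpha = cg_solve(lambda v: rbf_matvec(X, v) + noise_var * v, y)
```

Each CG iteration costs one O(N²) matvec in time, so the method trades the O(N³) factorisation and O(N²) storage of Cholesky for repeated kernel evaluations.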

The SKI / KISS-GP path with FFT runs a 500 000-point GP fit + predict in under 1 second on the RTX 3060 (and uses Accelerate vDSP for the equivalent path on Mac). Full numbers, including SKI, GEMM, Cholesky, and GPyTorch comparisons across more sizes, are in the benchmarks gallery and the accompanying paper.
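
The FFT enters SKI because a stationary kernel evaluated on a regular 1-D grid gives a Toeplitz matrix, and a Toeplitz matvec can be done in O(m log m) by embedding it in a circulant matrix. The NumPy sketch below shows only that grid matvec; the interpolation weights that map data points onto the grid are omitted, and `toeplitz_matvec_fft` is a hypothetical helper rather than part of LightGP.

```python
import numpy as np

def toeplitz_matvec_fft(first_col, v):
    """Multiply a symmetric Toeplitz matrix (given by its first column) by v using FFTs."""
    m = len(first_col)
    # Circulant embedding of length 2m - 2: [c0, ..., c_{m-1}, c_{m-2}, ..., c1].
    circ = np.concatenate([first_col, first_col[-2:0:-1]])
    v_pad = np.concatenate([v, np.zeros(m - 2)])
    out = np.fft.irfft(np.fft.rfft(circ) * np.fft.rfft(v_pad), n=len(circ))
    return out[:m]                            # first m entries equal T @ v

m = 512
grid = np.linspace(0.0, 10.0, m)
first_col = np.exp(-0.5 * (grid - grid[0]) ** 2)   # RBF on the grid, lengthscale 1
rng = np.random.default_rng(0)
v = rng.standard_normal(m)
Kv = toeplitz_matvec_fft(first_col, v)
```

The grid kernel is never stored as an m × m matrix; only its first column is, which is what makes grids with hundreds of thousands of points practical.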

C++ usage (embedding without Python)

LightGP's core is a dependency-free C++17 library that can be embedded in iOS apps, robotics stacks, and game engines.

#include <memory>                       // std::make_shared

#include "lightgp/inference/gp_exact.h"
#include "lightgp/kernels/composite_kernel.h"
#include "lightgp/kernels/rbf_kernel.h"
#include "lightgp/kernels/periodic_kernel.h"
#include "lightgp/core/mean.h"

using namespace lightgp;

auto kernel = scale(std::make_shared<RBFKernel>())
            + scale(std::make_shared<PeriodicKernel>(/*l=*/1.0f, /*period=*/1.0f));
auto mean   = std::make_shared<LinearMean>(/*input_dim=*/1);

GPExact gp(kernel, mean, /*noise_variance=*/0.01f, Backend::Auto);
gp.fit(X_train, y_train);              // X_train, y_train: row-major float32 Tensors
gp.optimize_hyperparameters(/*steps=*/200);

Tensor mean_out, var_out;
gp.predict(X_test, mean_out, var_out);

For very large N, switch to matrix-free CG (the N×N kernel is never materialized):

GPExact gp_cg(kernel, mean, 0.01f, Backend::Metal, Solver::CG);
gp_cg.fit(X_huge, y_huge);

For huge datasets, use sparse VFE:

GPSparseHyperparams hp;
GPSparse gp_sp(hp);
gp_sp.fit(X_huge, y_huge, /*num_inducing=*/200);   // O(NM² + M³)

Building from source

macOS (M-series — Metal + Accelerate auto-detected)

./build.sh
./build/run_tests                       # 853 test cases across the C++ suite
./build/basic_regression
./build/mauna_loa                       # kernel composition demo
./build/bench_paper                     # full benchmark suite, JSON-per-line stdout

Linux (CPU + optional CUDA)

# CPU only (OpenBLAS / LAPACK auto-detected if installed)
./build.sh

# With CUDA (requires nvcc + CUDA Toolkit)
LIGHTGP_ENABLE_CUDA=1 ./build.sh

./build/run_tests

Install OpenBLAS / LAPACK first to get the fast CPU path:

sudo apt install libopenblas-dev liblapack-dev   # Debian / Ubuntu

The CUDA backend wires through Backend::CUDA and covers cuBLAS GEMM, cuSOLVER Cholesky, cuFFT (used by Solver::SKI), and custom CUDA kernels for the RBF / Matérn matrix construction and matrix-free $K\mathbf v$ matvec. Backend::Auto picks CUDA automatically when the build was configured with LIGHTGP_ENABLE_CUDA=1 and an NVIDIA device is present.

Opt-out flags

LIGHTGP_NO_METAL=1 ./build.sh             # disable Metal even on Darwin
LIGHTGP_NO_ACCELERATE=1 ./build.sh        # use reference C++ instead of Apple BLAS

Python bindings (development build, no CMake required)

python3 -m venv .venv && source .venv/bin/activate
pip install pybind11 numpy pytest
./python/build_python.sh                 # produces python/lightgp/_core.<ext>.so
PYTHONPATH=python pytest python/tests -v

Project layout

lightgp/
├── core/                Tensor, dispatch, backend / solver enums, Accelerate wrappers
├── kernels/             Kernel hierarchy (RBF, Matérn, Periodic, Linear, Sum/Product/Scale)
│   ├── cpu/             reference CPU + Accelerate paths
│   └── metal/           Metal Shading Language compute shaders
├── solvers/             Cholesky, conjugate gradients, Lanczos log-det
│   ├── cpu/
│   └── metal/
├── inference/           GPExact, GPSparse
├── data/                Bundled benchmark datasets (motorcycle, Mauna Loa, kin40k stand-ins)
├── tests/               853 C++ test cases
├── benchmarks/          10 standalone benches + Python GPyTorch comparison
├── examples/            basic_regression, mauna_loa (kernel composition)
└── python/              pybind11 bindings + pytest suite

Citation

If you use LightGP in your research, please cite:

@misc{fang2026lightgp,
  title         = {LightGP: Lightweight Gaussian Process Inference in C++ on Metal and CUDA},
  author        = {Yu-Hsueh Fang},
  year          = {2026},
  eprint        = {2605.17898},
  archivePrefix = {arXiv},
  primaryClass  = {cs.LG},
  doi           = {10.48550/arXiv.2605.17898},
  url           = {https://arxiv.org/abs/2605.17898}
}

📄 Read the paper

License

MIT License. Copyright (c) 2026 Yu-Hsueh Fang. See LICENSE for the full text.
