
A Python package for Gaussian Process Regression that tunes hyperparameters with Hyperopt and k-fold cross-validation, directly minimizing the cross-validated loss.


Bayesian GP CVLoss: Gaussian Process Regression with Cross-Validated Hyperparameter Optimization


bayesian_gp_cvloss is a Python package designed to simplify the process of training Gaussian Process (GP) models by finding optimal hyperparameters through Bayesian optimization (using Hyperopt) with k-fold cross-validation. The key feature of this package is its direct optimization of the cross-validated loss, aligning the hyperparameter tuning process closely with the model's predictive performance.

This package is particularly useful for researchers and practitioners who want to apply GP models without manually tuning hyperparameters or relying solely on maximizing marginal likelihood, offering a more direct approach to achieving good generalization on unseen data.

Core Idea

The traditional approach to training GP models is to maximize the log marginal likelihood with respect to the kernel and likelihood hyperparameters. While effective, this does not always translate into the best predictive performance on unseen data, especially when the model assumptions are imperfectly met or the dataset is small.

This library implements an alternative strategy:

  1. Define a search space for the GP kernel parameters (e.g., length scales, kernel variance) and likelihood parameters (e.g., noise variance).
  2. Use Bayesian optimization (Hyperopt) to intelligently search this space.
  3. For each set of hyperparameters evaluated by Hyperopt, perform k-fold cross-validation on the training data.
  4. The objective function is configurable: cross-validated RMSE, Negative Log Predictive Density (NLPD), or a weighted combination.
  5. The set of hyperparameters yielding the minimum loss is selected as optimal.
  6. A final GP model is then refitted on the entire training dataset using these best-found hyperparameters.
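The steps above can be sketched in a few lines. This is an illustrative toy, not the package's implementation: it uses scikit-learn's GaussianProcessRegressor in place of GPflow, and a plain grid in place of Hyperopt's Bayesian search, but the structure (per-candidate k-fold CV RMSE, then a refit on all data) is the same.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.uniform(size=(40, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(scale=0.1, size=40)

def cv_rmse(lengthscale, noise_var, X, y, n_splits=5):
    """Mean cross-validated RMSE for one fixed hyperparameter setting."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    errs = []
    for tr, va in kf.split(X):
        gp = GaussianProcessRegressor(
            kernel=RBF(length_scale=lengthscale),
            alpha=noise_var,   # noise variance added to the kernel diagonal
            optimizer=None,    # keep hyperparameters fixed; CV does the tuning
        )
        gp.fit(X[tr], y[tr])
        pred = gp.predict(X[va])
        errs.append(np.sqrt(np.mean((pred - y[va]) ** 2)))
    return float(np.mean(errs))

# Step 2's Bayesian search is replaced by a small grid for brevity
candidates = [(ls, nv) for ls in (0.05, 0.2, 1.0) for nv in (1e-4, 1e-2)]
best = min(candidates, key=lambda p: cv_rmse(p[0], p[1], X, y))

# Step 6: refit on the entire training set with the best-found setting
final_gp = GaussianProcessRegressor(
    kernel=RBF(length_scale=best[0]), alpha=best[1], optimizer=None
).fit(X, y)
```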

Features

  • Automated hyperparameter optimization for GP models using Hyperopt.
  • Cross-validation (k-fold) integrated into the optimization loop.
  • Three scoring objectives:
    • "cv_rmse" — Minimise cross-validated RMSE (prediction accuracy).
    • "nlpd" — Minimise Negative Log Predictive Density (prediction accuracy + uncertainty calibration).
    • "combined" — Weighted combination of both, balancing accuracy and calibration.
  • Automatic Leave-One-Out (LOO): when the dataset is smaller than n_splits, the splitter falls back to LOO automatically.
  • Supports various GPflow kernels (RBF, Matern32, Matern52, RationalQuadratic by default).
  • Smart data-dependent defaults: search ranges are automatically computed from the training data.
  • Flexible overrides: fine-tune individual search ranges without building a full Hyperopt space.
  • Simple API: provide your preprocessed numerical X_train and y_train data.

Installation

pip install bayesian-gp-cvloss

Alternatively, install from source:

git clone https://github.com/Shifa-Zhong/bayesian-gp-cvloss.git
cd bayesian-gp-cvloss
pip install .

Dependencies

  • gpflow >= 2.0.0
  • hyperopt >= 0.2.0
  • scikit-learn >= 0.23.0
  • pandas >= 1.0.0
  • numpy >= 1.18.0

Quick Start

import numpy as np
from bayesian_gp_cvloss import GPCrossValidatedOptimizer

# Create synthetic data
np.random.seed(42)
X = np.random.rand(100, 3)
y = np.sin(X[:, 0] * 2 * np.pi) + X[:, 1]**2 + np.random.randn(100) * 0.1

# --- Option A: Classic RMSE objective (default, backward-compatible) ---
optimizer = GPCrossValidatedOptimizer(
    X_train=X, y_train=y,
    n_splits=5, random_state=42
)
best_params = optimizer.optimize(max_evals=50)

# --- Option B: NLPD objective (accuracy + uncertainty calibration) ---
optimizer = GPCrossValidatedOptimizer(
    X_train=X, y_train=y,
    scoring="nlpd",           # <-- NEW
    n_splits=5, random_state=42
)
best_params = optimizer.optimize(max_evals=50)

# --- Option C: Combined objective ---
optimizer = GPCrossValidatedOptimizer(
    X_train=X, y_train=y,
    scoring="combined",       # <-- NEW
    nlpd_weight=0.5,          # <-- NEW: weight for NLPD term
    n_splits=5, random_state=42
)
best_params = optimizer.optimize(max_evals=50)

# Access results — both RMSE and NLPD are always recorded
trials = optimizer.get_optimization_results()
if trials.best_trial:
    result = trials.best_trial['result']
    print(f"Best CV RMSE: {result['cv_rmse']:.4f}")
    print(f"Best CV NLPD: {result['cv_nlpd']:.4f}")
    print(f"Best Train RMSE: {result['train_loss']:.4f}")

# Predict on new inputs (must match the training feature dimension)
X_test = np.random.rand(10, 3)
y_pred, y_var = optimizer.predict(X_test)

Scoring Objectives Explained

"cv_rmse" (default)

Minimises the mean cross-validated Root Mean Squared Error. This directly targets prediction accuracy and is equivalent to the behaviour of v0.1.x.

"nlpd" — Negative Log Predictive Density

Treats the GP prediction as a Gaussian distribution N(mu, sigma^2) and evaluates how likely the true observation is under that distribution:

NLPD = 0.5 * log(2*pi) + 0.5 * log(sigma^2) + 0.5 * (y - mu)^2 / sigma^2

This simultaneously penalises:

  • Inaccurate means: large (y - mu)^2
  • Overconfident predictions: small sigma^2 when the prediction is wrong
  • Underconfident predictions: large sigma^2 when the prediction is right

This is particularly important for Bayesian optimisation, where acquisition functions (EI, UCB, etc.) depend on both the predicted mean and variance.
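The formula above is straightforward to evaluate directly; a minimal NumPy version (illustrative, not the package's internal code):

```python
import numpy as np

def nlpd(y, mu, var):
    """Negative log predictive density of observation y under N(mu, var)."""
    return 0.5 * np.log(2 * np.pi) + 0.5 * np.log(var) + 0.5 * (y - mu) ** 2 / var

# Same mean error, different claimed variance: the confidently wrong
# prediction (small var) is penalised far more heavily.
print(nlpd(1.0, mu=0.0, var=0.01))  # overconfident: ~48.6
print(nlpd(1.0, mu=0.0, var=1.0))   # calibrated:   ~1.42
```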

"combined"

A weighted sum of normalised RMSE and NLPD:

loss = (1 - nlpd_weight) * norm_RMSE + nlpd_weight * norm_NLPD

Both metrics are min-max normalised using the optimisation history so that the weight is meaningful regardless of scale. The default nlpd_weight=0.5 gives equal importance to accuracy and calibration.
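One way to implement this normalised combination (a sketch under the assumption that the loss is computed against the running history of trials; the helper name is hypothetical):

```python
import numpy as np

def combined_loss(rmse_history, nlpd_history, nlpd_weight=0.5):
    """Combined loss of the latest trial, min-max normalised over the history."""
    def minmax(h):
        h = np.asarray(h, dtype=float)
        span = h.max() - h.min()
        # Degenerate history (all equal): treat every trial as 0
        return (h - h.min()) / span if span > 0 else np.zeros_like(h)
    return float((1 - nlpd_weight) * minmax(rmse_history)[-1]
                 + nlpd_weight * minmax(nlpd_history)[-1])
```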

Automatic Leave-One-Out (LOO)

When the training set has fewer samples than n_splits, the optimizer automatically switches to Leave-One-Out cross-validation. This avoids empty validation folds and provides the most data-efficient evaluation for very small datasets (common in materials optimisation with expensive experiments).

# With only 8 samples and n_splits=10, LOO is used automatically
optimizer = GPCrossValidatedOptimizer(
    X_train=X_small,  # shape (8, 3)
    y_train=y_small,
    n_splits=10,      # Auto-switches to LOO (8 folds)
    random_state=42
)
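The fallback logic amounts to choosing a splitter based on the sample count. A sketch with scikit-learn's splitters (`make_splitter` is a hypothetical helper, not a function exported by the package):

```python
from sklearn.model_selection import KFold, LeaveOneOut

def make_splitter(n_samples, n_splits, random_state=42):
    """Use k-fold when there are enough samples; otherwise fall back to LOO."""
    if n_samples < n_splits:
        return LeaveOneOut()  # one fold per sample
    return KFold(n_splits=n_splits, shuffle=True, random_state=random_state)
```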

Customization

  • Scoring: scoring="cv_rmse", "nlpd", or "combined".
  • NLPD weight: nlpd_weight=0.5 (only for "combined" mode).
  • Kernels: kernels=["RBF", "Matern52"] to search only specific kernels.
  • Lengthscale range: lengthscale_bounds=(0.05, 50.0).
  • Kernel variance range: kernel_variance_bounds=(1e-4, 10.0).
  • Noise variance range: noise_variance_bounds=(1e-6, 1.0).
  • Full custom space: hyperopt_space={...} for complete control.
  • Cross-Validation: n_splits and random_state.
  • Hyperopt: max_evals and rstate_seed in optimize().

Contributing

Contributions are welcome! If you have suggestions for improvements or find any issues, please open an issue or submit a pull request to the GitHub repository: https://github.com/Shifa-Zhong/bayesian-gp-cvloss

License

This project is licensed under the MIT License - see the LICENSE file for details.

Author

Shifa Zhong (sfzhong@tongji.edu.cn)
GitHub: Shifa-Zhong
