
Bayesian regression for low-noise data using POPS algorithm


POPSRegression

Regression scheme from the paper

Parameter uncertainties for imperfect surrogate models in the low-noise regime

T. D. Swinburne and D. Perez, arXiv:2402.01810, 2024

@misc{swinburne2024,
      title={Parameter uncertainties for imperfect surrogate models in the low-noise regime}, 
      author={Thomas D Swinburne and Danny Perez},
      year={2024},
      eprint={2402.01810},
      archivePrefix={arXiv},
      primaryClass={stat.ML},
      url={https://arxiv.org/abs/2402.01810v3}, 
}

Bayesian regression for low-noise data (vanishing aleatoric uncertainty).

Fits the weights of a regression model using BayesianRidge, then estimates weight uncertainties (sigma_ in BayesianRidge) accounting for model misspecification using the POPS (Pointwise Optimal Parameter Sets) algorithm [1]. The alpha_ attribute, which estimates aleatoric uncertainty, is not used for predictions, as it should correctly be assumed negligible.

Bayesian regression is often used in computational science to fit the weights of a surrogate model which approximates some complex calculation. In many important cases the target calculation is near-deterministic, or low-noise, meaning the true data has vanishing aleatoric uncertainty. However, there can be large misspecification uncertainty, i.e. the model weights are intrinsically uncertain because the model is unable to exactly match the training data.

Existing Bayesian regression schemes based on loss minimization can only estimate epistemic and aleatoric uncertainties. In the low-noise limit, weight uncertainties (sigma_ in BayesianRidge) are significantly underestimated, as they only account for epistemic uncertainties which decay with increasing data. Predictions then attribute any additional error to aleatoric uncertainty (alpha_ in BayesianRidge), which is erroneous in a low-noise setting. This has significant implications for how uncertainty is propagated using weight uncertainties.
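A rough illustration of this effect is sketched below, using plain scikit-learn BayesianRidge on noise-free data from a hypothetical oscillatory target with a quartic feature map; the target function, ranges and sample sizes are placeholders for illustration, not values from the paper.

import numpy as np
from sklearn.linear_model import BayesianRidge
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

def target(x):
    # noise-free ("low-noise") oscillatory target, assumed for illustration
    return np.sin(3.0 * x) + 0.3 * x

for n in (50, 500, 5000):
    x = rng.uniform(-2.0, 2.0, n)
    X = PolynomialFeatures(degree=4).fit_transform(x[:, None])  # quartic surrogate, P=5
    br = BayesianRidge(fit_intercept=False).fit(X, target(x))
    # epistemic weight uncertainty (sigma_) shrinks as n grows, even though
    # the surrogate's misspecification error does not vanish
    print(n, np.sqrt(np.diag(br.sigma_)).max())

The printed epistemic scale shrinks as the data grows, while the mismatch between the quartic surrogate and the oscillatory target does not; POPS is designed to capture this remaining misspecification uncertainty.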

Example usage

Usage follows sklearn.linear_model; POPSRegression inherits from BayesianRidge.

After running BayesianRidge.fit(..), the alpha_ attribute is fixed to np.inf, as aleatoric uncertainty is assumed negligible.

The sigma_ matrix still contains epistemic weight uncertainties, whilst misspecification_sigma_ contains the POPS uncertainties.

from POPSRegression import POPSRegression

X_train,X_test,y_train,y_test = ...

# Sobol resampling of hypercube with 1.0 samples / training point
model = POPSRegression(resampling_method='sobol',resample_density=1.)

# fit the model, sample POPS hypercube
model.fit(X_train,y_train)

# returns std by default
y_pred, y_std = model.predict(X_test)

# can also return max/min 
y_pred, y_std, y_max, y_min = model.predict(X_test,return_bounds=True)

# can also resample the hypercube vectors
y_pred, y_std, y_max, y_min = model.predict(X_test,return_bounds=True,resample=True)

# can also return the epistemic uncertainty (decreases as 1/sqrt(n_samples))
y_pred, y_std, y_max, y_min, y_std_epistemic = model.predict(X_test,return_bounds=True,resample=True,return_epistemic_std=True)
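
For completeness, the fitted attributes mentioned above can be inspected directly after model.fit(..); a minimal sketch, with attribute names as documented here and no assumptions about their exact shapes:

import numpy as np

print(np.isinf(model.alpha_))         # True: aleatoric precision fixed to np.inf
print(model.sigma_)                   # epistemic weight covariance
print(model.misspecification_sigma_)  # POPS misspecification uncertainties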

As the example figure below shows, the final error bars give very good coverage of the test output.

[Figure: Example POPS regression. An extreme low-dimensional case: a quartic polynomial (P = 5 parameters) is fit to N data points from a complex oscillatory function.

Green: two-sigma band from the sigma_ weight uncertainty of Bayesian regression (i.e. without the alpha_ term for aleatoric error).

Orange: two-sigma band from the combined sigma_ and misspecification_sigma_ posterior of POPS regression.]
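
A minimal sketch along the lines of this figure is given below; the oscillatory target, data range and sample size are assumptions for illustration, not the values used to produce the figure.

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from POPSRegression import POPSRegression

rng = np.random.default_rng(1)

def f(x):
    # placeholder oscillatory target, assumed for illustration
    return np.sin(4.0 * x) * np.exp(-0.2 * x**2)

x_train = rng.uniform(-3.0, 3.0, 40)
x_test = np.linspace(-3.0, 3.0, 200)

phi = PolynomialFeatures(degree=4)          # quartic polynomial: P = 5 features
X_train = phi.fit_transform(x_train[:, None])
X_test = phi.transform(x_test[:, None])

model = POPSRegression(resampling_method='sobol', resample_density=1.)
model.fit(X_train, f(x_train))
y_pred, y_std, y_max, y_min = model.predict(X_test, return_bounds=True)

# fraction of test points inside the POPS min/max envelope
coverage = np.mean((f(x_test) >= y_min) & (f(x_test) <= y_max))
print(f"fraction of test points inside the POPS envelope: {coverage:.2f}")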

Download files

Download the file for your platform.

Source Distribution

popsregression-0.2.0.tar.gz (9.8 kB)

Built Distribution

POPSRegression-0.2.0-py3-none-any.whl (9.7 kB)

File details

Details for the file popsregression-0.2.0.tar.gz.

File metadata

  • Download URL: popsregression-0.2.0.tar.gz
  • Upload date:
  • Size: 9.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.6

File hashes

Hashes for popsregression-0.2.0.tar.gz

  • SHA256: b3897b10b1414e4297a5e6fa8f4a6223d0b049af77f6719821051eb900579fc8
  • MD5: 3e35bbbb35358a77e9d5fedad8bd83e4
  • BLAKE2b-256: 4f02631e99a4e296d81f4bf0b831b0700c9222c0b0c795eddc41f045628d8449

Hashes can be used to verify the integrity of a downloaded file.
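
For example, the SHA256 digest above can be checked with a few lines of Python (assuming the file has been downloaded to the current directory):

import hashlib

# expected digest taken from the table above
expected = "b3897b10b1414e4297a5e6fa8f4a6223d0b049af77f6719821051eb900579fc8"
with open("popsregression-0.2.0.tar.gz", "rb") as fh:
    print(hashlib.sha256(fh.read()).hexdigest() == expected)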

File details

Details for the file POPSRegression-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: POPSRegression-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 9.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.6

File hashes

Hashes for POPSRegression-0.2.0-py3-none-any.whl

  • SHA256: acb573d0401f13cd2c80c7be2ef0fc14abcfd8d5d0a015e2143f8e363dc8627f
  • MD5: 32e735297577ec4edc304188ef214a20
  • BLAKE2b-256: 6e12fd3e7a2b6ca208fcc2d08d202099470b0d314d8896865fb2cc31523b2517

