Skip to main content

Monte Carlo Uncertainty Propagation for regression with measurement errors

Project description

MCUP

MCUP (Monte Carlo Uncertainty Propagation) is a Python library for regression with measurement errors. It provides three sklearn-like estimators that correctly propagate x and y measurement uncertainties into parameter confidence intervals.

PyPI pyversions PyPI version CI codecov

Install

uv add mcup

Or with pip:

pip install mcup

Quick start

The core idea: you have data where measurement noise is not uniform, or x itself is measured. MCUP gives you honest parameter uncertainties in both cases.

Case 1 — only y has errors (heteroscedastic noise)

A photodetector where noise grows with signal: points at high intensity are less reliable. OLS doesn't know that and gives overconfident slope uncertainty. WeightedRegressor down-weights noisy points and produces calibrated intervals.

import numpy as np
from mcup import WeightedRegressor

rng = np.random.default_rng(42)
x = np.linspace(1, 10, 30)
y_err = 0.1 * x                  # noise grows with x
y = 2.0 * x + 1.0 + rng.normal(0, y_err)

def line(x, p):
    return p[0] + p[1] * x

# Uniform weights (wrong — ignores that high-x points are noisier)
ols = WeightedRegressor(line, method="analytical")
ols.fit(x, y, y_err=np.ones_like(x), p0=[0.0, 1.0])

# Correct weights from measurement errors
wls = WeightedRegressor(line, method="analytical")
wls.fit(x, y, y_err=y_err, p0=[0.0, 1.0])

print(f"OLS:      slope = {ols.params_[1]:.3f} ± {ols.params_std_[1]:.4f}  ← overconfident")
print(f"Weighted: slope = {wls.params_[1]:.3f} ± {wls.params_std_[1]:.4f}  ← calibrated")
# true slope = 2.0

Case 2 — both x and y have errors

A spring balance where both extension (x) and force (y) are measured with error. Ignoring x-errors causes attenuation bias (slope pulled toward zero) and intervals that are far too narrow. XYWeightedRegressor propagates both error sources.

from mcup import XYWeightedRegressor

rng = np.random.default_rng(0)
x_true = np.linspace(0.1, 2.0, 25)
x_err, y_err = 0.05 * np.ones(25), 0.15 * np.ones(25)
x_obs = x_true + rng.normal(0, x_err)
y = 8.0 * x_true + rng.normal(0, y_err)   # true spring constant k=8

# Ignoring x-errors (wrong)
bad = WeightedRegressor(line, method="analytical")
bad.fit(x_obs, y, y_err=y_err, p0=[0.0, 1.0])

# Propagating both errors (correct)
est = XYWeightedRegressor(line, method="analytical")
est.fit(x_obs, y, x_err=x_err, y_err=y_err, p0=[0.0, 1.0])

print(f"Ignoring x-err: k = {bad.params_[1]:.3f} ± {bad.params_std_[1]:.3f}  ← biased low, too narrow")
print(f"XYWeighted:     k = {est.params_[1]:.3f} ± {est.params_std_[1]:.3f}  ← unbiased, calibrated")
# true k = 8.0

Why MCUP

Standard least squares (OLS) assumes all observations are equally reliable. Real experiments break this in two common ways:

  • Heteroscedastic y-errors — measurement noise varies across the range. OLS overweights noisy points, biasing the fit and producing overconfident uncertainties.
  • Errors in x — when the independent variable is itself measured (time, concentration, displacement), ignoring those errors causes attenuation bias: slopes are pulled toward zero, and uncertainty intervals shrink below their true size.

Why not just use the covariance matrix from the optimizer?

When measurement errors are large, the standard approach of reading off sqrt(diag(cov)) from the fit residuals underestimates the true parameter uncertainty. The covariance matrix tells you how well the optimizer converged — it does not propagate the uncertainty that came in with your data. MCUP propagates measurement noise directly through the model so that params_std_ reflects both fit quality and input uncertainty. For a worked example comparing both approaches, see this Kaggle notebook on measurement error in regression.

MCUP fixes both problems. The figure below illustrates the effect for a linear calibration with heteroscedastic y-errors:

Comparison of OLS vs WeightedRegressor

Left: OLS (red) fits the same data differently from weighted regression (blue) because it treats all points equally regardless of σ_y. Right: over 500 simulated experiments, OLS coverage deviates from the nominal 68.3% — WeightedRegressor stays calibrated.

Estimators

Estimator Use when Error model
WeightedRegressor Only y has measurement errors Σ (y − f(x))² / σ_y²
XYWeightedRegressor Both x and y have errors (nonlinear) Combined variance via error propagation (IRLS)
DemingRegressor Both x and y have errors (linear only) Joint optimisation over parameters + latent true x

Each estimator supports:

  • method="analytical" — weighted LS + (J^T W J)^{-1} covariance (fast, exact for well-posed problems)
  • method="mc" — Monte Carlo with Welford covariance (robust cross-check for nonlinear models)

Benchmark summary

Validated across 13 physical scenarios (200 independent parameter configurations each). The analytical solver achieves well-calibrated 1σ uncertainty intervals on all scenarios:

Scenario Estimator Bias RMSE Coverage
Linear calibration (homo) WeightedRegressor +0.3% 12.8% ✓ 68%
Linear calibration (hetero) WeightedRegressor +0.5% 7.2% ✓ 71%
Radioactive decay WeightedRegressor −0.0% 2.6% ✓ 64%
Power law (diffusion) WeightedRegressor +0.0% 4.6% ✓ 68%
Gaussian spectral peak WeightedRegressor −0.1% 1.7% ✓ 66%
Damped oscillator WeightedRegressor −0.4% 7.2% ✓ 67%
Exp decay + timing errors XYWeightedRegressor −1.2% 5.0% ✓ 64%
Hooke's law (x+y errors) XYWeightedRegressor −1.0% 54% ✓ 75%
Beer-Lambert (x+y errors) XYWeightedRegressor +46% 220% ✓ 68%
Method comparison DemingRegressor +14% 111% ✓ 64%
Isotope ratio MS DemingRegressor +3.2% 420% ✓ 72%
Small sample (n=8) WeightedRegressor −2.7% 29% ✓ 69%
Low SNR WeightedRegressor −1.9% 136% ✓ 67%

Bias and RMSE are relative to the true parameter values. Large RMSE on near-zero intercepts (Beer-Lambert baseline, isotope intercept) reflects small absolute values — the coverage column is the reliable calibration metric.

OLS baseline: when ignoring measurement errors breaks uncertainty estimation

Plain OLS (no error weighting) estimates parameter uncertainty from fit residuals alone — σ² = SSR/(n−p). This works when noise is truly uniform. When noise varies across the range, OLS produces miscalibrated intervals even though the parameter estimates themselves may look reasonable.

Scenario OLS coverage WeightedRegressor coverage What goes wrong
S1 Linear (homo σ_y=0.5) ✓ 68%/70% ✓ 68%/70% — OLS works; noise is uniform
S2 Linear (hetero σ_y=0.1+0.1·x) ~ 86%/72% ✓ 71%/72% Intervals too wide; pooled σ² inflated by noisy high-x points
S3 Radioactive decay (Poisson √A) ✗ 32%/42% ✓ 64%/68% Badly overconfident; large early-time counts dominate residuals
S4 Power law (8% relative noise) ✓ 66%/66% ✓ 68%/69% — OLS approximately ok here
S5 Gaussian peak (Poisson counts) ✗ 39%/54% ✓ 66%/70% Overconfident; amplitude and center poorly constrained
S6 Damped oscillator (uniform σ_y) ✓ 64%/71% ✓ 67%/72% — OLS works; noise is uniform

The pattern: OLS coverage is correct only when σ_y is constant across the range (S1, S6). As soon as noise scales with signal — Poisson counting (S3, S5) or percentage-of-reading errors (S2, S4) — the pooled residual variance is a poor proxy for per-point noise, and uncertainty intervals become unreliable. The parameter estimates themselves are often similar; it is the uncertainty that OLS gets wrong.

Using the wrong estimator when x has errors

Scenario Wrong estimator Coverage Correct estimator Coverage
Exp decay + timing errors WeightedRegressor ✗ 30% XYWeightedRegressor ✓ 64%
Beer-Lambert WeightedRegressor ✗ 7% XYWeightedRegressor ✓ 68%
Method comparison WeightedRegressor (OLS) ✗ 32% DemingRegressor ✓ 66%

See DEVELOPING.md for contributing, running tests, and building docs.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mcup-1.0.0.tar.gz (16.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

mcup-1.0.0-py3-none-any.whl (15.1 kB view details)

Uploaded Python 3

File details

Details for the file mcup-1.0.0.tar.gz.

File metadata

  • Download URL: mcup-1.0.0.tar.gz
  • Upload date:
  • Size: 16.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for mcup-1.0.0.tar.gz
Algorithm Hash digest
SHA256 de53e9afe4351db859b0b9419f999faeeeeb5e718c2f5d883d96c0d83a17c7ea
MD5 d77a174cfd663835730f96a0c9f1740d
BLAKE2b-256 abb882db0ff932ec2cf247dddf39087e1b417be0bfd6a659d1d7eeaee9181eaa

See more details on using hashes here.

Provenance

The following attestation bundles were made for mcup-1.0.0.tar.gz:

Publisher: publish.yml on detrin/MCUP

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file mcup-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: mcup-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 15.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for mcup-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 1c43a94ab85bdb0c412ee96a388ed1dd7a6536da6b2303e5fe8557a6a6b69b3e
MD5 2e25f542e667dc06a157e9e8b04397d0
BLAKE2b-256 784eb8ff279e16f5ca6cc93957fe861e13c98c025b148d9cefdedd1813d8c666

See more details on using hashes here.

Provenance

The following attestation bundles were made for mcup-1.0.0-py3-none-any.whl:

Publisher: publish.yml on detrin/MCUP

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page