Python module for performing linear regression for data with measurement errors and intrinsic scatter

BCES and WLS: Linear regression for data with measurement errors and intrinsic scatter

Python module for performing robust linear regression on (X,Y) data points with measurement errors.

BCES stands for bivariate correlated errors and intrinsic scatter; the fitting method follows the description given in Akritas & Bershady 1996, ApJ. Some of the advantages of BCES regression over ordinary least squares (OLS) fitting:

  • it allows for measurement errors on both variables
  • it permits the measurement errors for the two variables to be dependent
  • it permits the magnitudes of the measurement errors to depend on the measurements
  • other "symmetric" lines such as the bisector and the orthogonal regression can be constructed.
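To make the first three points concrete, here is a minimal sketch of the BCES(y|x) slope and intercept estimator from Akritas & Bershady 1996, written directly in NumPy. It is an illustration only, not this package's implementation: the helper name `bces_yx` and the synthetic data are assumptions, and no parameter uncertainties are computed. The estimator subtracts the measurement-error variance and covariance from the observed moments, which removes the slope attenuation that OLS suffers when x is noisy.

```python
import numpy as np

def bces_yx(x, xerr, y, yerr, cov):
    """Sketch of the BCES(y|x) slope and intercept (hypothetical helper).

    yerr is accepted for symmetry with the package's call signature, but it
    does not enter the y|x point estimates.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    xerr, cov = np.asarray(xerr, float), np.asarray(cov, float)
    dx, dy = x - x.mean(), y - y.mean()
    # Remove the measurement-error contribution from the observed moments
    slope = (np.sum(dx * dy) - np.sum(cov)) / (np.sum(dx**2) - np.sum(xerr**2))
    intercept = y.mean() - slope * x.mean()
    return slope, intercept

# Synthetic data: true line y = 2x + 1 with intrinsic scatter and noisy x, y
rng = np.random.default_rng(42)
n = 5000
x_true = rng.uniform(0, 10, n)
y_true = 2.0 * x_true + 1.0 + rng.normal(0, 0.5, n)  # intrinsic scatter
xerr = np.full(n, 1.0)
yerr = np.full(n, 0.5)
x_obs = x_true + rng.normal(0, xerr)
y_obs = y_true + rng.normal(0, yerr)
cov = np.zeros(n)  # uncorrelated measurement errors

slope_bces, icpt_bces = bces_yx(x_obs, xerr, y_obs, yerr, cov)
slope_ols, icpt_ols = np.polyfit(x_obs, y_obs, 1)  # ignores errors on x
print(f"BCES(y|x) slope: {slope_bces:.3f}, OLS slope: {slope_ols:.3f}")
```

With noisy x values, the OLS slope is biased toward zero, while the error-corrected estimate should land close to the true slope of 2.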

The WLS (weighted least squares) method handles the case where only Y has measurement errors and X is treated as error-free. It accounts for intrinsic scatter in the data and follows Akritas & Bershady 1996, §2.3.

Installation

pip install bces

Alternatively, if you plan to modify the source, install the package with a symlink from the source directory, so that changes to the source files are immediately available:

pip install -e .

Usage

BCES

import bces.bces as BCES
a,b,aerr,berr,covab=BCES.bcesp(x,xerr,y,yerr,cov)

Arguments:

  • x,y : 1D data arrays
  • xerr,yerr: measurement errors affecting x and y, 1D arrays
  • cov : covariance between the measurement errors, 1D array

If you have no reason to believe that your measurement errors are correlated (which is usually the case), you can provide an array of zeroes as input for cov:

import numpy
cov = numpy.zeros_like(x)
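If the errors are correlated, one way to build the cov array is from a correlation coefficient between the x and y measurement errors, since the covariance of two errors equals rho * sigma_x * sigma_y. The sketch below assumes a known coefficient `rho`; that variable and the numbers are hypothetical.

```python
import numpy as np

# Hypothetical per-point measurement errors
xerr = np.array([0.10, 0.20, 0.15])
yerr = np.array([0.30, 0.25, 0.40])

# Assumed correlation coefficient between the x and y measurement errors
rho = 0.5

# Per-point covariance of the measurement errors, same shape as x and y
cov = rho * xerr * yerr
```

rho may also be an array with one value per data point; the product then broadcasts element by element.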

Output:

  • a,b : best-fit slope (a) and intercept (b) of the linear regression, such that y = ax + b.
  • aerr,berr : the standard deviations in a,b
  • covab : the covariance between a and b (e.g. for plotting confidence bands)

Each element of the arrays a, b, aerr, berr and covab corresponds to the result of one of the different BCES lines: y|x, x|y, bisector and orthogonal, as detailed in the table below. Please read the original BCES paper to understand what these different lines mean.

Element  Method      Description
0        y|x         Assumes x is the independent variable
1        x|y         Assumes y is the independent variable
2        bisector    Line that bisects the y|x and x|y lines. This approach is self-inconsistent; do not use it.
3        orthogonal  Orthogonal least squares: the line that minimizes orthogonal distances. Use it when it is not clear which variable should be treated as the independent one.
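As an illustration of how the element index and covab can be used together, the sketch below picks the orthogonal line (element 3) and computes a 1-sigma confidence band by first-order error propagation: var(y) = x² aerr² + berr² + 2 x covab. The arrays are hypothetical stand-ins shaped like the output described above, not real fit results, and the example notebook may construct its band differently.

```python
import numpy as np

# Hypothetical stand-ins for the 4-element arrays described above
a = np.array([1.9, 2.1, 2.0, 2.0])        # slopes
b = np.array([1.2, 0.8, 1.0, 1.0])        # intercepts
aerr = np.array([0.05, 0.06, 0.05, 0.05])
berr = np.array([0.30, 0.35, 0.30, 0.30])
covab = np.array([-0.012, -0.017, -0.013, -0.013])

i = 3  # orthogonal regression, per the table above

xgrid = np.linspace(0, 10, 50)
yfit = a[i] * xgrid + b[i]

# First-order propagation of the parameter uncertainties to the fitted line
var = (xgrid * aerr[i])**2 + berr[i]**2 + 2 * xgrid * covab[i]
sigma = np.sqrt(var)
lower, upper = yfit - sigma, yfit + sigma
```

The `lower` and `upper` arrays can then be passed to a plotting call such as matplotlib's `fill_between` to draw the band around the fit.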

By default, bcesp runs the bootstrapping in parallel.

WLS

import bces.bces as BCES
a,b,aerr,berr,covab=BCES.wls(x,y,yerr)

Arguments:

  • x,y: 1D data arrays
  • yerr: measurement errors affecting y, 1D array

Output:

  • a,b: best-fit slope and intercept of the linear regression such that y = ax + b (scalars)
  • aerr,berr: the standard deviations in a,b
  • covab: the covariance between a and b

Note that unlike BCES, WLS returns scalar values (a single regression line) rather than 4-element arrays.

The wlsp function performs the bootstrapping in parallel, if you need it.

When to use BCES or WLS?

Both methods return unbiased estimates of the slope and intercept, but they suit different statistical situations:

  • Use BCES when both X and Y have measurement errors, or when measurement errors on X and Y may be correlated.
  • Use WLS when only Y has measurement errors (X is error-free or its errors are negligible).

Both methods account for intrinsic scatter.

Why choose WLS over OLS? When only Y has measurement errors, prefer WLS over OLS. OLS assigns equal weight to every data point regardless of measurement uncertainty, while WLS weights each point by the inverse of its error variance so more precisely measured points have greater influence on the fit. This produces more accurate and statistically efficient estimates when data points have heteroscedastic (unequal) errors.
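The effect of inverse-variance weighting can be sketched with a closed-form weighted straight-line fit. This is a simplified illustration, not the package's wls routine: the helper `weighted_line_fit` is hypothetical, and the intrinsic scatter `sigma_int` is taken as a known input rather than estimated from the data.

```python
import numpy as np

def weighted_line_fit(x, y, yerr, sigma_int=0.0):
    """Closed-form weighted fit of y = a*x + b (hypothetical helper).

    Each point is weighted by 1 / (yerr**2 + sigma_int**2).
    """
    x, y, yerr = (np.asarray(v, float) for v in (x, y, yerr))
    w = 1.0 / (yerr**2 + sigma_int**2)
    xbar = np.sum(w * x) / np.sum(w)   # weighted means
    ybar = np.sum(w * y) / np.sum(w)
    slope = np.sum(w * (x - xbar) * (y - ybar)) / np.sum(w * (x - xbar)**2)
    intercept = ybar - slope * xbar
    return slope, intercept

# Points near y = 2x + 1; the last point is far off but has a large error
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.0, 14.0])
yerr = np.array([0.1, 0.1, 0.1, 0.1, 5.0])

slope_w, icpt_w = weighted_line_fit(x, y, yerr)
slope_o, icpt_o = np.polyfit(x, y, 1)  # OLS: every point counts equally
print(f"weighted slope: {slope_w:.3f}, OLS slope: {slope_o:.3f}")
```

The weighted fit largely ignores the poorly measured last point, whereas OLS lets it pull the slope upward. With equal errors on every point, the two fits coincide.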

Examples

bces-examples.ipynb is a jupyter notebook including a practical, step-by-step example of how to use BCES to perform regression on data with uncertainties on x and y. It also illustrates how to plot the confidence band for a fit.

wls.ipynb is a jupyter notebook with examples of WLS regression, including fits with intrinsic scatter.

Running Tests

pytest -v -s

Citation

If you end up using this code in your paper, you are morally obliged to cite the following works

I spent considerable time writing this code, making sure it is correct and user-friendly, so I would appreciate your citation of the first paper in the above list as a token of gratitude.

If you are really happy with the code, you can buy me a coffee.
