Skip to main content

Function fitting with errors on x and y

Project description

physfit — Function fitting with errors on both x and y

Introduction

This package provides a simple framework for fitting functions to two-dimensional data. It allows you to specify uncertainties not only on y-coordinates, but also on x-coordinates.

Example of use

Usage can be as simple as

import physfit
physfit.fit("linear", xx, yy)

A range of standard function forms is provided (the most basic being “linear”, i.e., y = A x + B).

Use with arbitrary functional forms

You can also specify your own arbitrary function form along with an initial estimate of parameter values, as in:

physfit.fit(lambda x, A, B: A*x + B, xx, yy, p0=[1, 0])

(although in this example, you could have just used “linear”).

Specifying uncertainties

With physfit, it is particularly easy to specify measurement uncertainties on each of your data points:

physfit.fit("exp", xx, yy, sy=sy)

If sy is a vector of uncertainties on y, this causes each point to be individually weighted accordingly. Uncertainties on x can likewise be specified as sx. It is even possible to specify the covariance between x and y errors using sxy.

Retrieving fit results

Each of the above forms directly prints the fit results to your terminal for quick exploration. Naturally, you can also store the results in a variable:

f = physfit.fit("linear", xx, yy)

In that form, f becomes a class instance with attributes to retrieve fit parameters and uncertainties. You can then use this instance to plot the fitted curve:

plt.plot(xxx, f(xxx))

This works because the return value from physfit.fit not only encapsulates the fitted parameters, but is also a callable representation of the fitted function.

Plotting the ±1σ confidence interval around the fitted curve is almost as easy:

plt.fill_between(xxx, f(xxx) - f.df(xxx), f(xxx) + f.df(xxx))

The object f also contains the R² value for the fit, the χ² value (if you specified uncertainties at least on y), and the covariance matrix between the parameters.

Installation

As easy as

pip install physfit

Fully worked examples

Each of the following assume that you start with:

import numpy as np
np.set_printoptions(precision=4) 
import physfit
rng = np.random.default_rng(12345)
import matplotlib.pyplot as plt
plt.ion()

def plotwitherrors(xx, yy, sx, sy):
    for x, y, dx, dy in zip(xx, yy, sx, sy):
        plt.plot([x, x], [y-dy, y+dy], '-', color=(.7, .7, .7))
        plt.plot([x-dx, x+dx], [y, y], '-', color=(.7, .7, .7))
    plt.plot(xx, yy, 'ko')

Perilous outliers

When fitting without specified uncertainties, one poorly measured data point can have a large impact on fit results. When you can tell the curve fitter that the point has larger uncertainties associated with it than the other points, that impact is mitigated:

xx = [0.1, 1.05, 1.95, 3.1, 5.3]
yy = [5.1, 5.9, 7.0, 7.9, 9.1]
sy = [0.1, 0.1, 0.1, 0.1, 0.3]
sx = [0.1, 0.1, 0.1, 0.1, .8]

f1 = physfit.fit('linear', xx, yy)
f2 = physfit.fit('linear', xx, yy, sy=sy)
f3 = physfit.fit('linear', xx, yy, sx=sx, sy=sy)

plt.figure(1)
plt.clf()
plotwitherrors(xx, yy, sx, sy)
x0a = np.arange(-0.5, 5.6, .1)
for f in [f1, f2, f3]:
    plt.plot(x0a, f(x0a), '-')

Figure 1

After that,

print(f3)

results in:

Fit of 5 points with specified errors on X and Y to:

  y = A*x + B

  A = 0.94 ± 0.06
  B = 5.02 ± 0.12

cov = [ 0.0036 -0.0057]
      [-0.0057  0.0137]

R² = 0.990
χ² = 1.02095

If you need more decimals, you can of course extract the parameters:

slope = f3.A
intercept = f3.B
print(slope, intercept)

which yields:

0.9368703836090297 5.016498767549166

The standard report gives a reasonable number of decimals based on the parameter uncertainties.

Fitting a sinusoidal curve

In this example, physfit determines both the amplitude, the frequency, and the phase of a sinusoidal signal:

xx = np.arange(0, 8*np.pi, np.pi/30)
yy0 = np.sin(xx)
N = len(xx)
sy = 0.5
yy = yy0 + sy * rng.normal(size=N)

f1 = physfit.fit('cos', xx, yy)

plt.figure(2)
plt.clf()
plotwitherrors(xx, yy, 0*xx, sy+0*yy)
plt.plot(xx, f1(xx))

Figure 2

Following that,

print(f1)

results in:

Fit of 240 points without specified errors to:

  y = A*cos(B*x + C)

  A = 0.98 ± 0.04
  B = 1.012 ± 0.006
  C = -1.68 ± 0.09

      [ 0.001953  0.000017 -0.000212]
cov = [ 0.000017  0.000037 -0.000458]
      [-0.000212 -0.000458  0.007703]

R² = 0.676

Note that the ground truth is A = 1, B = 1, C = ±π.

Fitting a bell curve

In this example, physfit determines some of the shape parameters of a bell curve. In this example, uncertainties on x and y are different depending on the x and y values. As a consequence, fitting without specified uncertainties works poorly. In fact, in this (admittedly somewhat contrived) example, fitting with only y uncertainties is even worse. Once you fit with uncertainties on both x and y, the results match the ground truth the closest.

xx0 = np.arange(-3, 3, 0.2)
N = len(xx0)
yy0 = np.exp(-0.5 * xx0**2)
sy0 = 0.1
yy = yy0 + sy0 * rng.normal(size=N) * np.sqrt(yy0)
sy = sy0 * np.sqrt(yy0)
sx = 0.1 * (1 + xx0**2)
xx = xx0 + sx * rng.normal(size=N)

form = lambda xx, a, b: a * np.exp(-b*xx**2)

f1 = physfit.fit(form, xx, yy, p0=[1, 1])
f2 = physfit.fit(form, xx, yy, sy=sy, p0=[1, 1])
f3 = physfit.fit(form, xx, yy, sy=sy, sx=sx, p0=[1, 1])

plt.figure(3)
plt.clf()
plotwitherrors(xx, yy, sx, sy)
x0a = np.arange(-4, 4, 0.1)
for f in [f1, f2, f3]:
    plt.plot(x0a, f(x0a))

Figure 3

The eccentric quadratic

As an example of the effect of correlations between fitted parameters, consider this dataset with a quadratic relationship between x and y. If we fit y = A x² + B x + C in this case, the covariance btween B and C is particularly large. As a consequence, the fit predictions become extremely uncertain if we ask the fitter to extrapolate even slightly.

If we have a priori knowledge that B = 0, we can do a fit without that parameter and get much tighter results:

xx = np.arange(3, 8, .5)
N = len(xx)
yy0 = 2 * xx**2 - 3
sy = 10
yy = yy0 + sy * rng.normal(size=N)

f1 = physfit.fit("quadratic", xx, yy)
f2 = physfit.fit(lambda x, A, B: A*x**2 + B, xx, yy, p0=[1, 0])

fig, ax = plt.subplots(1, 2)
for a, f in zip(ax, [f1, f2]):
    plt.axes(a)
    x0a = np.arange(0., 8.5, 0.1)
    plt.fill_between(x0a, f(x0a) - f.df(x0a), 
                     f(x0a) + f.df(x0a),  alpha=0.4)
    plt.plot(x0a, f(x0a))
    plotwitherrors(xx, yy, 0*xx, sy+0*yy)

Figure 4

(You don’t have to name the parameters to your lambda A, B, etc., but the attributes in the fit result will be called A, B, etc., regardless, so sticking with the standard reduces confusion.)

Development

Development of physfit is on github.

Acknowledgments

The inspiration for physfit and an earlier (Matlab) version of the software, came from a program that was in use in the “Ph 3” physics undergraduate lab at Caltech in the early 2000s. Much to my chagrin, I no longer have the notes that contained the name of its author. If you are that author or know their name, I would love to hear from you.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

physfit-0.9.0-py3-none-any.whl (22.8 kB view details)

Uploaded Python 3

File details

Details for the file physfit-0.9.0-py3-none-any.whl.

File metadata

  • Download URL: physfit-0.9.0-py3-none-any.whl
  • Upload date:
  • Size: 22.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.12.3

File hashes

Hashes for physfit-0.9.0-py3-none-any.whl
Algorithm Hash digest
SHA256 62034700669560a44a6b27bce2649692c9dd987aaff1db2aac45d75d62175126
MD5 ceac4295db5ef3dfd6a6113abd552d40
BLAKE2b-256 cba1d81e0b8bf83822742fa07d2358789ea1f7b68f7978a0f7ad553d8c852f29

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page