Fitting in python made easy

EZFIT: A dead simple interface for fitting in Python

This package is built for people who are new not just to Python but to coding, fitting, and programmatically interacting with data. If you have experience with Excel but need to fit data using least squares, this is the tool for you.

Installation

ezfit has four prerequisite packages: pandas, numpy, matplotlib, and scipy. Everything can be installed through the terminal with the following command.

pip install ezfit numpy pandas matplotlib scipy

How to Use

Import the ezfit library. This gives you a simple interface for fitting a model to the data in a pandas DataFrame.

import ezfit

Loading Data

To start, load your data into a pandas DataFrame. Try to always save your data as a .csv file with one line of headers. See the pandas documentation on read_csv for the details.

Comma Delimited

x, y, yerr
0, 1, .5
1, .5, .2
...

You can load this data with the following command

# start by importing the pandas module
import pandas as pd

# Everything in python uses dot notation to access attributes and functions
# we need the read_csv() function from pandas, so we will call
df = pd.read_csv("path_to_file", skipinitialspace=True)    # note that you might need a full path
# skipinitialspace=True drops the space after each comma, so the columns are
# named "y" and "yerr" rather than " y" and " yerr"

# let's check that the first 2 rows look correct by getting the `head` of the df
print(df.head(2))   # the print() statement is how you print something in python

The output should look something like this

   x    y  yerr
0  0  1.0   0.5
1  1  0.5   0.2
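
The sample file above has a space after every comma, and by default pandas keeps that space as part of each column name. A minimal sketch of the difference, reading the same two rows from an in-memory string rather than a file on disk (the variable names are only for illustration):

```python
import io

import pandas as pd

# the same sample data as above, held in a string instead of a file
csv_text = "x, y, yerr\n0, 1, .5\n1, .5, .2\n"

# default behaviour: the space after each comma stays in the column name
raw = pd.read_csv(io.StringIO(csv_text))
print(list(raw.columns))

# skipinitialspace=True tells pandas to drop spaces that follow a delimiter
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)
print(list(df.columns))
```

If a later call such as df["y"] raises a KeyError, printing list(df.columns) like this is a quick way to spot stray spaces.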

We can also plot the data quickly to make sure it looks right, and determine if there is any cleaning that needs to be done.

# Let's start by getting the standard python plotting library
import matplotlib.pyplot as plt

# the df.plot() function will plot the data in the dataframe in one easy go
df.plot(x="x", y="y", yerr="yerr")  # you can pass other parameters in too

# this will plot the column labeled "y" vs "x", with error bars of size "yerr"
# you can pretty this plot up if you like, but it is fine for just checking the data

plt.show()      # This will render the currently active plot

You might want to place this plot on a log scale, and this can be done in many ways. For a complete list of the parameters available to you, please read up on the pandas plot method
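
One of those ways is the logy parameter of df.plot(), which puts the y axis on a log scale (logx and loglog work the same way). A small sketch, with made-up numbers standing in for your data:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# made-up data that spans several orders of magnitude
df = pd.DataFrame({
    "x": [1, 2, 3, 4],
    "y": [1000.0, 100.0, 10.0, 1.0],
    "yerr": [100.0, 10.0, 1.0, 0.1],
})

# logy=True puts the y axis on a log scale; df.plot() returns the matplotlib Axes
ax = df.plot(x="x", y="y", yerr="yerr", logy=True)

# the same kind of change can be made afterwards on the Axes object
ax.set_xscale("log")

# call plt.show() here when working interactively
```

Keeping the returned ax around is handy because labels, limits, and scales can all be adjusted on it after the plot is drawn.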

Tab Delimited & Line Skips

Now if the dataset is not comma separated, as is the data from CXRO, you will need to tell pandas what separates the columns. Let's look at some CXRO index of refraction data

 Si3N4 Density=3.44
 Energy(eV), Delta, Beta
  30.  0.274695814  0.210541397
  31.7943592  0.252507478  0.17769818
  33.6960373  0.229885429  0.150933087

The first row is density information about the material, followed by the header row for Energy(eV), Delta, and Beta. So first we need to skip the first row of the file. Using the pd.read_csv() function, we can pass in the parameter skiprows=n, where n is the number of rows we need skipped.

Now to get the data, we need to pass in a parameter telling pandas what to look for between columns. Using the parameter sep=r"\s+" we can tell the function that there is an unknown number of whitespace characters between columns. Putting this together we have

df = pd.read_csv("path_to_file", sep=r"\s+", skiprows=1)

# printing the head gives us
print(df.head(2))
Energy(eV),    Delta,          Beta
0       30.000000  0.274696  2.105414e-01
1       31.794359  0.252507  1.776982e-01

Now there is one issue: the column names have trailing commas. You can solve that easily in many ways.

df.columns = ["Energy(eV)","Delta","Beta"]
# or
df.columns = [col.replace(",", "") for col in df.columns]
# or (rename returns a new DataFrame, so assign the result back)
df = df.rename(columns={"Energy(eV),": "Energy(eV)", "Delta,": "Delta"})
# ... you get the idea
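
Putting the skip, the separator, and the column cleanup together, here is a self-contained sketch that reads the CXRO sample shown earlier from an in-memory string (in practice you would pass the path to your file instead):

```python
import io

import pandas as pd

# the CXRO sample from above, held in a string instead of a file
cxro_text = """ Si3N4 Density=3.44
 Energy(eV), Delta, Beta
  30.  0.274695814  0.210541397
  31.7943592  0.252507478  0.17769818
  33.6960373  0.229885429  0.150933087
"""

# skip the density line and split on runs of whitespace
df = pd.read_csv(io.StringIO(cxro_text), sep=r"\s+", skiprows=1)

# strip the trailing commas from the column names
df.columns = [col.rstrip(",") for col in df.columns]

print(df.columns.tolist())
print(df.dtypes)
```

Printing df.dtypes is a quick check that every column was read as a number rather than as text.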

Using the same methods as above, you can plot the data, and do any cleaning to remove bad data points.
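
Cleaning usually means keeping only the rows that pass some test. A minimal sketch using a boolean mask, with made-up numbers and a made-up cutoff:

```python
import pandas as pd

df = pd.DataFrame({
    "x": [0.5, 1.5, 2.5, 3.5],
    "y": [9.9, 1.0, 2.1, None],   # None becomes NaN, a missing value
})

# a comparison on a column gives one True/False value per row
mask = df["x"] > 1

# indexing with the mask keeps only the rows where it is True
clean = df[mask]

# dropna() removes rows that contain missing values
clean = clean.dropna()

print(clean)
```

Assigning the result to a new name such as clean leaves the original df untouched, so you can always go back and try a different cutoff.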

Defining a model

Now you will need to express your mathematical model as a Python function. This is the hardest part of fitting. The syntax is rather simple, and you never need to declare types because Python is a neat language. Say we have a line $$ f(x) = mx+b. $$ This function maps $x\to f(x)$ using the parameters $m$ and $b$. The goal of fitting is to find the values of these parameters that best describe our data. Because of this we need a Python function that takes not just the domain $x$ but also $m$ and $b$ as inputs.

# Functions in python are created by typing 'def' before the name of the function

def f(x, m, b):     # For the code to work, x (or your domain) must be first
    """
    Triple quotes can be used to create a `doc string`, a fancy type of comment
    that gets attached to the top of the function. It is always a good idea
    to comment your functions to say what they do, why they do it, and how
    to use them. For example,

    x: Domain input

    m: slope

    b: y-intercept

    returns
    y = mx + b
    """
    y = m * x + b   # Use * for multiplication and ** for exponentiation
    return y        # return is the key word to say what the function returns
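
Before fitting, it is worth calling the model by hand to check that it does what you expect. A short sketch, assuming the x values arrive as a numpy array (the numbers are arbitrary):

```python
import numpy as np

def f(x, m, b):
    """Line with slope m and y-intercept b."""
    return m * x + b

# three x values, evaluated all at once
x = np.array([0.0, 1.0, 2.0])

y = f(x, 2.0, 1.0)
print(y)   # [1. 3. 5.]
```

Because the function only uses * and +, it works on a whole array of x values in one call, which is the form a fitting routine typically hands to it.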

Fitting

Once you define a model and load your dataset, you need to fit your data. This can be done very easily, so I will run you through the whole process

import pandas as pd
import matplotlib.pyplot as plt
import ezfit

# ══════════/ Load the Data/ ═════════════════
df = pd.read_csv("path_to_csv")

print(df.head(10))
df.plot(x = "x", y="y", yerr = "yerr")

# ══════════/ Clean the Data/ ═════════════════
# oh no, the data with x <= 1 is bad
mask = (df["x"] > 1)
df = df[mask]

# ══════════/ Define a Model/ ═════════════════

def line(x, m, b):
    """Line function."""
    return m * x + b

# ══════════/ Fit the Data/ ═══════════════════

model, ax = df.fit(line, "x", "y", "yerr")
# this function will generate a quick plot of the fit results
plt.show()

# The model has parameters, errors, and goodness of fit
print(model)
line:
𝜒2: 88.71565403843992
reduced 𝜒2: 0.9052617759024482
m : (value=1.0858435676047251 ± 0.0497, bounds=(-inf, inf))
b : (value=-0.4650788531268627 ± 0.0903, bounds=(-inf, inf))
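
To see where numbers like these come from, here is a rough sketch of a weighted least squares fit written directly with scipy.optimize.curve_fit. It is not ezfit's implementation, which this README does not show; the synthetic data, the random seed, and the variable names are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def line(x, m, b):
    """Line function."""
    return m * x + b

# synthetic data: a line with m=2, b=1 plus Gaussian noise of known size
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
yerr = np.full_like(x, 0.5)
y = line(x, 2.0, 1.0) + rng.normal(0.0, yerr)

# weighted least squares; absolute_sigma=True treats yerr as real error bars
popt, pcov = curve_fit(line, x, y, sigma=yerr, absolute_sigma=True)
perr = np.sqrt(np.diag(pcov))   # one-sigma uncertainty on each parameter

# chi-squared: sum of squared residuals, measured in units of the error bars
chi2 = np.sum(((y - line(x, *popt)) / yerr) ** 2)
# reduced chi-squared: divide by (number of points - number of parameters)
reduced_chi2 = chi2 / (len(x) - len(popt))

print(f"chi2: {chi2:.2f}, reduced chi2: {reduced_chi2:.2f}")
print(f"m = {popt[0]:.3f} ± {perr[0]:.3f}")
print(f"b = {popt[1]:.3f} ± {perr[1]:.3f}")
```

A reduced chi-squared close to 1 suggests the model and the error bars are consistent with each other; a value much larger than 1 usually points to a poor model or underestimated errors.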

Now say you wanted to redo the fit, adding bounds and a starting value for the slope of the line

model, ax = df.fit(line, "x", "y", "yerr", m={ "value" : 1, "min" : 0 })
# you can pass in a dictionary for each parameter in your model
print(model)

Now we get slightly different results

line:
𝜒2: 98.71565403843992
reduced 𝜒2: 1.0052617759024482
m : (value=1.158435676047251 ± 0.0497, bounds=(-inf, inf))
b : (value=-0.4650788531268627 ± 0.0903, bounds=(-inf, inf))
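
For comparison, the same idea can be expressed with scipy.optimize.curve_fit, where the starting values go in p0 and the limits go in bounds. This is a sketch of the underlying concept with made-up data, not a description of how ezfit handles the m={...} dictionary internally.

```python
import numpy as np
from scipy.optimize import curve_fit

def line(x, m, b):
    """Line function."""
    return m * x + b

# made-up noisy data around a line with m=1.1, b=-0.5
rng = np.random.default_rng(1)
x = np.linspace(1, 10, 40)
y = line(x, 1.1, -0.5) + rng.normal(0.0, 0.3, size=x.size)

# p0 is the starting guess, in the order the parameters appear after x: (m, b)
p0 = [1.0, 0.0]
# bounds is (lower limits, upper limits): m >= 0, b unconstrained
bounds = ([0.0, -np.inf], [np.inf, np.inf])

popt, pcov = curve_fit(line, x, y, p0=p0, bounds=bounds)
m_fit, b_fit = popt
print(m_fit, b_fit)
```

A good starting value matters most for models with many parameters or several local minima; for a straight line the fit will usually land in the same place regardless.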
