package for specialized regression

License: MIT

pyDataFitting

Linear and nonlinear fit functions that can be used, for example, for curve fitting. The package is not meant to duplicate methods already implemented in NumPy or SciPy, but to provide additional, specialized regression methods, higher computation speed, or help with using methods from well-known packages. It requires certain functions from my little_helpers repository and quite a few external packages such as NumPy, pandas, Matplotlib, scikit-learn, statsmodels, and SciPy.

Install with:

pip install pyDataFitting

Alternatively, download the repository and run the following from the repository folder to get the latest version, which might not have made it to PyPI yet:

pip install -e .

Linear regression (in linear_regression.py)

  • dataset_regression: Performs a classical linear least-squares regression. Treats the input data as a linear combination of the components in the reference data. Can be used, for example, to fit spectra of mixtures with spectra of pure components. Produces the same result as sklearn.linear_model.LinearRegression().fit(...), but is much faster.
  • lin_reg_all_sections: Performs linear regressions on a dataset, starting with the first two data points and expanding the segment by one point in each iteration. The regression metrics help to determine whether a dataset behaves linearly at its beginning and where a transition to nonlinear behavior occurs.
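
The two ideas above can be illustrated with plain NumPy. This is only a minimal sketch of the underlying techniques, not the package's implementation: the function names fit_mixture and expanding_linear_fits, their arguments, and the synthetic data are all hypothetical and chosen for illustration.

```python
import numpy as np


def fit_mixture(mixture, references):
    """Return least-squares weights w so that references.T @ w approximates mixture.

    Hypothetical helper: references has shape (n_components, n_points).
    """
    # lstsq expects one column per component, hence the transpose.
    weights, *_ = np.linalg.lstsq(references.T, mixture, rcond=None)
    return weights


def expanding_linear_fits(x, y):
    """Fit a straight line to the first n points for n = 2 .. len(x).

    Hypothetical helper: returns tuples (n, slope, intercept, sum of squared residuals).
    """
    results = []
    for n in range(2, len(x) + 1):
        slope, intercept = np.polyfit(x[:n], y[:n], 1)
        residuals = y[:n] - (slope * x[:n] + intercept)
        results.append((n, slope, intercept, float(np.sum(residuals**2))))
    return results


# Synthetic "spectra": a mixture of two Gaussian-shaped pure components.
x = np.linspace(0.0, 10.0, 200)
pure_a = np.exp(-((x - 3.0) ** 2))
pure_b = np.exp(-((x - 7.0) ** 2) / 2.0)
mixture = 0.3 * pure_a + 0.7 * pure_b
weights = fit_mixture(mixture, np.vstack([pure_a, pure_b]))

# Synthetic data that is linear up to x = 5 and curves upward afterwards.
x_seg = np.arange(11, dtype=float)
y_seg = 2.0 * x_seg + 1.0 + np.where(x_seg > 5, (x_seg - 5.0) ** 2, 0.0)
fits = expanding_linear_fits(x_seg, y_seg)
```

In the second example, the sum of squared residuals stays near zero while the segment covers only the linear part and grows once the curved points are included, which is the kind of signal that can be used to locate the transition.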

Polynomial regression (in polynomial_regression.py)

  • polynomial_fit: Performs polynomial fits by minimizing the sum of squared residuals while taking equality constraints into account via Lagrange multipliers. This can be used to force the regression function through certain points or to give it certain slopes at given points. Also does unconstrained polynomial fits, but is slower than the corresponding NumPy functions.
  • piecewise_polynomial_fit: Performs a piecewise polynomial fit on a dataset, i.e. the data is divided into segments that are each fitted with their own polynomial function. The segments can be fitted with polynomials of different orders. Equality constraints can be applied at the segment borders, e.g. to force adjacent segments to have the same y values or the same slopes there.
  • segment_regression: Performs a piecewise polynomial fit with the segment borders, the y values at the segment borders, or the slopes at the segment borders as additional fit parameters. The additional fit parameters are estimated with an evolutionary fitting algorithm that calls piecewise_polynomial_fit several times in each iteration, so the whole procedure is rather slow (albeit still very usable).
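
To make the Lagrange multiplier approach concrete, the following is a minimal NumPy sketch of an equality-constrained polynomial fit. It is not the code of polynomial_fit: the function name constrained_polyfit and its arguments are assumptions made for this example. The sketch minimizes the sum of squared residuals subject to linear constraints C c = d on the coefficients c by solving the resulting linear system for the coefficients and the multipliers together.

```python
import numpy as np


def constrained_polyfit(x, y, degree, points=(), slopes=()):
    """Least-squares polynomial fit with equality constraints (hypothetical helper).

    points: iterable of (x0, y0) the polynomial must pass through.
    slopes: iterable of (x0, m0) where the first derivative must equal m0.
    Returns the coefficients in increasing order of power.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    powers = np.arange(degree + 1)
    A = x[:, None] ** powers  # Vandermonde matrix, one column per power

    rows, targets = [], []
    for x0, y0 in points:
        rows.append(float(x0) ** powers)
        targets.append(y0)
    for x0, m0 in slopes:
        # d/dx x**k = k * x**(k - 1); the constant term contributes zero.
        deriv = np.zeros(degree + 1)
        deriv[1:] = powers[1:] * float(x0) ** (powers[1:] - 1)
        rows.append(deriv)
        targets.append(m0)

    n_c = len(rows)
    C = np.array(rows, dtype=float).reshape(n_c, degree + 1)
    d = np.array(targets, dtype=float)

    # Stationarity of the Lagrangian gives the block system
    # [[2 A^T A, C^T], [C, 0]] @ [c, lambda] = [2 A^T y, d].
    kkt = np.block([[2.0 * A.T @ A, C.T], [C, np.zeros((n_c, n_c))]])
    rhs = np.concatenate([2.0 * A.T @ y, d])
    solution = np.linalg.solve(kkt, rhs)
    return solution[: degree + 1]


# Noisy quadratic, forced through (0, 1) and forced to have zero slope at x = 2.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0, 30)
y = 1.0 + 0.5 * x - 0.8 * x**2 + rng.normal(scale=0.05, size=x.size)
coeffs = constrained_polyfit(x, y, 2, points=[(0.0, 1.0)], slopes=[(2.0, 0.0)])
```

The same construction extends to piecewise fits: continuity of values or slopes at a segment border is again a linear equality constraint on the stacked coefficients of the neighboring segments.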

General nonlinear regression (in nonlinear_regression.py)

  • nonlinear_regression: Performs nonlinear regressions by minimizing the sum of squared residuals. It essentially uses the minimize function from lmfit to estimate the parameters of complex regression functions. The functions that calculate the function values must be written by the user, but this is straightforward.
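
As an illustration of what such a user-written model function and its residual function can look like, here is a minimal sketch. Note that it uses scipy.optimize.least_squares as a stand-in rather than lmfit, and it does not show the interface of nonlinear_regression; the model exp_decay, its parameter names, and the data are assumptions made for this example.

```python
import numpy as np
from scipy.optimize import least_squares


def exp_decay(x, amplitude, rate, offset):
    """Hypothetical model function: exponential decay with a constant offset."""
    return amplitude * np.exp(-rate * x) + offset


def residuals(params, x, y):
    """Difference between model and data; the optimizer minimizes its sum of squares."""
    return exp_decay(x, *params) - y


# Noise-free synthetic data generated from known parameters.
x = np.linspace(0.0, 5.0, 50)
y = exp_decay(x, 2.5, 1.3, 0.5)

# Start from a rough guess and let the optimizer refine the parameters.
fit = least_squares(residuals, x0=[1.0, 1.0, 0.0], args=(x, y))
```

After the fit, fit.x holds the estimated parameters in the order amplitude, rate, offset.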

Principal component regression and partial least squares regression (in multivariate_regression.py)

  • principal_component_regression: A class for principal component regression (PCR). Performs a principal component analysis of the dataset and a multilinear regression on the resulting scores with one or several responses in order to generate a model that predicts the responses from future data. The PCR parts work quite well, but the methods for generating various plots still need improvement.
  • pls_regression: A class to help with using the partial least squares regression class from scikit-learn. It is usable, but could do with some redesigning.
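
The PCR procedure described above (principal component analysis followed by a regression on the scores) can be sketched with NumPy alone. This is a simplified illustration under stated assumptions, not the principal_component_regression class: the functions pcr_fit and pcr_predict, the use of a singular value decomposition for the PCA, and the synthetic data are all choices made for this example.

```python
import numpy as np


def pcr_fit(X, y, n_components):
    """Hypothetical helper: PCA via SVD, then least squares on the scores."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean(axis=0)
    X_centered = X - x_mean
    # Rows of Vt are the principal axes, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    loadings = Vt[:n_components].T  # shape (n_features, n_components)
    scores = X_centered @ loadings  # shape (n_samples, n_components)
    coef_scores, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    # Map the score coefficients back to the original feature space.
    coef = loadings @ coef_scores
    return x_mean, y_mean, coef


def pcr_predict(model, X):
    """Predict responses for new data with a model returned by pcr_fit."""
    x_mean, y_mean, coef = model
    return (X - x_mean) @ coef + y_mean


# Synthetic data: six observed variables driven by two latent factors.
rng = np.random.default_rng(1)
latent = rng.normal(size=(50, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing
y = latent @ np.array([1.5, -2.0]) + 3.0

model = pcr_fit(X, y, n_components=2)
predicted = pcr_predict(model, X)
```

Because the synthetic data have exactly two latent factors and no noise, two components are enough to reproduce y; with real data the number of components is a tuning choice.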

Tools for supporting the use of ols from statsmodels.formula.api (in model_tools.py)

  • Provides simple methods to generate the model string for different simple models (linear, two-factor interaction, quadratic, etc.).
  • Provides a method to easily adapt the included parameters in the model string and a method to ensure model hierarchy.
  • Allows the calculation of model values if the model coefficients are provided.
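
A rough idea of what generating such model strings involves is sketched below in plain Python. The functions model_string and enforce_hierarchy are hypothetical and do not reflect the interface of model_tools.py; only the formula syntax itself (A:B for an interaction, I(A**2) for a squared term) is the one understood by ols from statsmodels.formula.api.

```python
from itertools import combinations


def model_string(response, factors, kind="linear"):
    """Hypothetical helper: build a formula string for a simple model type."""
    if kind not in ("linear", "interaction", "quadratic"):
        raise ValueError(f"unknown model kind: {kind}")
    terms = list(factors)
    if kind in ("interaction", "quadratic"):
        # Two-factor interactions between all pairs of factors.
        terms += [f"{a}:{b}" for a, b in combinations(factors, 2)]
    if kind == "quadratic":
        terms += [f"I({f}**2)" for f in factors]
    return f"{response} ~ " + " + ".join(terms)


def enforce_hierarchy(terms):
    """Hypothetical helper: add missing main effects of interaction and squared terms."""
    result = list(terms)
    for term in terms:
        if ":" in term:
            parents = term.split(":")
        elif term.startswith("I(") and term.endswith("**2)"):
            parents = [term[2:-4]]
        else:
            parents = []
        for parent in parents:
            if parent not in result:
                result.append(parent)
    return result


quadratic = model_string("y", ["A", "B"], kind="quadratic")
hierarchical = enforce_hierarchy(["A:B", "I(C**2)"])
```

Here quadratic is the string "y ~ A + B + A:B + I(A**2) + I(B**2)", and hierarchical is the list ["A:B", "I(C**2)", "A", "B", "C"], i.e. the main effects implied by the higher-order terms have been appended.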
