fit piece-wise linear function to data

Project description

About

A library for fitting a continuous piecewise linear function f(x) to data. Just specify the number of line segments you desire and your data set.

Check out the examples!

Read the blog post.

Example of a continuous piecewise linear fit to a data set.

Example of a continuous piecewise linear fit to a sine wave.

Features

For a specified number of line segments, you can determine (and predict from) the optimal continuous piecewise linear function f(x). See this example.

You can fit and predict a continuous piecewise linear function f(x) if you know the specific x locations where the line segments terminate. See this example.

To pass different keywords to the SciPy differential evolution algorithm, see this example.

You can use a different optimization algorithm to find the optimal locations of the line segments by using the objective function that minimizes the sum of squared residuals. See this example.

Instead of using differential evolution, you can now use a multi-start gradient optimization with the fitfast() function. You can specify the number of starting points to use; the default is 2. This means that a Latin hypercube sampling (a space-filling design of experiments) of size 2 is used to start 2 L-BFGS-B optimizations. See this example, which runs the fit() function and then fitfast() to compare the runtimes.
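The multi-start idea can be sketched independently of the library. The code below is not the pwlf implementation: ssr_for_breaks and latin_hypercube are hypothetical helpers written for this illustration, the Latin hypercube is a small NumPy stand-in for whatever sampling the library uses, and the data is synthetic.

```python
import numpy as np
from scipy.optimize import minimize


def ssr_for_breaks(interior, x, y):
    """Sum of squared residuals of the least squares fit for given interior breaks."""
    interior = np.sort(interior)
    columns = [np.ones_like(x), x - x.min()]
    columns += [np.where(x > b, x - b, 0.0) for b in interior]
    A = np.column_stack(columns)
    beta, _, _, _ = np.linalg.lstsq(A, y, rcond=None)
    r = A @ beta - y
    return float(r @ r)


def latin_hypercube(n_samples, n_dims, rng):
    """One sample per stratum in each dimension, with strata shuffled per dimension."""
    u = (rng.random((n_samples, n_dims)) + np.arange(n_samples)[:, None]) / n_samples
    for j in range(n_dims):
        u[:, j] = rng.permutation(u[:, j])
    return u


# Illustrative noise-free data with a break at x = 5.
x = np.linspace(0.0, 10.0, 101)
y = np.where(x < 5.0, x, 5.0 + 3.0 * (x - 5.0))

rng = np.random.default_rng(0)
n_segments = 2
n_starts = 2  # mirrors the default of 2 starting points described above
lo, hi = x.min(), x.max()
bounds = [(lo, hi)] * (n_segments - 1)

# Scale the unit-cube samples to the range of the data.
starts = lo + (hi - lo) * latin_hypercube(n_starts, n_segments - 1, rng)

# One bounded L-BFGS-B run per starting point; keep the best result.
runs = [
    minimize(ssr_for_breaks, s, args=(x, y), method="L-BFGS-B", bounds=bounds)
    for s in starts
]
best = min(runs, key=lambda r: r.fun)
print(best.x, best.fun)
```

Each local run is cheap, which is where the speed advantage comes from, but a gradient-based search can stop at a local minimum, so more starting points trade runtime for robustness.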

Installation

You can now install with pip.

sudo pip install pwlf

Or clone the repo

git clone https://github.com/cjekel/piecewise_linear_fit_py.git

then install with pip

sudo pip install piecewise_linear_fit_py/

or easy_install

sudo easy_install piecewise_linear_fit_py/

or using setup.py

cd piecewise_linear_fit_py/
sudo python setup.py install

How it works

This is based on a formulation of a piecewise linear least squares fit, where the user must specify the location of break points. See this post which goes through the derivation of a least squares regression problem if the break point locations are known. Alternatively check out Golovchenko (2004).
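As a rough illustration of a least squares fit with known break points, the sketch below uses one common formulation: an intercept, a slope for the first segment, and one slope-change term per interior break. It is an assumption that this matches the linked derivation; the function names assemble_matrix and fit_known_breaks are hypothetical and are not part of pwlf.

```python
import numpy as np


def assemble_matrix(x, breaks):
    """Regression matrix for a continuous piecewise linear model."""
    x = np.asarray(x, dtype=float)
    breaks = np.asarray(breaks, dtype=float)
    # Intercept at the first break, plus the slope of the first segment.
    columns = [np.ones_like(x), x - breaks[0]]
    # Each interior break adds a hinge term that changes the slope after it.
    for b in breaks[1:-1]:
        columns.append(np.where(x > b, x - b, 0.0))
    return np.column_stack(columns)


def fit_known_breaks(x, y, breaks):
    """Solve the linear least squares problem for fixed break locations."""
    y = np.asarray(y, dtype=float)
    A = assemble_matrix(x, breaks)
    beta, _, _, _ = np.linalg.lstsq(A, y, rcond=None)
    residuals = A @ beta - y
    return beta, float(residuals @ residuals)


# Illustrative noise-free data: slope 1 up to x = 5, slope 3 afterwards.
x = np.linspace(0.0, 10.0, 101)
y = np.where(x < 5.0, x, 5.0 + 3.0 * (x - 5.0))

beta, ssr = fit_known_breaks(x, y, [0.0, 5.0, 10.0])
print(beta)  # approximately [0, 1, 2]: intercept, first slope, slope change at x = 5
print(ssr)
```

Because every hinge term is zero at its own break, the fitted function is continuous by construction, and the whole fit reduces to a single linear solve.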

Global optimization is used to find the best location for the user defined number of line segments. I specifically use the differential evolution algorithm in SciPy. I default the differential evolution algorithm to be aggressive, and it is probably overkill for your problem. So feel free to pass your own differential evolution keywords to the library. See this example.
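The outer search can be sketched with SciPy alone. This is an illustration of the idea and not the library's code: ssr_for_breaks is a hypothetical objective, the data is synthetic, and SciPy's default differential evolution settings are used here instead of the aggressive defaults mentioned above.

```python
import numpy as np
from scipy.optimize import differential_evolution


def ssr_for_breaks(interior, x, y):
    """Sum of squared residuals of the least squares fit for given interior breaks."""
    interior = np.sort(interior)
    columns = [np.ones_like(x), x - x.min()]
    columns += [np.where(x > b, x - b, 0.0) for b in interior]
    A = np.column_stack(columns)
    beta, _, _, _ = np.linalg.lstsq(A, y, rcond=None)
    r = A @ beta - y
    return float(r @ r)


# Illustrative noise-free data with a break at x = 5.
x = np.linspace(0.0, 10.0, 101)
y = np.where(x < 5.0, x, 5.0 + 3.0 * (x - 5.0))

n_segments = 2
# One unknown per interior break, each bounded by the range of the data.
bounds = [(x.min(), x.max())] * (n_segments - 1)

# The seed is fixed only to make this sketch repeatable.
result = differential_evolution(ssr_for_breaks, bounds, args=(x, y), seed=0)

best_breaks = np.concatenate(([x.min()], np.sort(result.x), [x.max()]))
print(best_breaks, result.fun)
```

Only the interior break locations are optimization variables; for each candidate set of breaks the slopes and intercept come from the inner least squares solve.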

Why

All other methods require the user to specify the specific location of break points, but in most cases the best location for these break points is unknown. It makes more sense to have the user specify the desired number of line segments, and then to choose the best locations for the ends of these line segments quantitatively.

Changelog

  • 2018/12/04 Version 0.2.10: Only docstring changes, fix spelling mistakes, add Notes about **kwargs in scipy optimization functions. Version 0.2.11: fix readme.rst for pypi.org…

  • 2018/10/03 Add example of bare minimum model persistence to predict for new data (see examples/model_persistence_prediction.py). Bug fix in predict function for custom parameters. Add new test function to check that predict works with custom parameters.

  • 2018/08/11 New function which calculates the prediction variance for a given array of x locations. The prediction variance is the squared version of the standard error (not to be confused with the standard errors of the previous change). New example prediction_variance.py shows how to use the new function.

  • 2018/06/16 New function which calculates the standard error for each of the model parameters (Remember model parameters are stored as my_pwlf.beta). Standard errors are calculated by calling se = my_pwlf.standard_errors() after you have performed a fit. For more information about standard errors see this. Fix docstrings for all functions.

  • 2018/05/11 New sorted_data key which can be used to avoid sorting already ordered data. If your data is already ordered as x[0] < x[1] < … < x[n-1], you may consider using sorted_data=True for a slight performance increase. Additionally, the predict function can take the sorted_data key if the data you want to predict at is already sorted. Thanks to V-Kh for the idea and PR.

  • 2018/04/15 Now you can find piecewise linear fits that go through specified data points! Read this post for the details.

  • 2018/04/09 Intelligently converts your x, y, or breaks to NumPy arrays.

  • 2018/04/06 Speed! pwlf just got better and faster! A vast majority of this library has been entirely rewritten! New naming convention. The class piecewise_lin_fit() is being deprecated; now use the class PiecewiseLinFit(). See this post for details on the new formulation. New test function that tests predict().

  • 2018/03/25 Default now hides optimization results. Use disp_res=True when initializing piecewise_lin_fit to change. The multi-start fitfast() function now defaults to the minimum population of 2.

  • 2018/03/11 Added try/except behavior for fitWithBreaks function such that the function could be used in an optimization routine. In general when you have a singular matrix, the function will now return np.inf.

  • 2018/02/16 Added new fitfast() function which uses multi-start gradient optimization instead of Differential Evolution. It may be substantially faster for your application. Also it would be a good candidate if you don’t need the best solution, but just a reasonable fit. Fixed bug in tests function where assert was checking bound, not SSr. New requirement, pyDOE library. New 0.1.0 Version.

  • 2017/11/03 add setup.py, new tests folder and test scripts, new version tracking, initialize break0 breakN in the beginning

  • 2017/10/31 bug fix related to the case where break points exactly equal to x data points ( as per issue https://github.com/cjekel/piecewise_linear_fit_py/issues/1 ) and added attributes .sep_data_x, .sep_data_y, .sep_predict_data_x for troubleshooting issues related to the separation of data points to their respective regions

  • 2017/10/20 remove determinant calculation and use try-except instead, this will offer a larger performance boost for big problems. Change library name to something more Pythonic. Add version attribute.

  • 2017/08/03 gradients (slopes of the line segments) now stored as piecewise_lin_fit.slopes (or myPWLF.slopes) after they have been calculated by performing a fit or predicting

  • 2017/04/01 initial release

Requirements

Python 2.7+ (Python 2.7 and Python 3.4 have been tested)

NumPy (Tested on version >= 1.14.0)

SciPy (Tested on version >= 0.19.0)

pyDOE (Tested on version >= 0.3.8)

License

MIT License

