LtsFit: Least Trimmed Squares Fitting
The LtsFit Package
Robust Least Squares Regression with Uncertainties and Scatter in Any Dimension
LtsFit is a Python package for very robust hyperplane fitting in N dimensions, with uncertainties in all coordinates and intrinsic scatter. It implements the method described in Section 3.2 of Cappellari et al. (2013a) and uses the Least Trimmed Squares (LTS) technique to iteratively clip outliers (Rousseeuw & van Driessen 2006).
Attribution
Please cite Cappellari et al. (2013a) if you use this software for your research. This is the paper where the implementation was described. The BibTeX entry for the paper is:
@ARTICLE{Cappellari2013a,
    author = {{Cappellari}, M. and {Scott}, N. and {Alatalo}, K. and
        {Blitz}, L. and {Bois}, M. and {Bournaud}, F. and {Bureau}, M. and
        {Crocker}, A.~F. and {Davies}, R.~L. and {Davis}, T.~A. and
        {de Zeeuw}, P.~T. and {Duc}, P.-A. and {Emsellem}, E. and
        {Khochfar}, S. and {Krajnovi{\'c}}, D. and {Kuntschner}, H. and
        {McDermid}, R.~M. and {Morganti}, R. and {Naab}, T. and
        {Oosterloo}, T. and {Sarzi}, M. and {Serra}, P. and
        {Weijmans}, A.-M. and {Young}, L.~M.},
    title = "{The ATLAS$^{3D}$ project - XV. Benchmark for early-type galaxies scaling relations from 260 dynamical models: mass-to-light ratio, dark matter, Fundamental Plane and Mass Plane}",
    journal = {MNRAS},
    eprint = {1208.3522},
    year = 2013,
    volume = 432,
    pages = {1709-1741},
    doi = {10.1093/mnras/stt562}
}
Installation
Install with:
pip install ltsfit
Without write access to the global site-packages directory, use:
pip install --user ltsfit
To upgrade the package to the latest version use:
pip install --upgrade ltsfit
Documentation
See ltsfit/examples and the docstrings of the files. pip copies these files into the site-packages folder.
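As a rough way to locate the installed examples folder, the sketch below uses only the Python standard library. The helper name find_examples_dir is hypothetical and not part of ltsfit. It assumes the examples sit in an examples subfolder of the installed package, as stated above, and it returns None when the package or the folder cannot be found.

```python
from importlib.util import find_spec
from pathlib import Path


def find_examples_dir(package="ltsfit"):
    """Return the 'examples' folder of an installed package, or None.

    Hypothetical helper for illustration only; it does not import the package.
    """
    spec = find_spec(package)
    # Not installed, or a plain module without a package folder.
    if spec is None or not spec.submodule_search_locations:
        return None
    package_dir = Path(list(spec.submodule_search_locations)[0])
    examples = package_dir / "examples"
    return examples if examples.is_dir() else None


print(find_examples_dir())
```

If the function prints None, either the package is not installed in the active environment or the examples folder is laid out differently from what is assumed here.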
ltsfit
Purpose
Fit a linear function of the form:
y = a + b1*x1 + b2*x2 +...+ bm*xm,
to data with errors in all coordinates and intrinsic scatter, using a robust method that clips outliers. The function can handle lines in 2-dim, planes in 3-dim, or hyperplanes in N-dim, where x1, x2,..., xm are the independent variables and y is the dependent variable. The method was introduced in Sec. 3.2 of Cappellari et al. (2013a) and the treatment of outliers is based on the FAST-LTS technique by Rousseeuw & van Driessen (2006). See also Rousseeuw (1987).
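The idea behind trimmed fitting can be illustrated with a deliberately simplified sketch. This is not the ltsfit implementation: it fits a straight line with ordinary least squares, ignores measurement errors and intrinsic scatter, and starts from a single fit to all points instead of the many random starting subsets used by FAST-LTS. The function name trimmed_line_fit and the synthetic data are assumptions made for illustration.

```python
import numpy as np


def trimmed_line_fit(x, y, frac=0.5, max_iter=50):
    """Simplified concentration steps for a line y = intercept + slope*x.

    At each step, refit using only the fraction `frac` of points with the
    smallest squared residuals, until the selected subset stops changing.
    """
    if not 0.5 <= frac <= 1:
        raise ValueError("frac must satisfy 0.5 <= frac <= 1")
    n = len(x)
    h = int(np.ceil(frac * n))               # number of points kept per step
    keep = np.ones(n, dtype=bool)            # first fit uses every point
    for _ in range(max_iter):
        slope, intercept = np.polyfit(x[keep], y[keep], 1)
        resid2 = (y - (intercept + slope * x)) ** 2
        new_keep = np.zeros(n, dtype=bool)
        new_keep[np.argsort(resid2)[:h]] = True
        if np.array_equal(new_keep, keep):   # subset is stable: stop
            break
        keep = new_keep
    return intercept, slope, keep


# Synthetic line with 20% gross outliers (illustrative values only).
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 100)
y = 1.0 + 2.0 * x + rng.normal(0, 0.1, 100)
bad = rng.choice(100, 20, replace=False)
y[bad] += 10.0

intercept, slope, keep = trimmed_line_fit(x, y, frac=0.5)
print(f"intercept={intercept:.2f} slope={slope:.2f} kept={keep.sum()}")
```

With frac=1 every point is kept and the loop reduces to a single ordinary least-squares fit, which mirrors the role of frac=1 as a switch that disables outlier detection.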
Calling Sequence
from ltsfit.ltsfit import ltsfit
p = ltsfit(x, y, sigx, sigy, clip=2.6, corr=True, epsy=True,
frac=None, label='Fitted', label_clip='Clipped',
legend=True, pivot=None, plot=True, text=True)
print(f"Best fitting parameters: {p.coef}")
The output values are stored as attributes of the p object.
Input Parameters
- x: array_like with shape (n, m)
Array of n measurements for each of the m independent variables.
EXAMPLE: To fit a line in 2-dim, one has a single vector x of length n with the independent variable and a corresponding vector of dependent variable y.
EXAMPLE: To fit a plane in 3-dim, one has two vectors of length n of independent variables (x1, x2). In this case, x = np.column_stack([x1, x2]).
EXAMPLE: To fit a hyperplane in 4-dim, one has three vectors of independent variables (x1, x2, x3). In this case, x = np.column_stack([x1, x2, x3]).
- y: array_like with shape (n,)
Vector of measured values for each set of x variables.
- sigx: array_like with shape (n, m)
Array of 1*sigma uncertainties for each x coordinate for m dimensions. This has the same shape as x.
- sigy: array_like with shape (n,)
Vector of 1*sigma uncertainties for each y value.
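The input shapes described above can be checked with a short sketch before calling the fit. The synthetic plane, the noise levels, and all variable names other than x, y, sigx and sigy are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Two independent variables and their 1-sigma uncertainties (made-up values).
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(-5, 5, n)
sig_x1 = np.full(n, 0.1)
sig_x2 = np.full(n, 0.2)

# Dependent variable: a plane plus Gaussian noise of size sigy.
sigy = np.full(n, 0.3)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(0, sigy)

# Stack the independent variables column-wise to get shape (n, m).
x = np.column_stack([x1, x2])
sigx = np.column_stack([sig_x1, sig_x2])

# Sanity checks on shapes and finiteness before fitting.
assert x.shape == sigx.shape == (n, 2)
assert y.shape == sigy.shape == (n,)
assert all(np.isfinite(arr).all() for arr in (x, y, sigx, sigy))

# These arrays could then be passed as ltsfit(x, y, sigx, sigy, ...).
print(x.shape, y.shape, sigx.shape, sigy.shape)
```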
Optional Keywords
- clip: float
Clipping threshold in sigma units. Values deviating more than clip*sigma from the best fit are considered outliers and are excluded from the fit. Default is clip=2.6, which would include 99% of the values for a Gaussian distribution.
- corr: bool
If True, the correlation coefficients are printed on the plot. Default is True.
- epsy: bool
If True, the intrinsic scatter is printed on the output plot. Default is True.
- frac: float
Fraction of values to include in the LTS stage. Up to a fraction 1 - frac of the values can be outliers. One must have 0.5 <= frac <= 1. Default is 0.5.
NOTE: Set frac=1 to turn off outlier detection.
- pivot: array_like with shape (m,)
If nonzero, then ltsfit fits the following line, plane or hyperplane:
y = a + b1*(x1 - pivot[0]) + b2*(x2 - pivot[1]) + ...
The pivot values are called x_0, y_0 in eq.(7) of Cappellari et al. (2013a). Use of this keyword is strongly recommended, and suggested values are pivot = np.median(x, 0). This keyword has no effect on the best fit but is important to reduce the covariance and uncertainty in the intercept a. However, the covariance is only weakly dependent on the precise value of the pivot. For this reason, it is generally better to round the pivot values to nice numbers. Default is 0.
- plot: bool
If True, a plot of the fit is produced. Default is True.
- text: bool
If True, the best fitting parameters are printed on the plot. Default is True.
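The effect of the pivot keyword on the intercept can be illustrated with an ordinary least-squares line fit. This sketch uses np.polyfit as a stand-in, not ltsfit itself, and the synthetic data and the helper name intercept_slope_stats are assumptions. It compares a fit to the raw x values, which lie far from zero, with a fit to x shifted by a rounded median.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.uniform(100.0, 110.0, n)                 # values far from zero
y = 3.0 + 2.0 * (x - 105.0) + rng.normal(0.0, 0.5, n)

pivot = np.round(np.median(x))                   # rounded to a nice number


def intercept_slope_stats(xv, yv):
    """Least-squares line fit: slope, intercept error, intercept-slope correlation."""
    (slope, intercept), cov = np.polyfit(xv, yv, 1, cov=True)
    intercept_err = np.sqrt(cov[1, 1])
    corr = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])
    return slope, intercept_err, corr


slope_raw, err_raw, corr_raw = intercept_slope_stats(x, y)
slope_piv, err_piv, corr_piv = intercept_slope_stats(x - pivot, y)

print(f"no pivot: slope={slope_raw:.3f} intercept_err={err_raw:.3f} corr={corr_raw:.3f}")
print(f"pivot   : slope={slope_piv:.3f} intercept_err={err_piv:.3f} corr={corr_piv:.3f}")
```

The slope is the same in both fits, while the intercept error and its correlation with the slope are much smaller once the data are centred near the pivot.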
Output Parameters
The output values are stored as attributes of the ltsfit class.
- p.coef: array_like with shape (m+1,)
Best fitting parameters [a, b1, b2,..., bm].
- p.coef_err: array_like with shape (m+1,)
1*sigma formal uncertainties [a_err, b1_err, b2_err,..., bm_err].
- p.mask: array_like with shape (n,) and dtype bool
Boolean vector indicating which input data points were included in the fit (True) and which were clipped as outliers (False).
- p.rms: float
RMS deviation between the data and the fitted relation.
- p.sig_int: float
Intrinsic scatter in the y direction around the line/plane/hyperplane. sig_int is called epsilon_y in eq.(6) of Cappellari et al. (2013a).
- p.sig_int_err: float
1*sigma formal error on sig_int.
- p.xx: array_like with shape (n,)
Values plotted along the x-axis. This is the linear combination of the x variables that represents the plane/hyperplane edge-on:
xx = a + b1*(x1 - pivot[0]) + b2*(x2 - pivot[1]) + ...
For line fitting, these are just the x values.
- p.yy: array_like with shape (n,)
The input y values plotted along the y-axis.
- p.xerr: array_like with shape (n,)
1*sigma uncertainties for p.xx in the x-axis of the plot.
- p.yerr: array_like with shape (n,)
1*sigma uncertainties for p.yy in the y-axis of the plot.
- p.xline: array_like with shape (2,)
x coordinates of the best fitting relation as shown on the plot.
- p.yline: array_like with shape (2,)
y coordinates of the best fitting relation as shown on the plot.
- p.spearmanr: array_like with shape (2,)
Spearman r coefficient and probability p between (p.xx, p.yy) without clipping outliers.
- p.pearsonr: array_like with shape (2,)
Pearson r coefficient and probability p between (p.xx, p.yy) without clipping outliers.
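The relation between the fitted coefficients, the pivot, and the edge-on coordinate xx for a plane or hyperplane can be sketched as follows. The numbers stand in for p.coef and p.mask and are hypothetical, the function name edge_on_projection is not part of ltsfit, and the rms shown here (root mean square of the residuals of the retained points) is an assumed definition that may differ from the one used for p.rms.

```python
import numpy as np


def edge_on_projection(x, coef, pivot):
    """Return a + b1*(x1 - pivot[0]) + b2*(x2 - pivot[1]) + ... for x of shape (n, m)."""
    x = np.asarray(x, dtype=float)
    a = coef[0]
    b = np.asarray(coef[1:], dtype=float)
    return a + (x - np.asarray(pivot, dtype=float)) @ b


# Hypothetical inputs and fit results for a plane (m = 2).
x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([8.5, 10.0, 20.0])
coef = [10.0, 2.0, -1.0]                  # stands in for p.coef = [a, b1, b2]
pivot = [3.0, 4.0]
mask = np.array([True, True, False])      # stands in for p.mask

xx = edge_on_projection(x, coef, pivot)   # values along the plot x-axis
resid = y - xx                            # deviation from the fitted relation
rms_kept = np.sqrt(np.mean(resid[mask] ** 2))

print(xx, resid, rms_kept)
```

Indexing with the boolean mask, as in resid[mask], selects the points retained in the fit, and resid[~mask] selects the clipped outliers.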
License
Other/Proprietary License
Copyright (c) 2012-2023 Michele Cappellari
This software is provided as is with no warranty. You may use it for non-commercial purposes and modify it for personal or internal use, as long as you include this copyright and disclaimer in all copies. You may not redistribute the code.
Changelog
- V6.0.1: MC, Oxford, 20 July 2023
New function ltsfit to fit hyperplanes in N-dim. This procedure generalizes and replaces both lts_linefit and lts_planefit, which are now deprecated wrappers for ltsfit. This change was suggested and motivated by Francesco D’Eugenio (cam.ac.uk), who shared his own 4-dim lts_hyperfit and his paper on a useful application.
ltsfit: When fitting planes/hyperplanes, plot the dependent variable on the y-axis to be consistent with line fitting. Also plot a legend.
Updated all ltsfit_examples.
Fixed inconsistency between the version number on PyPI and in the Changelog.
- V5.0.20: MC, Oxford, 3 October 2022
Fixed program stop due to Matplotlib change. Thanks to Hitesh Lala (Heidelberg) for the report.
Extract documentation from docstrings and show it on PyPI.
- V5.0.19: MC, Oxford, 22 January 2021
Fixed incorrect plot ranges due to a Matplotlib change. Thanks to Davide Bevacqua (unibo.it) for the report.
- V5.0.18: MC, Oxford, 17 February 2020
Properly print significant trailing zeros in results.
- V5.0.17: MC, Oxford, 22 January 2020
Formatted documentation as docstring.
Included p.rms output.
Published on PyPI. Increased major version number by mistake.
- V2.0.16: MC, Oxford, 27 September 2018
Fixed clock DeprecationWarning in Python 3.7.
- V2.0.15: MC, Oxford, 12 May 2018
Dropped Python 2.7 support.
- V2.0.14: MC, Oxford, 13 April 2018
Fixed FutureWarning in Numpy 1.14.
- V2.0.13: Michele Cappellari, Oxford, 26 July 2017
Increased upper limit of intrinsic scatter accounting for uncertainty of standard deviation with small samples.
- V2.0.12: MC, Oxford, 5 September 2016
Fixed: store ab errors in p.ab_err as documented. Thanks to Alison Crocker for the correction.
- V2.0.11: MC, Oxford, 4 July 2016
Added capsize=0 in plt.errorbar to prevent error bar caps from showing up in PDF.
- V2.0.10: MC, Oxford, 23 January 2016
Check for non finite values in input.
- V2.0.9: MC, Oxford, 8 January 2016
Use LimeGreen for outliers.
- V2.0.8: MC, Oxford, 9 December 2015
Fixed potential program stop without outliers in Matplotlib 1.5.
Increased maximum intrinsic scatter for brentq, to avoid possible stops in extreme situations.
- V2.0.7: MC, Oxford, 1 October 2015
Fixed potential program stop without outliers.
- V2.0.6: MC, Oxford, 5 September 2015
Optionally pass a legend label.
- V2.0.5: MC, Oxford, 6 July 2015
Fixed potential program stop without outliers. Thanks to Masato Onodera for a clear report of the problem.
Output boolean mask instead of good/bad indices.
Removed lts_linefit_example from this file.
Print verbose output during calculation.
- V2.0.4: MC, Baltimore, 9 June 2015
Updated documentation.
- V2.0.3: MC, Oxford, 10 December 2014
Uses np.std rather than biweight to estimate scatter upper limit.
- V2.0.2: MC, 6 November 2014
Included _linefit function to avoid np.polyfit bug with weights.
- V2.0.1: MC, Oxford, 23 October 2014
Fixed program stop with zero scatter.
- V2.0.0: MC, Portsmouth, 23 June 2014
Converted from IDL into Python.
- V1.0.6: MC, Baltimore, 8 June 2014
Check that all input vectors have the same size.
- V1.0.5: MC, Oxford, 19 September 2013
Scale line spacing with character size in text output.
- V1.0.4: MC, Turku, 10 July 2013
Fixed program stop affecting earlier versions of IDL. Thanks to Xue-Guang Zhang for reporting the problem and the solution.
- V1.0.3: MC, Oxford, 13 March 2013
Added CLIP keyword.
- V1.0.2: MC, Oxford, 1 August 2011
Added PIVOT keyword.
- V1.0.1: MC, Oxford, 28 July 2011
Included _EXTRA and OVERPLOT keywords.
- V1.0.0: Michele Cappellari, Oxford, 21 March 2011
Created and tested.