A Python package for sklearn to produce linear regression summary and diagnostic plots similar to those made in R with summary.lm and plot.lm
pyplotlm - R style linear regression summary and diagnostic plots for sklearn
This package reproduces the summary.lm and plot.lm functions from R in a Python environment. It is meant to support the sklearn library by adding a model summary and diagnostic plots for linear regression.
In the R environment, we can fit a linear model and generate a model summary and diagnostic plots by doing the following:
> fit = lm(y ~ ., data=data)
> summary(fit)
Call:
lm(formula = y ~ ., data = data)

Residuals:
     Min       1Q   Median       3Q      Max
-155.829  -38.534   -0.227   37.806  151.355

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  152.133      2.576  59.061  < 2e-16 ***
X0           -10.012     59.749  -0.168 0.867000
X1          -239.819     61.222  -3.917 0.000104 ***
X2           519.840     66.534   7.813 4.30e-14 ***
X3           324.390     65.422   4.958 1.02e-06 ***
X4          -792.184    416.684  -1.901 0.057947 .
X5           476.746    339.035   1.406 0.160389
X6           101.045    212.533   0.475 0.634721
X7           177.064    161.476   1.097 0.273456
X8           751.279    171.902   4.370 1.56e-05 ***
X9            67.625     65.984   1.025 0.305998
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 54.15 on 431 degrees of freedom
Multiple R-squared: 0.5177, Adjusted R-squared: 0.5066
F-statistic: 46.27 on 10 and 431 DF, p-value: < 2.2e-16
> par(mfrow=c(2,2))
> plot(fit)
The goal of this package is to make this process as simple as it is in R for a sklearn LinearRegression object.
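The last three lines of the R summary above (residual standard error, R-squared, and F-statistic) follow directly from the observed and fitted values. The sketch below shows the standard formulas in plain Python. It is an illustration only, not the code pyplotlm uses internally; the function name `fit_statistics`, its arguments, and the toy data are hypothetical.

```python
import math

def fit_statistics(y, fitted, n_coef):
    """Footer statistics for a least-squares fit that includes an intercept.

    n_coef counts every estimated coefficient, intercept included
    (hypothetical helper, not part of pyplotlm).
    """
    n = len(y)
    df_resid = n - n_coef          # residual degrees of freedom
    df_model = n_coef - 1          # number of slope coefficients
    y_mean = sum(y) / n
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual sum of squares
    tss = sum((yi - y_mean) ** 2 for yi in y)               # total sum of squares
    r2 = 1.0 - rss / tss
    return {
        "rse": math.sqrt(rss / df_resid),                   # residual standard error
        "r2": r2,
        "adj_r2": 1.0 - (1.0 - r2) * (n - 1) / df_resid,
        "f": ((tss - rss) / df_model) / (rss / df_resid),
        "df": (df_model, df_resid),
    }

# Toy data: y regressed on x = 1..5 gives intercept 0.6 and slope 0.8.
y = [1.0, 3.0, 2.0, 5.0, 4.0]
fitted = [1.4, 2.2, 3.0, 3.8, 4.6]
stats = fit_statistics(y, fitted, n_coef=2)
print(stats)
```

For the diabetes fit shown above, the 431 residual degrees of freedom correspond to 442 observations minus 11 coefficients (ten predictors plus the intercept).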
Install
pip install pyplotlm
Introduction
The package has two core functions:
A. Generate an R-style regression model summary (R summary.lm)
B. Draw the six available diagnostic plots (R plot.lm):
1. Residuals vs Fitted
2. Normal Q-Q
3. Scale-Location
4. Cook's Distance
5. Residuals vs Leverage
6. Cook's Distance vs Leverage / (1-Leverage)
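The six plots are built from a small set of per-observation quantities: fitted values, residuals, leverage, standardized residuals, and Cook's distance. As a rough sketch of what those quantities are, the plain-Python example below computes them for the one-predictor case, where leverage has a closed form. This is an assumption-laden illustration rather than pyplotlm's implementation; the name `simple_ols_diagnostics` and the toy data are hypothetical.

```python
import math

def simple_ols_diagnostics(x, y):
    """Diagnostics for y = a + b*x fitted by least squares (hypothetical helper)."""
    n = len(x)
    p = 2                                   # coefficients: intercept + one slope
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean

    fitted = [intercept + slope * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    s2 = sum(e ** 2 for e in resid) / (n - p)      # residual variance estimate

    # Leverage: diagonal of the hat matrix, closed form for one predictor.
    leverage = [1.0 / n + (xi - x_mean) ** 2 / sxx for xi in x]
    # Standardized (internally studentized) residuals.
    std_resid = [e / math.sqrt(s2 * (1.0 - h)) for e, h in zip(resid, leverage)]
    # Cook's distance: D_i = r_i^2 / p * h_i / (1 - h_i).
    cooks = [r ** 2 * h / (p * (1.0 - h)) for r, h in zip(std_resid, leverage)]

    return {"fitted": fitted, "resid": resid, "leverage": leverage,
            "std_resid": std_resid, "cooks": cooks}

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]
diag = simple_ols_diagnostics(x, y)
print(diag["leverage"])
print(diag["cooks"])
```

In R's plot.lm, Residuals vs Fitted uses the raw residuals, while Normal Q-Q, Scale-Location, and Residuals vs Leverage use the standardized residuals. The leverages always sum to the number of coefficients, which is a quick sanity check on any implementation.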
Usage
Below is how you would produce the summary and diagnostic plots in Python:
>>> from sklearn.datasets import load_diabetes
>>> from sklearn.linear_model import LinearRegression
>>> import matplotlib.pyplot as plt
>>> from pyplotlm import *
>>> X, y = load_diabetes(return_X_y=True)
>>> reg = LinearRegression().fit(X, y)
>>> obj = PyPlotLm(reg, X, y, intercept=False)
>>> obj.summary() # or summary(obj)
Residuals:
      Min        1Q    Median        3Q       Max
-155.8290  -38.5339   -0.2269   37.8061  151.3550

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  152.1335     2.5759 59.0614   0.0000 ***
X0           -10.0122    59.7492 -0.1676   0.8670
X1          -239.8191    61.2223 -3.9172   0.0001 ***
X2           519.8398    66.5336  7.8132   0.0000 ***
X3           324.3904    65.4219  4.9584   0.0000 ***
X4          -792.1842   416.6839 -1.9012   0.0579 .
X5           476.7458   339.0345  1.4062   0.1604
X6           101.0446   212.5326  0.4754   0.6347
X7           177.0642   161.4756  1.0965   0.2735
X8           751.2793   171.9020  4.3704   0.0000 ***
X9            67.6254    65.9842  1.0249   0.3060
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 54.154 on 431 degrees of freedom
Multiple R-squared: 0.5177, Adjusted R-squared: 0.5066
F-statistic: 46.27 on 10 and 431 DF, p-value: 1.11e-16
>>> obj.plot() # or plot(obj)
>>> plt.show()
This produces the same set of diagnostic plots as the R session above.
References:
- Regression Deletion Diagnostics (R)
  https://stat.ethz.ch/R-manual/R-devel/library/stats/html/influence.measures.html
  https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/lm
  https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/plot.lm
- Residuals and Influence in Regression
  https://conservancy.umn.edu/handle/11299/37076
  https://en.wikipedia.org/wiki/Leverage_(statistics)
  https://en.wikipedia.org/wiki/Studentized_residual
- Cook's Distance
  https://en.wikipedia.org/wiki/Cook%27s_distance