Regression inference for Python

Project description

regression-inference

pip install regression-inference

Python package designed for statistical inference in machine learning, econometrics, and research, with support for hardware-accelerated training.


Features

  • Linear Regression, Logistic Regression, Multinomial Logistic Regression, and Ordinal Logistic Regression model fitting.

  • Full inference statistics for model predictions, including standard errors, t/z statistics, significance levels, and confidence intervals.

  • Support for accelerated model training.

  • Modular regression tables.


Dependencies

numpy and scipy are required dependencies of regression-inference:

numpy>=2.0.0
scipy>=1.15.0

Using pandas is optional but recommended.


Hardware Acceleration with CUDA

CUDA acceleration can be used for model training on supported hardware.

CuPy is required for hardware acceleration; it is not installed with regression-inference by default. Additionally, the CUDA Toolkit is required unless cupy is installed from conda-forge.

Fit models with the parameter cuda = True to enable GPU acceleration.

model = OrdinalLogisticRegression().fit(X, y, cuda = True)
model = MultinomialLogisticRegression().fit(X, y, cuda = True)
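
Because CuPy is not installed by default, it can help to check for it before requesting acceleration. The sketch below is one possible guard, not part of the package: cupy_installed is a hypothetical helper, and locating the module does not guarantee that a working GPU, driver, or CUDA Toolkit is present.

```python
import importlib.util


def cupy_installed() -> bool:
    """Return True if the cupy module can be located (hypothetical helper).

    This only checks that the package is importable; it does not verify
    that a CUDA-capable GPU or driver is actually available.
    """
    return importlib.util.find_spec("cupy") is not None


use_cuda = cupy_installed()
print(f"cuda = {use_cuda}")

# With X and y already loaded, the flag can then be passed to fit:
# model = OrdinalLogisticRegression().fit(X, y, cuda = use_cuda)
```

Falling back to cuda = False keeps the same script usable on machines without a GPU.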

Documentation / How To

See the provided notebooks on GitHub for example workflows.

/tests/notebooks/linear_regression_example.ipynb

/tests/notebooks/logit_regression_example.ipynb

/tests/notebooks/multinomial_regression_example.ipynb

/tests/notebooks/ordinal_regression_example.ipynb

Classes and Functions

from regression_inference import LinearRegression, LogisticRegression, MultinomialLogisticRegression, OrdinalLogisticRegression, summary

Ordinary Least Squares Regression Output

==================================================
OLS Regression Results
--------------------------------------------------
Dependent:                     educ     robust_edu
--------------------------------------------------
 
const                     7.3256***      7.3256***
                           (0.3684)       (0.4345)
 
paeduc                    0.2144***      0.2144***
                           (0.0241)       (0.0236)
 
maeduc                    0.2569***      0.2569***
                           (0.0271)       (0.0294)
 
age                       0.0241***      0.0241***
                           (0.0043)       (0.0042)

--------------------------------------------------
R-squared                     0.276          0.276
Adjusted R-squared            0.274          0.274
F Statistic                 177.548        177.548
Observations               1402.000       1402.000
Log Likelihood            -3359.107      -3359.107
AIC                        6726.213       6726.213
BIC                        6747.196       6747.196
TSS                       13663.270      13663.270
RSS                        9893.727       9893.727
ESS                        3769.543       3769.543
MSE                           7.077          7.077
==================================================
*p<0.1; **p<0.05; ***p<0.01
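
The fit statistics in the lower half of the table are related by standard textbook identities. The sketch below recomputes them from TSS, RSS, and the log likelihood printed above, assuming k = 4 estimated parameters (three features plus the constant). How the package computes these internally is not shown here; the point is only that the printed values agree with the usual formulas up to rounding.

```python
import math

# Values copied from the OLS table above.
n, k = 1402, 4                      # observations; parameters including the constant (assumed)
tss, rss = 13663.270, 9893.727
log_likelihood = -3359.107

ess = tss - rss                                      # explained sum of squares
r_squared = 1 - rss / tss
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k)
mse = rss / (n - k)                                  # residual variance estimate
f_statistic = (ess / (k - 1)) / mse
aic = -2 * log_likelihood + 2 * k
bic = -2 * log_likelihood + k * math.log(n)

print(f"ESS={ess:.3f}  R2={r_squared:.3f}  adj R2={adj_r_squared:.3f}")
print(f"MSE={mse:.3f}  F={f_statistic:.3f}  AIC={aic:.3f}  BIC={bic:.3f}")
```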

Logistic Regression Summary

===================================
Logistic Regression Results
-----------------------------------
Dependent:                    GRADE
-----------------------------------
 
const                    -13.0213**
                           (5.1976)
 
GPA                        2.8261**
                           (1.2675)
 
TUCE                         0.0952
                           (0.1179)
 
PSI                        2.3787**
                           (0.9644)

-----------------------------------
Pseudo R-squared              0.374
LR Statistic                 15.404
Observations                 32.000
Log Likelihood              -12.890
Deviance                     25.779
Null Deviance                41.183
AIC                          33.779
BIC                          39.642
===================================
*p<0.1; **p<0.05; ***p<0.01
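
The significance stars can be reproduced from each coefficient and its standard error. The sketch below assumes a Wald z-test against the standard normal distribution, which is the conventional choice for logistic regression; two_sided_p and stars are illustrative helpers, not functions exported by the package.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def stars(p: float) -> str:
    """Map a p-value to the legend *p<0.1; **p<0.05; ***p<0.01."""
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.1:
        return "*"
    return ""


# (coefficient, standard error) pairs copied from the table above.
coefficients = {
    "const": (-13.0213, 5.1976),
    "GPA": (2.8261, 1.2675),
    "TUCE": (0.0952, 0.1179),
    "PSI": (2.3787, 0.9644),
}

for name, (beta, se) in coefficients.items():
    z = beta / se
    p = two_sided_p(z)
    print(f"{name:<6}{beta:>10.4f}{stars(p):<3}  z={z:6.3f}  p={p:.4f}")
```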

Multinomial Logit Summary

=============================================
Multinomial Regression Results
---------------------------------------------
Dependent:                                PID
---------------------------------------------
Class:                                      1

const                                 -0.3734
                                     (0.5943)
 
logpopul                              -0.0115
                                     (0.0341)
 
selfLR                              0.2977***
                                     (0.0993)
 
age                                -0.0249***
                                     (0.0061)
 
educ                                   0.0825
                                     (0.0740)
 
income                                 0.0052
                                     (0.0168)
 
---------------------------------------------
Class:                                      2

const                              -2.2509***
                                     (0.7579)
 
logpopul                            -0.0888**
                                     (0.0377)
 
selfLR                              0.3917***
                                     (0.1089)
 
age                                -0.0229***
                                     (0.0084)
 
educ                                 0.1810**
                                     (0.0862)
 
income                               0.0479**
                                     (0.0234)
 
---------------------------------------------
Class:                                      3

const                              -3.6656***
                                     (1.3816)
 
logpopul                              -0.1060
                                     (0.0659)
 
selfLR                              0.5735***
                                     (0.1648)
 
age                                   -0.0149
                                     (0.0107)
 
educ                                  -0.0072
                                     (0.1234)
 
income                                 0.0576
                                     (0.0390)
 
---------------------------------------------
Class:                                      4

const                              -7.6138***
                                     (1.0433)
 
logpopul                            -0.0916**
                                     (0.0452)
 
selfLR                              1.2788***
                                     (0.1382)
 
age                                   -0.0087
                                     (0.0086)
 
educ                                 0.1998**
                                     (0.0966)
 
income                              0.0845***
                                     (0.0262)
 
---------------------------------------------
Class:                                      5

const                              -7.0605***
                                     (0.8462)
 
logpopul                            -0.0933**
                                     (0.0399)
 
selfLR                              1.3470***
                                     (0.1252)
 
age                                 -0.0179**
                                     (0.0078)
 
educ                                0.2169***
                                     (0.0816)
 
income                              0.0810***
                                     (0.0219)
 
---------------------------------------------
Class:                                      6

const                             -12.1058***
                                     (1.2198)
 
logpopul                           -0.1409***
                                     (0.0427)
 
selfLR                              2.0701***
                                     (0.1747)
 
age                                   -0.0094
                                     (0.0084)
 
educ                                0.3219***
                                     (0.0879)
 
income                              0.1089***
                                     (0.0260)
 
---------------------------------------------
Accuracy                                0.394
Pseudo R-squared                        0.165
LR Statistic                          576.848
Observations                          944.000
Log Likelihood                      -1461.923
Null Log Likelihood                 -1750.347
Deviance                             2923.845
Null Deviance                        3500.693
AIC                                  2995.845
BIC                                  3170.450
=============================================
*p<0.1; **p<0.05; ***p<0.01
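
The likelihood-based statistics at the bottom of the table follow from the two log likelihoods. The sketch below assumes McFadden's pseudo R-squared and counts k = 36 parameters (6 non-reference classes, each with 5 features and a constant); both are assumptions inferred from the printed values rather than documented behavior.

```python
import math

# Values copied from the multinomial table above.
n = 944
log_likelihood = -1461.923
null_log_likelihood = -1750.347
k = 6 * 6  # 6 non-reference classes x (5 features + constant), assumed

deviance = -2 * log_likelihood
null_deviance = -2 * null_log_likelihood
lr_statistic = null_deviance - deviance
pseudo_r_squared = 1 - log_likelihood / null_log_likelihood  # McFadden
aic = deviance + 2 * k
bic = deviance + k * math.log(n)

print(f"Deviance={deviance:.3f}  Null deviance={null_deviance:.3f}  LR={lr_statistic:.3f}")
print(f"Pseudo R2={pseudo_r_squared:.3f}  AIC={aic:.3f}  BIC={bic:.3f}")
```

The results agree with the table up to rounding of the printed log likelihoods.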

Ordinal Regression Summary

=============================================
Ordinal Regression Results
---------------------------------------------
Dependent:                                PID
---------------------------------------------
 
logpopul                           -0.0707***
                                     (0.0191)
 
selfLR                              1.0192***
                                     (0.0533)
 
age                                   -0.0042
                                     (0.0037)
 
educ                                0.1777***
                                     (0.0408)
 
income                              0.0472***
                                     (0.0108)
 
0:1                                 3.6891***
                                     (0.3729)
 
1:2                                 0.2243***
                                     (0.0686)
 
2:3                                -0.3445***
                                     (0.0910)
 
3:4                                -1.3565***
                                     (0.1607)
 
4:5                                -0.4247***
                                     (0.0979)
 
5:6                                  0.1653**
                                     (0.0752)

---------------------------------------------
Accuracy                                0.388
Pseudo R-squared                        0.146
LR Statistic                          511.454
Observations                          944.000
Log Likelihood                      -1494.620
Null Log Likelihood                 -1750.347
Deviance                             2989.239
Null Deviance                        3500.693
AIC                                  3011.239
BIC                                  3064.590
=============================================
*p<0.1; **p<0.05; ***p<0.01

Coefficient Inference Table

Inference tables can be generated for the model features.

pd.DataFrame(model.inference_table())


Predictions

Extract the order of the feature names, skipping the constant, using feature_names[1:]

model.feature_names[1:]
[Out]: Index(['paeduc', 'maeduc', 'age'], dtype='object')

Predict in the order of the features without a constant.

model.predict(np.array([0, 0, 0]))
[Out]: 7.32564767
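
With every feature set to zero, the prediction equals the constant term from the OLS table. A minimal sketch of the underlying arithmetic, using the rounded coefficients printed above (linear_prediction is an illustrative helper, not the package's predict method):

```python
# Rounded coefficients copied from the OLS table: const, paeduc, maeduc, age.
coefficients = [7.3256, 0.2144, 0.2569, 0.0241]


def linear_prediction(features: list[float]) -> float:
    """Prepend the constant term and take the dot product with the coefficients."""
    row = [1.0, *features]
    return sum(b * x for b, x in zip(coefficients, row))


print(linear_prediction([0, 0, 0]))      # reduces to the constant term
print(linear_prediction([12, 12, 40]))   # paeduc=12, maeduc=12, age=40
```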

Inference

Use return_table = True to include inference statistics.

import numpy as np
import pandas as pd

# Range over values of a feature, holding the other features at their means

prediction_set = [
    np.array([i, X['maeduc'].mean(), X['age'].mean()])
    for i in range(int(X['paeduc'].min()), int(X['paeduc'].max()) + 1)
]

predictions = pd.concat(pd.DataFrame(model.predict(i, return_table=True)) for i in prediction_set)

# Predict discrete values

prediction_set = [
    np.array([2.66, 20.0, 0.0]),
    np.array([2.89, 22.0, 0.0]),
    np.array([3.28, 24.0, 0.0]),
    np.array([2.92, 12.0, 0.0]),
]

predictions = pd.concat(pd.DataFrame(model.predict(test_set, return_table=True)) for test_set in prediction_set)
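
For a linear model, the inference statistics attached to a prediction are typically built from the coefficient covariance matrix. The sketch below shows the classical calculation on synthetic data using only numpy. It is an illustration under stated assumptions (homoskedastic errors, a normal critical value of 1.96) and is not taken from the package's implementation or its return_table output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data for illustration only (not the data behind the tables above).
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

# OLS fit and classical covariance: sigma^2 * (X'X)^-1
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
residuals = y - X @ beta
sigma2 = residuals @ residuals / (n - X.shape[1])
cov = sigma2 * XtX_inv

# Standard error and approximate 95% interval for the mean prediction at x0.
x0 = np.array([1.0, 0.5, -1.0])  # the leading 1.0 is the constant
prediction = x0 @ beta
std_error = np.sqrt(x0 @ cov @ x0)
z_crit = 1.96  # normal approximation; a t critical value is usual in small samples
ci_low = prediction - z_crit * std_error
ci_high = prediction + z_crit * std_error
print(f"prediction={prediction:.3f}  se={std_error:.3f}  CI=({ci_low:.3f}, {ci_high:.3f})")
```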


Variance Inflation Factor

A Variance Inflation Factor table can be generated for the model features.

model.variance_inflation_factor()
{'feature': Index(['paeduc', 'maeduc', 'age'], dtype='object'),
 'VIF': array([2.0233, 2.0285, 1.0971])}
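
For reference, the variance inflation factor of feature j is conventionally defined as 1 / (1 - R_j^2), where R_j^2 comes from regressing that feature on the others. The sketch below implements that definition on synthetic data with numpy; variance_inflation_factors is an illustrative function and may differ from how the package computes its table.

```python
import numpy as np


def variance_inflation_factors(X: np.ndarray) -> np.ndarray:
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    column j on the remaining columns plus a constant."""
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        fitted = others @ np.linalg.lstsq(others, target, rcond=None)[0]
        rss = np.sum((target - fitted) ** 2)
        tss = np.sum((target - target.mean()) ** 2)
        vifs[j] = tss / rss  # algebraically equal to 1 / (1 - R^2)
    return vifs


rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = a + 0.5 * rng.normal(size=500)   # strongly correlated with a
c = rng.normal(size=500)             # unrelated to a and b
features = np.column_stack([a, b, c])
print(variance_inflation_factors(features).round(4))
```

Values near 1 indicate little collinearity; the two correlated columns should show clearly larger factors than the independent one.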

Heteroskedastic-Robust Standard Errors

Set the covariance estimator at fit time with cov_type:

model = MultinomialLogisticRegression().fit(X, y, cov_type="HC0")

model = LogisticRegression().fit(X, y, cov_type="HC1")

model = LinearRegression().fit(X, y, cov_type="HC3")

Preview robust standard errors without changing the covariance type set at fit:

model.robust_se(type="HC3")
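
The HC0, HC1, and HC3 labels refer to the standard heteroskedasticity-consistent "sandwich" estimators. The sketch below shows the textbook formulas for the linear regression case only, on synthetic data with numpy; robust_standard_errors is an illustrative function, and the package's own implementation (including the logistic variants) is not reproduced here.

```python
import numpy as np


def robust_standard_errors(X: np.ndarray, y: np.ndarray, kind: str = "HC0") -> np.ndarray:
    """Sandwich standard errors for OLS; X must already include the constant."""
    n, k = X.shape
    bread = np.linalg.inv(X.T @ X)
    residuals = y - X @ (bread @ X.T @ y)
    squared = residuals ** 2
    if kind == "HC1":
        squared = squared * n / (n - k)                   # degrees-of-freedom correction
    elif kind == "HC3":
        leverage = np.einsum("ij,jk,ik->i", X, bread, X)  # diagonal of the hat matrix
        squared = squared / (1 - leverage) ** 2
    elif kind != "HC0":
        raise ValueError(f"unsupported kind: {kind}")
    meat = X.T @ (X * squared[:, None])
    return np.sqrt(np.diag(bread @ meat @ bread))


rng = np.random.default_rng(0)
x = rng.uniform(1, 5, size=300)
X = np.column_stack([np.ones_like(x), x])
y = 2.0 + 0.5 * x + rng.normal(scale=x)   # noise variance grows with x
for kind in ("HC0", "HC1", "HC3"):
    print(kind, robust_standard_errors(X, y, kind).round(4))
```

HC1 rescales HC0 by n / (n - k), while HC3 inflates each squared residual by its leverage, so it is the most conservative of the three in small samples.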
