A linear regression package with statistical testing for each estimator.

tidylinreg

This package provides tools for linear regression in Python, with a similar style to the lm and summary functions in R.

Installation

You can install this package by running the following command in your terminal:

$ pip install tidylinreg

Summary

The tidylinreg package fits a linear model to a dataset and can be used to carry out regression. It computes and returns summary statistics of the fitted linear model, including standard errors, confidence intervals, and p-values. These summary statistics are output as a Pandas DataFrame, which allows fast and convenient manipulation of large regression models. For example, insignificant parameters can easily be filtered out!
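
Because the summary is an ordinary DataFrame, filtering it is a one-line boolean mask. The sketch below builds a small stand-in table by hand; the column names (`Parameter`, `Estimate`, `p-value`) and the numbers are illustrative assumptions and may not match the exact output of tidylinreg.

```python
import pandas as pd

# Hypothetical stand-in for a summary table. Column names and values are
# made up for illustration and may differ from what tidylinreg returns.
summary = pd.DataFrame({
    "Parameter": ["(Intercept)", "x1", "x2", "x3"],
    "Estimate": [1.20, 0.85, -0.02, 2.10],
    "p-value": [0.001, 0.030, 0.640, 0.004],
})

alpha = 0.05
# Keep only the parameters whose p-value falls below the significance level.
significant = summary[summary["p-value"] < alpha]
print(significant["Parameter"].tolist())
```

With a real fitted model, the same mask would be applied to the DataFrame returned by summary, using whatever column name it gives the p-values.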

Functions

tidylinreg is built around the LinearModel object, which offers three useful methods:

  • fit:
    • Fits the linear model to the provided regressors and response. This is the first step in using the LinearModel object; the object must be fitted to the data before anything else!
    • Please be advised that at the current state of development, fit only accepts continuous regressors. If your data is categorical, first transform it into dummy variables with an encoding technique such as One-Hot Encoding, provided by Scikit-Learn.
    • Watch out for collinearity! tidylinreg will let you know if there is any linear dependence in your data before fitting.
    • For convenience, the intercept is automatically included in the regression model. No need to modify your data to accommodate this!
  • predict:
    • Predict the response using given test regressor data. Remember to fit the model first!
  • summary:
    • Provides a summary of the model fit, similar to the output of the R summary() function when computed on a fitted lm object.
    • The output includes parameter names, estimates, standard errors, test statistics, and significance p-values as a Pandas DataFrame.
    • Additionally, the user can choose to include confidence interval estimates of their parameters, and can specify the significance level.

The user can access individual parts of the summary using get_std_error, get_test_statistic, get_ci, and get_pvalues. However, we recommend using summary to access these estimates.
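
The two data-preparation notes under fit (encode categorical regressors, avoid collinearity) can be sketched as follows. This is a minimal example assuming pandas' `get_dummies` for the encoding rather than Scikit-Learn, and the rank check in `has_full_column_rank` is a hypothetical helper written for this illustration, not the check tidylinreg itself performs.

```python
import numpy as np
import pandas as pd

# Toy data with one continuous and one categorical regressor (illustrative).
df = pd.DataFrame({
    "size": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "colour": ["red", "blue", "green", "red", "blue", "green"],
})

# drop_first=True removes one dummy column per category. Keeping all of them
# would make the dummies sum to one and duplicate the intercept column.
X = pd.get_dummies(df, columns=["colour"], drop_first=True, dtype=float)

def has_full_column_rank(X):
    """Return True if the columns of X plus an intercept are linearly independent."""
    design = np.column_stack([np.ones(len(X)), X.to_numpy(dtype=float)])
    return np.linalg.matrix_rank(design) == design.shape[1]

print(list(X.columns))
print(has_full_column_rank(X))
```

Dropping one dummy level matters here because the intercept is added automatically: a full set of dummy columns is linearly dependent on the column of ones.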

Documentation

Detailed documentation for tidylinreg can be found here.

Using tidylinreg

Once tidylinreg is installed, you can import the LinearModel object to begin your regression analysis!

  1. Fitting the model

    Before anything else, we need to fit the model to our data:

    from tidylinreg.tidylinreg import LinearModel
    import pandas as pd
    
    # Load the training data and separate the regressors from the response.
    training_data = pd.read_csv('path/to/your/training_data.csv')
    X_train = training_data.drop(columns='response')
    y_train = training_data['response']
    
    # Create the model and fit it to the training data.
    my_linear_model = LinearModel()
    my_linear_model.fit(X_train, y_train)
    

    NOTE: An intercept term is automatically included in the linear model when fit is called. No need to pad your data with a column of ones! tidylinreg does this for you.
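
To see what "padding with a column of ones" means, here is a generic ordinary-least-squares sketch written with NumPy. It is not tidylinreg's internal implementation; it only illustrates the manual step that the note says you can skip.

```python
import numpy as np

# Toy data generated exactly from y = 2 + 3*x (illustrative).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 + 3.0 * x

# Prepend a column of ones so the first coefficient acts as the intercept.
design = np.column_stack([np.ones_like(x), x])

# Ordinary least squares solution of design @ beta ~ y.
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
intercept, slope = beta
print(round(float(intercept), 6), round(float(slope), 6))
```

Without the column of ones, the fitted line would be forced through the origin, which is why the intercept is normally included.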

  2. Summary Statistics

    Once the regression parameters are estimated, we can summarize their errors and significance using the summary method:

    my_linear_model.summary()
    

    By default, the confidence intervals will not be included. We can change this by setting the ci argument to True:

    my_linear_model.summary(ci=True)
    

    The default significance level is 0.05, giving 95% confidence intervals. We can change this by modifying the alpha argument. For example, if we want wider 99% confidence intervals, we can set alpha to 0.01:

    my_linear_model.summary(ci=True, alpha=0.01)
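
The relationship between alpha and interval width can be checked with a short standalone sketch. It assumes a normal approximation so that it only needs the standard library; regression software usually uses Student's t distribution with the residual degrees of freedom instead, so the numbers will differ slightly from a real summary. The estimate and standard error are hypothetical.

```python
from statistics import NormalDist

def normal_ci(estimate, std_error, alpha=0.05):
    """Two-sided (1 - alpha) interval using a normal approximation."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return estimate - z * std_error, estimate + z * std_error

# Hypothetical estimate and standard error, chosen for illustration.
lo95, hi95 = normal_ci(0.85, 0.10, alpha=0.05)
lo99, hi99 = normal_ci(0.85, 0.10, alpha=0.01)

# A smaller alpha demands more confidence, so the interval gets wider.
print(hi95 - lo95 < hi99 - lo99)
```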
    
  3. Make Predictions

    Now we can make predictions using the predict method! Let's suppose we have a subset of our data allocated as test data. To make predictions, we can do the following:

    testing_data = pd.read_csv('path/to/your/testing_data.csv')
    X_test = testing_data.drop(columns='response')
    
    my_linear_model.predict(X_test)
    

Testing tidylinreg

To test the tidylinreg package, you will need to install pytest in your Python environment:

$ pip install pytest

Then, git clone this repository and navigate to the root directory. Execute the following command in your terminal:

$ pytest

Python Ecosystem

There are existing models for linear regression in Python, such as Ridge from the sklearn package. The tidylinreg package provides similar fit and predict functionality, and adds the ability to compute statistical metrics about the linear model, including standard errors, confidence intervals, and p-values. statsmodels is a similar package that can perform statistical tests on different types of models, including ordinary least squares. The advantage of tidylinreg is that its output is a Pandas DataFrame, which makes the results easy to use in downstream workflows and inference.

Contributing

Interested in contributing? Check out the Contributing Guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.

License

tidylinreg was created by Benjamin Frizzell, Danish Karlin Isa, Nicholas Varabioff, and Yasmin Hassan. It is licensed under the terms of the MIT license, which can be viewed here.

Credits

tidylinreg was created with cookiecutter and the py-pkgs-cookiecutter template.

Contributors

  • Benjamin Frizzell
  • Danish Karlin Isa
  • Nicholas Varabioff
  • Yasmin Hassan
