GLMs in Python.

PyGLMs (Turtles) 🐢

An implementation of various Generalized Linear Models (GLMs), written in Python.

I created this package as a refresher on GLMs and the underlying optimization techniques. It's intended as a learning tool and a reference for building and understanding these models from the ground up.

Overview

The code is packaged as a Python library named turtles (I like turtles), which makes it easy to integrate into your own projects.

The package is written using numpy for linear algebra operations, scipy for (some) optimization, pandas for displaying tabular results, and matplotlib for plots.

The following models have been implemented:

  1. Multiple Linear Regression (turtles.stats.glms.MLR class)
  2. Logistic Regression (turtles.stats.glms.LogReg class, uses GLM parent class)
  3. Poisson Regression (turtles.stats.glms.PoissonReg class, uses GLM parent class)

The GLM parent class supports three optimization methods for parameter estimation: Momentum-based Gradient Descent for first-order optimization, Newton's Method for second-order optimization, and Limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS). The user can specify the desired optimization method during class instantiation.
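As a rough illustration of the first two methods, here is a minimal numpy sketch for logistic regression. It is not the turtles implementation: the function names (fit_momentum_gd, fit_newton), the momentum value, the iteration counts, and the simulated data are all assumptions chosen for the example.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, the inverse link for logistic regression."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_momentum_gd(X, y, learning_rate=0.1, momentum=0.9, n_iter=5000):
    """First-order: gradient descent with a momentum (velocity) term.

    Hypothetical helper, not part of turtles.
    """
    beta = np.zeros((X.shape[1], 1))
    velocity = np.zeros_like(beta)
    for _ in range(n_iter):
        # Gradient of the average negative log-likelihood.
        grad = X.T @ (sigmoid(X @ beta) - y) / X.shape[0]
        # Blend the previous update direction with the new gradient step.
        velocity = momentum * velocity - learning_rate * grad
        beta = beta + velocity
    return beta


def fit_newton(X, y, learning_rate=1.0, n_iter=25):
    """Second-order: Newton's Method, using the full Hessian.

    Hypothetical helper, not part of turtles.
    """
    beta = np.zeros((X.shape[1], 1))
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        grad = X.T @ (p - y) / X.shape[0]
        # Hessian of the average negative log-likelihood: X^T W X / M,
        # where W is diagonal with entries p * (1 - p).
        hessian = X.T @ (p * (1.0 - p) * X) / X.shape[0]
        # Solve H * step = grad instead of explicitly inverting H.
        beta = beta - learning_rate * np.linalg.solve(hessian, grad)
    return beta


# Simulated data (assumed for illustration): intercept plus two features.
rng = np.random.default_rng(0)
M = 500
X = np.column_stack([np.ones(M), rng.normal(size=(M, 2))])
true_beta = np.array([[0.5], [-1.0], [2.0]])
y = (rng.uniform(size=(M, 1)) < sigmoid(X @ true_beta)).astype(float)

beta_gd = fit_momentum_gd(X, y)
beta_newton = fit_newton(X, y)
print(np.hstack([beta_gd, beta_newton]))
```

Both routines minimize the same objective, so the two columns should agree closely; Newton's Method needs far fewer iterations because each step uses curvature information from the Hessian.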

Momentum-based Gradient Descent and Newton's Method are implemented in Python as part of the turtles distribution. L-BFGS is implemented using scipy.optimize; it's a quasi-Newton method that approximates the Hessian (instead of fully computing it, like Newton's Method), so it's quite fast.
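For a sense of what the L-BFGS route looks like, here is a minimal sketch that fits a Poisson regression with scipy.optimize.minimize. It is an illustration under assumptions, not how turtles calls scipy internally: the poisson_nll function and the simulated data are made up for the example, and the arrays are 1-D because scipy's optimizer works with a flat parameter vector.

```python
import numpy as np
from scipy.optimize import minimize


def poisson_nll(beta, X, y, exposure):
    """Average Poisson negative log-likelihood (up to a constant) and gradient.

    Hypothetical helper, not part of turtles. Uses a log link with
    log(exposure) as an offset.
    """
    eta = X @ beta + np.log(exposure)
    mu = np.exp(eta)
    nll = np.mean(mu - y * eta)
    grad = X.T @ (mu - y) / X.shape[0]
    return nll, grad


# Simulated data (assumed for illustration): intercept plus one feature.
rng = np.random.default_rng(1)
M = 1000
X = np.column_stack([np.ones(M), rng.normal(size=M)])
exposure = rng.uniform(0.5, 2.0, size=M)
true_beta = np.array([0.3, 0.7])
y = rng.poisson(exposure * np.exp(X @ true_beta))

# jac=True tells scipy that poisson_nll returns (objective, gradient).
# Only the gradient is supplied; L-BFGS builds its own Hessian approximation.
result = minimize(
    poisson_nll,
    x0=np.zeros(X.shape[1]),
    args=(X, y, exposure),
    jac=True,
    method="L-BFGS-B",
)
beta_hat = result.x
print(beta_hat)
```

Note that no Hessian is ever computed or passed in, which is the practical difference from Newton's Method described above.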

Usage

You can pip install the package from PyPI:

pip install turtles-glms

See examples/ in the GitHub repo for example usage of the GLM classes and statistical functions.

Fitting GLMs

You can fit GLMs by instantiating a GLM class and calling its fit() method.

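from turtles.stats.glms import PoissonReg

# X, y, and exposure are numpy arrays; y and exposure have shape (M, 1)
model = PoissonReg(
    method="newton",
    learning_rate=0.01
)
model.fit(
    X=X,
    y=y,
    exposure=exposure
)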

A few important notes about fitting turtles GLMs:

  1. The fit() method parameters X, y, and (for Poisson) exposure must be numpy arrays. Parameters y and exposure must be of shape (M, 1), where M is the number of rows in the data. The package does not support pandas or polars dataframes at this time. See class / instance method docstrings for exact requirements.
  2. Each GLM class has a learning_rate parameter, which applies to the Gradient Descent and Newton's Method optimizers. The learning rate (or step size) is a hyperparameter that controls the magnitude of parameter updates during optimization. If it's too large, the Hessian matrix may become singular, in which case the learning rate should be decreased. Choosing it is typically part of the tuning process.
  3. There are currently no regularization methods implemented in the package. Future versions may include L1, L2, and Elastic Net methods.
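The shape requirement in the first note is easy to trip over, because numpy often produces flat arrays of shape (M,). The snippet below is a small sketch of getting y and exposure into shape (M, 1) before fitting; the variable names and values are placeholders, not data from the package.

```python
import numpy as np

M = 5

# Feature matrix with M rows (placeholder values).
X = np.arange(M * 2, dtype=float).reshape(M, 2)

# Flat arrays of shape (M,), as they often come out of numpy operations.
y_flat = np.array([0, 1, 3, 2, 5])
exposure_flat = np.ones(M)

# Reshape to column vectors of shape (M, 1); -1 lets numpy infer M.
y = y_flat.reshape(-1, 1)
exposure = exposure_flat.reshape(-1, 1)

print(X.shape, y.shape, exposure.shape)
```

If the data starts out in a pandas DataFrame, DataFrame.to_numpy() returns a numpy array that can be reshaped the same way.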

Contributing

To run (and edit) this project locally, clone the repo and create your virtual environment from project root using your global (or local) Python version. This project requires Python 3.10+.

python -m venv .venv

Activate the environment (source .venv/Scripts/activate on Windows, source .venv/bin/activate on Linux or macOS) and install dependencies:

pip install -e .[dev]

Optionally, you can execute scripts/env.sh to create and activate a virtual environment using uv. The uv package manager must be installed for this to work.

Adding GLMs

To add more GLM classes, inherit from the GLM parent class (see PoissonReg and LogReg as examples). The GLM parent class provides a solid framework for implementing new child classes and should be used whenever possible. Unimplemented GLMs include Negative Binomial, Gamma, and Tweedie.

Testing

All tests are contained within tests directories for each module. You can simply execute the pytest command from project root to run all unit tests.

pytest

Notes on Test Coverage:

  • Plotting functions from turtles.plotting are tested, but plotting methods in GLM classes (like MLR) are ignored. Those class methods are essentially just wrappers around matplotlib and turtles.plotting functions.
  • GLM class methods that are meant to be implemented by child classes are ignored.
