Simple linear model tools

statdstools: A Data Science package for Linear Models and Regularized Regression

dstools (published as statdstools) is a Python package developed over the course of a semester in a statistical learning and regression class at the University of Iowa Department of Statistics & Actuarial Science.
It started with basic linear regression and grew into a toolkit that includes:

  • Ordinary Least Squares (OLS) via normal equations and QR decomposition
  • Performance improvements using Cython
  • Ridge regression and cross-validation (cvridge)
  • Adaptive Elastic Net (AENet) with cross-validation over λ₁ and λ₂ (cv_aenet)
  • A vignette-style tutorial and examples on simulated data

This README serves as both:

  • A user guide for the dstools package
  • A narrative summary of how the package was developed

1. Project Overview

The goal of dstools is to provide transparent, educational implementations of core regression tools:

  1. Basic Linear Models

    • Implemented from scratch using:
      • Normal equations
      • QR decomposition
    • Focus on understanding the math and numerical stability.
  2. Performance Optimization

    • Selected parts of the code (e.g., linear model fitting) were reimplemented in Cython to:
      • Speed up repeated computations
      • Illustrate how low-level optimization works in the Python ecosystem.
  3. Ridge Regression

    • Introduced ℓ₂ regularization to handle multicollinearity and overfitting.
    • Implemented both:
      • Closed-form ridge solution
      • Cross-validation (cvridge) to select the penalty parameter λ.
  4. Adaptive Elastic Net (AENet)

    • Combined ideas from LASSO and ridge with adaptive weights.
    • Implemented via coordinate descent.
    • Used ridge regression to compute adaptive weights.
    • Implemented cross-validation over λ₁ and λ₂ (cv_aenet).
  5. Cross-Validation and Model Selection

    • Implemented K-fold cross-validation for:
      • Ridge regression
      • Adaptive elastic net
    • Produced:
      • Mean CV error surfaces
      • Upper and lower bounds on the CV error (cvupper, cvlower)
      • Best tuning parameters via cvmin.
  6. Documentation and Packaging

    • Organized as a proper Python package with:
      • src/dstools/ structure
      • pyproject.toml or setup.py
      • README.md
      • docs/tutorial.md (vignette-style tutorial)
    • Designed to be installable in editable mode with pip install -e . (run from the repository root).

2. Package Structure

A typical dstools layout:

dstools/
├── pyproject.toml          # or setup.py
├── README.md               # this file
├── LICENSE                 # license file (e.g., MIT)
├── docs/
│   └── tutorial.md         # vignette-style tutorial
└── src/
    └── dstools/
        ├── __init__.py     # package initializer
        ├── mylm_qr.py      # basic linear model via QR
        ├── mylm.py         # basic linear model via normal equations (optional)
        ├── mylm_cython.pyx # Cython-accelerated linear model (optional)
        ├── ridge.py        # ridge regression + cvridge
        ├── aenet.py        # adaptive elastic net implementation
        ├── cv_aenet.py     # cross-validation for AENet (if separate)
        ├── utils.py        # helper functions (standardization, etc.)
        └── ...


Installation:

From the repository root, install the package in editable mode:

pip install -e .

Then import it in Python:

import dstools
from dstools import cvridge, ridge, aenet, cv_aenet




Example usage:

1) mylm: Basic Linear Regression (Normal Equations)

import numpy as np
from dstools import mylm  # if exposed in __init__.py

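# Toy design matrix. The second column is not the first column plus a constant,
# so the design stays full rank even if the fit adds an intercept column.
X = np.array([[1, 2],
              [2, 1],
              [3, 4],
              [4, 3]], dtype=float)
y = np.array([2, 3, 4, 5], dtype=float)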

fit = mylm(X, y)
print("Coefficients:", fit["beta"])
print("Fitted values:", fit["fitted"])
print("Residuals:", fit["residuals"])


2) mylm_qr: Linear Regression via QR Decomposition

from dstools import mylm_qr  # if exposed in __init__.py

# X and y are the arrays defined in example 1

fit_qr = mylm_qr(X, y)
print("Coefficients (QR):", fit_qr["beta"])
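The idea behind a QR-based fit can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the package's implementation: the name ols_qr and the added intercept column are hypothetical. Factoring the design matrix as X = QR turns the least-squares problem into the triangular system R beta = Q'y, which avoids forming X'X (whose condition number is the square of that of X).

```python
import numpy as np

def ols_qr(X, y):
    """OLS via a reduced QR factorization of the design matrix."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend an intercept column (assumption for this sketch).
    Xd = np.column_stack([np.ones(X.shape[0]), X])
    # Reduced QR: Q is n x p with orthonormal columns, R is p x p upper triangular.
    Q, R = np.linalg.qr(Xd)
    # Solve R beta = Q'y. np.linalg.solve is a general solver; a dedicated
    # triangular solver would exploit the structure of R.
    beta = np.linalg.solve(R, Q.T @ y)
    fitted = Xd @ beta
    return {"beta": beta, "fitted": fitted, "residuals": y - fitted}

# Simulated data (hypothetical example).
rng = np.random.default_rng(1)
X_sim = rng.normal(size=(40, 2))
y_sim = -0.5 + X_sim @ np.array([1.5, 2.0]) + 0.2 * rng.normal(size=40)
fit_qr_sketch = ols_qr(X_sim, y_sim)
print("Intercept and slopes (QR):", fit_qr_sketch["beta"])
```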

3) Cython: Speeding Up Linear Models

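# mylm_cython.pyx (conceptual sketch; must be compiled with Cython before it can be imported)
cimport cython
import numpy as np
cimport numpy as np

@cython.boundscheck(False)  # skip bounds checks on array indexing
@cython.wraparound(False)   # disable negative indexing for faster access
def mylm_cython(double[:, :] X, double[:] y):
    # X and y are typed memoryviews over float64 arrays
    # implement normal equations or QR with typed loops
    # return coefficients, etc.
    ...

# Usage from Python, once the extension has been built
from dstools import mylm_cython  # if exposed in __init__.py

fit_fast = mylm_cython(X, y)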

4) ridge: Ridge Regression (Closed Form)

from dstools import ridge

ridge_fit = ridge(X, y, lam=1.0)
beta_ridge = ridge_fit.betas.flatten()
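As a rough illustration of the closed-form ridge solution, the sketch below solves (X'X + λI) beta = X'y on centered data and leaves the intercept unpenalized. The function name ridge_closed_form, the centering step, and the scaling of the penalty (λ rather than nλ) are assumptions; the package's ridge function may use different conventions.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Ridge estimate on centered data; the intercept is not penalized."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    p = X.shape[1]
    # Adding lam * I makes the system well conditioned even when the
    # columns of X are strongly correlated.
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    # Recover the intercept on the original (uncentered) scale.
    b0 = y_mean - x_mean @ beta
    return b0, beta

# Simulated data (hypothetical example).
rng = np.random.default_rng(2)
X_sim = rng.normal(size=(60, 4))
y_sim = 2.0 + X_sim @ np.array([1.0, 0.0, -3.0, 0.5]) + 0.3 * rng.normal(size=60)

b0_small, beta_small = ridge_closed_form(X_sim, y_sim, lam=0.0)
b0_large, beta_large = ridge_closed_form(X_sim, y_sim, lam=100.0)
print("Coefficient norm, lam=0:  ", np.linalg.norm(beta_small))
print("Coefficient norm, lam=100:", np.linalg.norm(beta_large))
```

With λ = 0 the estimate reduces to ordinary least squares, and the coefficient norm shrinks as λ grows.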

5) cvridge: Cross-Validation for Ridge

from dstools import cvridge
import numpy as np

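lam_grid = np.logspace(-3, 6, 200)  # 200 candidate penalties from 1e-3 to 1e6
# K=5 folds need at least 5 observations, so X and y here should be a larger
# data set than the 4-row toy example above
cv = cvridge(X, y, lam_grid, K=5)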

best_idx = np.argmin(cv["cv_mse"])
best_lam = cv["lam"][best_idx]

ridge_fit = ridge(X, y, best_lam)
beta_ridge = ridge_fit.betas.flatten()
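To make the cross-validation step concrete, here is a self-contained sketch of K-fold CV for ridge. It is not the cvridge source. The function name kfold_ridge_cv is hypothetical, the output keys borrow the names used in this README (lam, cv_mse, cvupper, cvlower, cvmin), and treating the bounds as the mean CV error plus or minus one standard error across folds is an assumption.

```python
import numpy as np

def kfold_ridge_cv(X, y, lam_grid, K=5, seed=0):
    """K-fold CV error for ridge over a grid of penalties (sketch)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    lam_grid = np.asarray(lam_grid, dtype=float)
    n, p = X.shape
    rng = np.random.default_rng(seed)
    # Shuffle the row indices, then split them into K nearly equal folds.
    folds = np.array_split(rng.permutation(n), K)
    fold_mse = np.empty((K, lam_grid.size))
    for k, test_idx in enumerate(folds):
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        # Center using training statistics only, to avoid leaking test data.
        x_mean, y_mean = X[train].mean(axis=0), y[train].mean()
        Xc, yc = X[train] - x_mean, y[train] - y_mean
        G, g = Xc.T @ Xc, Xc.T @ yc
        for j, lam in enumerate(lam_grid):
            beta = np.linalg.solve(G + lam * np.eye(p), g)
            pred = y_mean + (X[test_idx] - x_mean) @ beta
            fold_mse[k, j] = np.mean((y[test_idx] - pred) ** 2)
    cv_mse = fold_mse.mean(axis=0)
    # Standard error of the mean across folds (assumed definition of the bounds).
    se = fold_mse.std(axis=0, ddof=1) / np.sqrt(K)
    return {"lam": lam_grid, "cv_mse": cv_mse,
            "cvupper": cv_mse + se, "cvlower": cv_mse - se,
            "cvmin": lam_grid[int(np.argmin(cv_mse))]}

# Simulated data (hypothetical example).
rng = np.random.default_rng(42)
X_sim = rng.normal(size=(60, 5))
y_sim = X_sim @ np.array([2.0, 0.0, -1.0, 0.0, 0.5]) + rng.normal(size=60)
cv_sketch = kfold_ridge_cv(X_sim, y_sim, np.logspace(-3, 3, 50), K=5)
print("Penalty with the smallest mean CV error:", cv_sketch["cvmin"])
```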


6) Adaptive Elastic Net (AENet)


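import numpy as np
from dstools import aenet, ridge

lam1 = 0.1  # ℓ₁ (LASSO-type) penalty
lam2 = 0.1  # ℓ₂ (ridge-type) penalty

# Adaptive weights from an initial ridge fit. The form 1 / (|beta| + eps) is an
# assumption made for this example; check the package for its exact definition.
beta_init = ridge(X, y, lam=1.0).betas.flatten()
weights = 1.0 / (np.abs(beta_init) + 1e-8)

fit_aenet = aenet(X, y, lam1=lam1, lam2=lam2, weights=weights)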
b0 = fit_aenet["b0"]
beta_hat = fit_aenet["beta"]
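The coordinate descent idea behind an adaptive elastic net can be sketched as follows. This is an illustration, not the aenet source. It assumes the objective (1/(2n))·||y − b0 − Xβ||² + λ₁·Σ wⱼ|βⱼ| + (λ₂/2)·||β||², which may differ in scaling from the package's definition, and the names soft_threshold and aenet_cd are hypothetical. Each coordinate update is a soft-thresholding step followed by ridge-style shrinkage.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t, returning 0 when |z| <= t."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def aenet_cd(X, y, lam1, lam2, weights, max_iter=1000, tol=1e-8):
    """Cyclic coordinate descent for a weighted elastic net (sketch)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    col_ss = (Xc ** 2).sum(axis=0) / n   # (1/n) * x_j'x_j for each column
    beta = np.zeros(p)
    resid = yc.copy()                    # residual for beta = 0
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            old = beta[j]
            # (1/n) * x_j' (partial residual with x_j's contribution added back)
            rho = Xc[:, j] @ resid / n + col_ss[j] * old
            new = soft_threshold(rho, lam1 * weights[j]) / (col_ss[j] + lam2)
            if new != old:
                resid -= Xc[:, j] * (new - old)   # keep the residual in sync
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:   # stop when a full sweep barely moves beta
            break
    return {"b0": y_mean - x_mean @ beta, "beta": beta}

# Simulated sparse signal (hypothetical example).
rng = np.random.default_rng(1)
X_sim = rng.normal(size=(80, 6))
y_sim = 0.5 + X_sim @ np.array([3.0, 0.0, 0.0, -2.0, 0.0, 0.0]) + 0.5 * rng.normal(size=80)

# Adaptive weights from an initial ridge-type estimate (assumed form).
Xc_sim, yc_sim = X_sim - X_sim.mean(axis=0), y_sim - y_sim.mean()
beta_init = np.linalg.solve(Xc_sim.T @ Xc_sim + np.eye(6), Xc_sim.T @ yc_sim)
w = 1.0 / (np.abs(beta_init) + 1e-8)

fit_cd = aenet_cd(X_sim, y_sim, lam1=0.1, lam2=0.1, weights=w)
print("Selected predictors:", np.flatnonzero(fit_cd["beta"]))
```

Predictors with small initial estimates receive large weights and are therefore penalized more heavily, which is what pushes them to exactly zero.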

7) Cross-Validation for AENet: cv_aenet

from dstools import cv_aenet
import numpy as np

lambda1_seq = np.logspace(3, -1, 30)
lambda2_seq = np.array([0.0, 0.1, 1.0])

cvfit = cv_aenet(X, y, lambda1_seq, lambda2_seq, k=5, random_state=123)

best_lam1 = cvfit["best_lambda1"]
best_lam2 = cvfit["best_lambda2"]

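# Grid positions of the selected penalties:
# i indexes lambda1_seq, j indexes lambda2_seq
i = cvfit["best_lambda1_index"]
j = cvfit["best_lambda2_index"]

# Coefficients and intercept at the selected lambda1
beta_cvmin = cvfit["full_fit"][i]["beta"]
b0_cvmin = cvfit["full_fit"][i]["b0"]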

selected = np.where(np.abs(beta_cvmin) > 1e-8)[0]
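The relationship between a mean CV error surface and the best pair of indices can be illustrated with plain NumPy. The surface below is a toy stand-in computed from a formula, not real cross-validation output, and the variable names cv_mean, i_best, and j_best are hypothetical. It only shows how a minimum over a (λ₁, λ₂) grid maps back to grid positions.

```python
import numpy as np

lambda1_seq = np.logspace(3, -1, 30)
lambda2_seq = np.array([0.0, 0.1, 1.0])

# Toy stand-in for a mean CV error surface (NOT real cross-validation output):
# rows follow lambda1_seq, columns follow lambda2_seq.
cv_mean = ((np.log10(lambda1_seq)[:, None] - 0.5) ** 2
           + (lambda2_seq[None, :] - 0.1) ** 2 + 1.0)

# argmin on the flattened surface, mapped back to (row, column) grid indices.
i_best, j_best = np.unravel_index(np.argmin(cv_mean), cv_mean.shape)
best_pair = (lambda1_seq[i_best], lambda2_seq[j_best])
print("Best (lambda1, lambda2):", best_pair)
```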
