Simple linear model tools
statdstools: A Data Science package for Linear Models and Regularized Regression
dstools (distributed as statdstools) is a Python package developed progressively over the course of a semester in a statistical learning / regression class at the University of Iowa, Department of Statistics & Actuarial Science.
It started with basic linear regression and grew into a toolkit that includes:
- Ordinary Least Squares (OLS) via normal equations and QR decomposition
- Performance improvements using Cython
- Ridge regression and cross-validation (cvridge)
- Adaptive Elastic Net (AENet) with cross-validation over λ₁ and λ₂ (cv_aenet)
- A vignette-style tutorial and examples on simulated data
This README serves as both:
- A user guide for the dstools package
- A narrative summary
1. Project Overview
The goal of dstools is to provide transparent, educational implementations of core regression tools:
- Basic Linear Models
  - Implemented from scratch using:
    - Normal equations
    - QR decomposition
  - Focus on understanding the math and numerical stability.
- Performance Optimization
  - Selected parts of the code (e.g., linear model fitting) were reimplemented in Cython to:
    - Speed up repeated computations
    - Illustrate how low-level optimization works in the Python ecosystem.
- Ridge Regression
  - Introduced ℓ₂ regularization to handle multicollinearity and overfitting.
  - Implemented both:
    - Closed-form ridge solution
    - Cross-validation (cvridge) to select the penalty parameter λ.
- Adaptive Elastic Net (AENet)
  - Combined ideas from LASSO and ridge with adaptive weights.
  - Implemented via coordinate descent.
  - Used ridge regression to compute adaptive weights.
  - Implemented cross-validation over λ₁ and λ₂ (cv_aenet).
- Cross-Validation and Model Selection
  - Implemented K-fold cross-validation for:
    - Ridge regression
    - Adaptive elastic net
  - Produced:
    - Mean CV error surfaces
    - Upper and lower bounds (cvupper, cvlower)
    - Best tuning parameters via cvmin.
- Documentation and Packaging
  - Organized as a proper Python package with:
    - src/dstools/ structure
    - pyproject.toml or setup.py
    - README.md
    - docs/tutorial.md (vignette-style tutorial)
  - Designed to be installable via pip install -e . (editable mode).
2. Package Structure
A typical dstools layout:
dstools/
├── pyproject.toml             # or setup.py
├── README.md                  # this file
├── LICENSE                    # license file (e.g., MIT)
├── docs/
│   └── tutorial.md            # vignette-style tutorial
└── src/
    └── dstools/
        ├── __init__.py        # package initializer
        ├── mylm_qr.py         # basic linear model via QR
        ├── mylm.py            # basic linear model via normal equations (optional)
        ├── mylm_cython.pyx    # Cython-accelerated linear model (optional)
        ├── ridge.py           # ridge regression + cvridge
        ├── aenet.py           # adaptive elastic net implementation
        ├── cv_aenet.py        # cross-validation for AENet (if separate)
        ├── utils.py           # helper functions (standardization, etc.)
        └── ...
Installation (from the repository root):
pip install -e .
Then import the package in Python:
import dstools
from dstools import cvridge, ridge, aenet, cv_aenet
Example usage:
1) mylm: Basic Linear Regression (Normal Equations)
import numpy as np
from dstools import mylm # if exposed in __init__.py
# Toy data. Note that column 2 equals column 1 plus one, so this X is
# collinear with an intercept column if mylm adds one.
X = np.array([[1, 2],
              [2, 3],
              [3, 4],
              [4, 5]], dtype=float)
y = np.array([2, 3, 4, 5], dtype=float)
fit = mylm(X, y)
print("Coefficients:", fit["beta"])
print("Fitted values:", fit["fitted"])
print("Residuals:", fit["residuals"])
2) mylm_qr: Linear Regression via QR Decomposition
from dstools import mylm_qr
fit_qr = mylm_qr(X, y)
print("Coefficients (QR):", fit_qr["beta"])
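To make the difference between the two solvers concrete, here is a minimal NumPy sketch of both approaches. It is independent of dstools: the function names ols_normal_equations and ols_qr, the simulated data, and the explicit intercept column are all assumptions made for illustration, not the package's actual code.

```python
import numpy as np


def ols_normal_equations(X, y):
    """Solve (X'X) beta = X'y directly. Simple, but squares the condition number of X."""
    XtX = X.T @ X
    Xty = X.T @ y
    return np.linalg.solve(XtX, Xty)


def ols_qr(X, y):
    """Factor X = QR (reduced form) and solve the p x p system R beta = Q'y."""
    Q, R = np.linalg.qr(X)  # default mode="reduced": Q is n x p, R is p x p upper triangular
    return np.linalg.solve(R, Q.T @ y)


# Simulated data with an explicit intercept column (hypothetical example).
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + 0.1 * rng.normal(size=n)

beta_ne = ols_normal_equations(X, y)
beta_qr = ols_qr(X, y)
print("Normal equations:", beta_ne)
print("QR:              ", beta_qr)
```

On a well-conditioned design like this one the two estimates agree to rounding error. The QR route tends to matter when columns of X are nearly collinear, because it avoids forming X'X.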
3) Cython: Speeding Up Linear Models
# mylm_cython.pyx (conceptual)
cimport cython
import numpy as np
cimport numpy as np
@cython.boundscheck(False)
@cython.wraparound(False)
def mylm_cython(double[:, :] X, double[:] y):
    # implement normal equations or QR with typed loops
    # return coefficients, etc.
    ...
# Usage from Python, once the extension has been compiled:
from dstools import mylm_cython
fit_fast = mylm_cython(X, y)
4) ridge: Ridge Regression
from dstools import ridge
ridge_fit = ridge(X, y, lam=1.0)
beta_ridge = ridge_fit.betas.flatten()
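For reference, the closed-form ridge solution can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the dstools implementation: the name ridge_closed_form is hypothetical, the intercept is left unpenalized by centering, and the penalty is applied as lam times the identity with no scaling by the sample size. The package may standardize or scale differently.

```python
import numpy as np


def ridge_closed_form(X, y, lam):
    """Ridge estimate with an unpenalized intercept (assumed convention).

    Centers X and y, solves (Xc'Xc + lam * I) beta = Xc'yc, then recovers b0.
    """
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    yc = y - y_mean
    p = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    b0 = y_mean - x_mean @ beta
    return b0, beta


# Simulated data (hypothetical example).
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
y = 0.5 + X @ np.array([1.0, 0.0, -2.0]) + 0.1 * rng.normal(size=40)

b0_small, beta_small = ridge_closed_form(X, y, lam=1e-8)  # essentially OLS
b0_big, beta_big = ridge_closed_form(X, y, lam=1e4)       # heavily shrunk
print("lam=1e-8:", beta_small)
print("lam=1e4: ", beta_big)
```

As lam grows, the coefficient vector shrinks toward zero while the intercept moves toward the mean of y.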
5) cvridge: Cross-Validation for Ridge
from dstools import cvridge
import numpy as np
lam_grid = np.logspace(-3, 6, 200)
cv = cvridge(X, y, lam_grid, K=5)
best_idx = np.argmin(cv["cv_mse"])
best_lam = cv["lam"][best_idx]
ridge_fit = ridge(X, y, best_lam)
beta_ridge = ridge_fit.betas.flatten()
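The K-fold procedure behind a function like cvridge can be sketched as follows. This is a self-contained illustration, not the package's code: the function name kfold_cv_ridge and the seed argument are hypothetical, the result keys "lam" and "cv_mse" simply mirror the ones used in the example above, and taking the bounds as the mean plus or minus one standard error across folds is an assumption.

```python
import numpy as np


def kfold_cv_ridge(X, y, lam_grid, K=5, seed=0):
    """K-fold CV error for ridge over a grid of penalties (illustrative sketch)."""
    n, p = X.shape
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), K)  # K disjoint test index sets
    fold_mse = np.empty((K, len(lam_grid)))
    for k, test_idx in enumerate(folds):
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        X_tr, y_tr = X[train_mask], y[train_mask]
        # Center using training-fold statistics only, to avoid leakage.
        x_mean, y_mean = X_tr.mean(axis=0), y_tr.mean()
        Xc, yc = X_tr - x_mean, y_tr - y_mean
        for j, lam in enumerate(lam_grid):
            beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
            pred = y_mean + (X[test_idx] - x_mean) @ beta
            fold_mse[k, j] = np.mean((y[test_idx] - pred) ** 2)
    cv_mse = fold_mse.mean(axis=0)
    cv_se = fold_mse.std(axis=0, ddof=1) / np.sqrt(K)
    return {"lam": np.asarray(lam_grid), "cv_mse": cv_mse,
            "cvupper": cv_mse + cv_se, "cvlower": cv_mse - cv_se}


# Simulated data (hypothetical example).
rng = np.random.default_rng(3)
X = rng.normal(size=(60, 4))
y = X @ np.array([2.0, -1.0, 0.0, 0.5]) + 0.5 * rng.normal(size=60)

lam_grid = np.logspace(-3, 3, 20)
cv = kfold_cv_ridge(X, y, lam_grid, K=5)
best_lam = cv["lam"][np.argmin(cv["cv_mse"])]
print("Best lambda:", best_lam)
```

Centering inside each training fold keeps the held-out observations from influencing the fit they are used to evaluate.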
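6) Adaptive Elastic Net (AENet)
import numpy as np
from dstools import aenet
lam1 = 0.1
lam2 = 0.1
# Adaptive weights from the ridge fit in example 5. The inverse-magnitude form
# below is an assumption; check aenet for the weights it actually expects.
weights = 1.0 / (np.abs(beta_ridge) + 1e-8)
fit_aenet = aenet(X, y, lam1=lam1, lam2=lam2, weights=weights)
b0 = fit_aenet["b0"]
beta_hat = fit_aenet["beta"]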
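A coordinate descent solver for a weighted elastic net penalty might look like the sketch below. Everything here is an assumption made for illustration: the function name aenet_cd, the objective (1/(2n))·||y − b0 − Xβ||² + λ₁·Σ wⱼ|βⱼ| + (λ₂/2)·||β||², the inverse-magnitude ridge weights, and the absence of any rescaling step. The returned keys "b0" and "beta" only mirror the example above; the package's own objective and scaling may differ.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator: shrink z toward zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def aenet_cd(X, y, lam1, lam2, weights, n_iter=500, tol=1e-10):
    """Cyclic coordinate descent for the assumed weighted elastic net objective."""
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean      # centering removes the intercept
    col_ss = (Xc ** 2).sum(axis=0) / n   # (1/n) * x_j'x_j for each column
    beta = np.zeros(p)
    resid = yc.copy()                    # residual at beta = 0
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # (1/n) * x_j' (partial residual that excludes coordinate j)
            z = Xc[:, j] @ resid / n + col_ss[j] * beta[j]
            new_bj = soft_threshold(z, lam1 * weights[j]) / (col_ss[j] + lam2)
            delta = new_bj - beta[j]
            if delta != 0.0:
                resid -= Xc[:, j] * delta  # keep the residual in sync
                beta[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    b0 = y_mean - x_mean @ beta
    return {"b0": b0, "beta": beta}


# Simulated sparse problem (hypothetical example).
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))
y = 1.0 + X @ np.array([3.0, 0.0, 0.0, -2.0, 0.0]) + 0.1 * rng.normal(size=100)

# Adaptive weights from an initial ridge fit (assumed form: 1 / |beta_ridge|).
Xc, yc = X - X.mean(axis=0), y - y.mean()
beta_ridge = np.linalg.solve(Xc.T @ Xc + 1.0 * np.eye(5), Xc.T @ yc)
weights = 1.0 / (np.abs(beta_ridge) + 1e-8)

fit = aenet_cd(X, y, lam1=0.1, lam2=0.1, weights=weights)
print("Intercept:", fit["b0"])
print("Coefficients:", fit["beta"])
```

Coefficients with small ridge estimates receive large weights and are therefore penalized more heavily, which is what pushes them to exactly zero.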
7) Cross-Validation for AENet: cv_aenet
from dstools import cv_aenet
import numpy as np
lambda1_seq = np.logspace(3, -1, 30)
lambda2_seq = np.array([0.0, 0.1, 1.0])
cvfit = cv_aenet(X, y, lambda1_seq, lambda2_seq, k=5, random_state=123)
best_lam1 = cvfit["best_lambda1"]
best_lam2 = cvfit["best_lambda2"]
i = cvfit["best_lambda1_index"]
j = cvfit["best_lambda2_index"]
beta_cvmin = cvfit["full_fit"][i]["beta"]
b0_cvmin = cvfit["full_fit"][i]["b0"]
selected = np.where(np.abs(beta_cvmin) > 1e-8)[0]
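Selecting the best pair of tuning parameters from a two-dimensional CV error surface comes down to locating the smallest entry of a matrix. The sketch below shows one way to do that with NumPy. The function name select_cv_minimum and the key "min_cv_mse" are hypothetical, the other keys mirror those in the example above, and the surface is a toy bowl built for illustration rather than real cross-validation output.

```python
import numpy as np


def select_cv_minimum(cv_mse, lambda1_seq, lambda2_seq):
    """Locate the smallest entry of a (len(lambda1_seq), len(lambda2_seq)) error surface."""
    cv_mse = np.asarray(cv_mse)
    # argmin works on the flattened array; unravel_index maps it back to (row, col).
    i, j = np.unravel_index(np.argmin(cv_mse), cv_mse.shape)
    return {
        "best_lambda1_index": int(i),
        "best_lambda2_index": int(j),
        "best_lambda1": float(lambda1_seq[i]),
        "best_lambda2": float(lambda2_seq[j]),
        "min_cv_mse": float(cv_mse[i, j]),
    }


lambda1_seq = np.logspace(3, -1, 30)
lambda2_seq = np.array([0.0, 0.1, 1.0])

# Toy surface (not real CV output): a bowl centered at lambda1 = 1, lambda2 = 0.1.
L1, L2 = np.meshgrid(np.log10(lambda1_seq), lambda2_seq, indexing="ij")
toy_surface = 1.0 + L1 ** 2 + (L2 - 0.1) ** 2

best = select_cv_minimum(toy_surface, lambda1_seq, lambda2_seq)
print(best)
```

Rows index λ₁ and columns index λ₂ here, so the returned indices can be used directly to look up the matching fit.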