Skip to main content

Lightweight low-rank+diag GLS/SUR via ALS with EM baseline

Project description

A Lightweight ALS Solver for Iterative GLS

When a GLS problem involves hundreds of equations, the $K × K$ covariance matrix becomes the computational bottleneck. A simple statistical remedy is to assume that most of the cross‑equation dependence can be captured by a handful of latent factors plus equation‑specific noise. This “low‑rank + diagonal” assumption slashes the number of unknowns from roughly $K^²$ to about $K×k$ parameters, where k (the latent factor rank) is much smaller than $K$. The model alone, however, does not guarantee speed: we still have to fit the parameters.

Installation

Install the library from PyPI:

pip install alsgls

For local development, clone the repo and use an editable install:

pip install -e .

Usage

from alsgls import als_gls, simulate_sur, nll_per_row, XB_from_Blist

Xs_tr, Y_tr, Xs_te, Y_te = simulate_sur(N_tr=240, N_te=120, K=60, p=3, k=4)
B, F, D, mem, _ = als_gls(Xs_tr, Y_tr, k=4)
Yhat_te = XB_from_Blist(Xs_te, B)
nll = nll_per_row(Y_te - Yhat_te, F, D)

See examples/compare_als_vs_em.py for a complete ALS versus EM comparison.

Documentation and notebooks

Background material and reproducible experiments are available in the notebooks under als_sim/, such as als_sim/als_comparison.ipynb and als_sim/als_sur.ipynb.

### Solving low‑rank GLS: EM versus ALS

The classic EM algorithm alternates between updating the regression coefficients $\beta$ and updating the factor loadings $F$ and the diagonal noise $D$. Even though $\hat{\Sigma}$ is low‑rank, EM’s M‑step recreates the full $K × K$ inverse, wiping out the memory win.

An alternative is Alternating‑Least‑Squares (ALS). The Woodbury identity reduces the expensive inverse to a tiny k × k system, and the β‑update can be written without explicitly forming the dense matrix at all. In practice, ALS converges in 5–6 sweeps and never allocates more than $O(K k)$ memory, while EM allocates $O(K^²)$.

Rule of thumb: if your GLS routine keeps looping between $\beta$ and a fresh $\hat{\Sigma}$, replacing the $\hat{\Sigma}$‑update by a factor‑ALS step yields the same statistical fit with an order‑of‑magnitude smaller memory footprint.

Beyond SUR: where the idea travels

Random‑effects models, feasible GLS with estimated heteroskedastic weights, optimal‑weight GMM, and spatial autoregressive GLS all iterate β ↔ Σ̂. Each can adopt the same ALS trick: treat the weight matrix as low‑rank + diagonal, invert only the k × k core, and avoid the dense K × K algebra. Memory savings in published examples range from 5× to 20×, depending on k.

A concrete case‑study: Seemingly‑Unrelated Regressions

To show the magnitude, we ran a Monte‑Carlo experiment with N = 300 observations, three regressors, rank‑3 factors, and K set to 50, 80, 120. EM was given 45 iterations; ALS, six sweeps. The largest array EM holds is the dense Σ⁻¹, whereas ALS’s largest is the skinny factor matrix F. The table summarises six replications:

K β‑RMSE EM β‑RMSE ALS Peak MB EM Peak MB ALS Memory ratio
50  0.021   0.021   0.020   0.002   10× 
80  0.020   0.020   0.051   0.003   17× 
120  0.020   0.020   0.115   0.004   29× 

Statistically, the two estimators are indistinguishable (paired‑test p ≥ 0.14). Computationally, ALS needs only a few megabytes whereas EM needs tens to hundreds.

5  Choosing a solver in practice

For small systems ($K < 50$), dense GLS or even separate OLS is fine. Between 50 and 300 equations, a low‑rank factor‑ALS solver gives the same estimates at roughly one‑tenth the memory and runs happily on a GPU. Once K enters the hundreds, any dense inverse becomes prohibitive; structured approaches such as factor‑ALS or sparse/banded $\hat{\Sigma}$ are mandatory.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

alsgls-0.1.0.tar.gz (13.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

alsgls-0.1.0-py3-none-any.whl (10.7 kB view details)

Uploaded Python 3

File details

Details for the file alsgls-0.1.0.tar.gz.

File metadata

  • Download URL: alsgls-0.1.0.tar.gz
  • Upload date:
  • Size: 13.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.11

File hashes

Hashes for alsgls-0.1.0.tar.gz
Algorithm Hash digest
SHA256 5480e2b2c32a4ad46e5d4c1a62f78a28184f0a031634d589b0481cf33700f01a
MD5 eef9fc44865e94661292619847abc33f
BLAKE2b-256 e8b8f569eca62873a507c1e758299e48ea5a99aaa3315c7a0c94990f59bc58cd

See more details on using hashes here.

File details

Details for the file alsgls-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: alsgls-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 10.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.11

File hashes

Hashes for alsgls-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3fb3b399b7f4585833476f0b46b6e4989ac6b6805ea4172e61cc7149b8caa37e
MD5 de909ebbc997c498fa3c94d093faf8e1
BLAKE2b-256 373258cea9dfc812617131aa15f2fc53f0bfa03812d86a1373931415c94d5c6d

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page