Lightweight low-rank+diag GLS/SUR via ALS with EM baseline
Project description
A Lightweight ALS Solver for Iterative GLS
When a GLS problem involves hundreds of equations, the $K × K$ covariance matrix becomes the computational bottleneck. A simple statistical remedy is to assume that most of the cross‑equation dependence can be captured by a handful of latent factors plus equation‑specific noise. This “low‑rank + diagonal” assumption slashes the number of unknowns from roughly $K^²$ to about $K×k$ parameters, where k (the latent factor rank) is much smaller than $K$. The model alone, however, does not guarantee speed: we still have to fit the parameters.
Installation
Install the library from PyPI:
pip install alsgls
For local development, clone the repo and use an editable install:
pip install -e .
Usage
from alsgls import als_gls, simulate_sur, nll_per_row, XB_from_Blist
Xs_tr, Y_tr, Xs_te, Y_te = simulate_sur(N_tr=240, N_te=120, K=60, p=3, k=4)
B, F, D, mem, _ = als_gls(Xs_tr, Y_tr, k=4)
Yhat_te = XB_from_Blist(Xs_te, B)
nll = nll_per_row(Y_te - Yhat_te, F, D)
See examples/compare_als_vs_em.py for a complete ALS versus EM comparison.
Documentation and notebooks
Background material and reproducible experiments are available in the notebooks under als_sim/, such as als_sim/als_comparison.ipynb and als_sim/als_sur.ipynb.
### Solving low‑rank GLS: EM versus ALS
The classic EM algorithm alternates between updating the regression coefficients $\beta$ and updating the factor loadings $F$ and the diagonal noise $D$. Even though $\hat{\Sigma}$ is low‑rank, EM’s M‑step recreates the full $K × K$ inverse, wiping out the memory win.
An alternative is Alternating‑Least‑Squares (ALS). The Woodbury identity reduces the expensive inverse to a tiny k × k system, and the β‑update can be written without explicitly forming the dense matrix at all. In practice, ALS converges in 5–6 sweeps and never allocates more than $O(K k)$ memory, while EM allocates $O(K^²)$.
Rule of thumb: if your GLS routine keeps looping between $\beta$ and a fresh $\hat{\Sigma}$, replacing the $\hat{\Sigma}$‑update by a factor‑ALS step yields the same statistical fit with an order‑of‑magnitude smaller memory footprint.
Beyond SUR: where the idea travels
Random‑effects models, feasible GLS with estimated heteroskedastic weights, optimal‑weight GMM, and spatial autoregressive GLS all iterate β ↔ Σ̂. Each can adopt the same ALS trick: treat the weight matrix as low‑rank + diagonal, invert only the k × k core, and avoid the dense K × K algebra. Memory savings in published examples range from 5× to 20×, depending on k.
A concrete case‑study: Seemingly‑Unrelated Regressions
To show the magnitude, we ran a Monte‑Carlo experiment with N = 300 observations, three regressors, rank‑3 factors, and K set to 50, 80, 120. EM was given 45 iterations; ALS, six sweeps. The largest array EM holds is the dense Σ⁻¹, whereas ALS’s largest is the skinny factor matrix F. The table summarises six replications:
| K | β‑RMSE EM | β‑RMSE ALS | Peak MB EM | Peak MB ALS | Memory ratio |
|---|---|---|---|---|---|
| 50 | 0.021 | 0.021 | 0.020 | 0.002 | 10× |
| 80 | 0.020 | 0.020 | 0.051 | 0.003 | 17× |
| 120 | 0.020 | 0.020 | 0.115 | 0.004 | 29× |
Statistically, the two estimators are indistinguishable (paired‑test p ≥ 0.14). Computationally, ALS needs only a few megabytes whereas EM needs tens to hundreds.
5 Choosing a solver in practice
For small systems ($K < 50$), dense GLS or even separate OLS is fine. Between 50 and 300 equations, a low‑rank factor‑ALS solver gives the same estimates at roughly one‑tenth the memory and runs happily on a GPU. Once K enters the hundreds, any dense inverse becomes prohibitive; structured approaches such as factor‑ALS or sparse/banded $\hat{\Sigma}$ are mandatory.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file alsgls-0.1.0.tar.gz.
File metadata
- Download URL: alsgls-0.1.0.tar.gz
- Upload date:
- Size: 13.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.11
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
5480e2b2c32a4ad46e5d4c1a62f78a28184f0a031634d589b0481cf33700f01a
|
|
| MD5 |
eef9fc44865e94661292619847abc33f
|
|
| BLAKE2b-256 |
e8b8f569eca62873a507c1e758299e48ea5a99aaa3315c7a0c94990f59bc58cd
|
File details
Details for the file alsgls-0.1.0-py3-none-any.whl.
File metadata
- Download URL: alsgls-0.1.0-py3-none-any.whl
- Upload date:
- Size: 10.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.11
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
3fb3b399b7f4585833476f0b46b6e4989ac6b6805ea4172e61cc7149b8caa37e
|
|
| MD5 |
de909ebbc997c498fa3c94d093faf8e1
|
|
| BLAKE2b-256 |
373258cea9dfc812617131aa15f2fc53f0bfa03812d86a1373931415c94d5c6d
|