GLS estimator with learned correlation and variance structures (Python equivalent of R's nlme::gls)
Project description
python-gls
GLS with learned correlation and variance structures for Python.
The missing Python equivalent of R's nlme::gls(). Unlike statsmodels.GLS (which requires you to supply a pre-computed covariance matrix), python-gls estimates the correlation and variance structure from your data via maximum likelihood (ML) or restricted maximum likelihood (REML) — exactly like R's nlme::gls().
Why this library?
If you work with panel data, repeated measures, longitudinal studies, or clustered observations, your errors are probably correlated and possibly heteroscedastic. Ignoring this gives you wrong standard errors and misleading p-values.
R has had nlme::gls() for 25+ years. Python hasn't had an equivalent. Until now.
| Feature | statsmodels.GLS |
python-gls |
R nlme::gls |
|---|---|---|---|
| Estimate correlation from data | No (manual Omega) | Yes | Yes |
| AR(1), compound symmetry, etc. | No | Yes (11 structures) | Yes |
| Heteroscedastic variance models | No | Yes (6 functions) | Yes |
| ML / REML estimation | No | Yes | Yes |
| R-style formulas | No | Yes | Yes |
Installation
pip install python-gls
Or from source:
git clone https://github.com/brunoabrahao/python-gls.git
cd python-gls
pip install -e ".[dev]"
Quick Start
from python_gls import GLS
from python_gls.correlation import CorAR1
from python_gls.variance import VarIdent
result = GLS.from_formula(
"response ~ treatment + time",
data=df,
correlation=CorAR1(), # Learn AR(1) correlation
variance=VarIdent("group"), # Learn group-specific variances
groups="subject", # Define independent clusters
).fit()
print(result.summary())
print(f"Estimated AR(1) phi: {result.correlation_params[0]:.3f}")
Output:
==============================================================================
Generalized Least Squares Results
==============================================================================
Method: REML Log-Likelihood: -615.0544
No. Observations:500 AIC: 1240.1088
Df Model: 2 BIC: 1261.1818
Df Residuals: 497 Sigma^2: 0.984576
Converged: Yes Iterations: 6
------------------------------------------------------------------------------
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 1.0368 0.1069 9.7013 0.0000 0.8268 1.2468
treatment 0.6465 0.1428 4.5272 0.0000 0.3659 0.9271
x 1.9734 0.0323 61.0960 0.0000 1.9099 2.0368
==============================================================================
Correlation Structure: CorAR1
Parameters: [0.61312872]
Correlation Structures
All correlation structures are in python_gls.correlation.
Temporal / Serial Correlation
| Class | R Equivalent | Parameters | Description |
|---|---|---|---|
CorAR1(phi=None) |
corAR1() |
1 | First-order autoregressive. R[i,j] = phi^|i-j| |
CorARMA(p, q) |
corARMA(p, q) |
p + q | ARMA(p,q) autocorrelation |
CorCAR1(phi=None) |
corCAR1() |
1 | Continuous-time AR(1) for irregular spacing |
CorCompSymm(rho=None) |
corCompSymm() |
1 | Exchangeable / compound symmetry. All pairs equal rho |
CorSymm(dim=None) |
corSymm() |
d(d-1)/2 | General unstructured. Free correlation for every pair |
Spatial Correlation
| Class | R Equivalent | Parameters | Description |
|---|---|---|---|
CorExp(range_param, nugget=False) |
corExp() |
1-2 | Exponential: exp(-d/range) |
CorGaus(range_param, nugget=False) |
corGaus() |
1-2 | Gaussian: exp(-(d/range)^2) |
CorLin(range_param, nugget=False) |
corLin() |
1-2 | Linear: max(1 - d/range, 0) |
CorRatio(range_param, nugget=False) |
corRatio() |
1-2 | Rational quadratic: 1/(1 + (d/range)^2) |
CorSpher(range_param, nugget=False) |
corSpher() |
1-2 | Spherical: cubic polynomial, zero beyond range |
All spatial structures accept an optional nugget=True parameter for a discontinuity at distance zero.
Usage
from python_gls.correlation import CorAR1, CorSymm, CorExp
# Serial: AR(1) with optional initial value
cor = CorAR1(phi=0.5)
# Unstructured: all pairs free
cor = CorSymm() # dimension inferred from data
# Spatial: set coordinates per group
cor = CorExp(range_param=10.0, nugget=True)
cor.set_coordinates(group_id=0, coords=np.array([[0,0], [1,0], [0,1]]))
Variance Functions
All variance functions are in python_gls.variance.
| Class | R Equivalent | Parameters | Description |
|---|---|---|---|
VarIdent(group_var) |
varIdent(form=~1|group) |
G-1 | Different variance per group level |
VarPower(covariate) |
varPower(form=~cov) |
1 | sd = |v|^delta |
VarExp(covariate) |
varExp(form=~cov) |
1 | sd = exp(delta * v) |
VarConstPower(covariate) |
varConstPower(form=~cov) |
2 | sd = (c + |v|^delta) |
VarFixed(weights_var) |
varFixed(~cov) |
0 | Pre-specified weights (not estimated) |
VarComb(*varfuncs) |
varComb(...) |
sum | Product of multiple variance functions |
Usage
from python_gls.variance import VarIdent, VarPower, VarComb
# Different variance for treatment vs. control
var = VarIdent("treatment_group")
# Variance increases with fitted values
var = VarPower("fitted_values")
# Combine: group-specific + covariate-dependent
var = VarComb(VarIdent("group"), VarPower("x"))
API Reference
GLS Class
Construction
# From formula (recommended)
model = GLS.from_formula(
formula, # R-style formula: "y ~ x1 + x2"
data, # pandas DataFrame
correlation=None, # CorStruct instance
variance=None, # VarFunc instance
groups=None, # str: column name for groups
method="REML", # "ML" or "REML"
)
# From arrays
model = GLS(
endog=y, # response vector
exog=X, # design matrix (include intercept column)
correlation=None,
variance=None,
groups=None, # array of group labels
method="REML",
)
Fitting
result = model.fit(
maxiter=200, # max optimization iterations
tol=1e-8, # convergence tolerance
verbose=False, # print optimization progress
n_jobs=1, # parallel threads (-1 for all cores)
)
GLSResults Class
| Property / Method | Type | Description |
|---|---|---|
params |
Series | Estimated coefficients |
bse |
Series | Standard errors |
tvalues |
Series | t-statistics |
pvalues |
Series | Two-sided p-values |
conf_int(alpha=0.05) |
DataFrame | Confidence intervals |
sigma2 |
float | Estimated residual variance |
loglik |
float | Log-likelihood at convergence |
aic |
float | Akaike Information Criterion |
bic |
float | Bayesian Information Criterion |
resid |
array | Residuals (y - X*beta) |
fittedvalues |
array | Fitted values (X*beta) |
correlation_params |
array | Estimated correlation parameters |
variance_params |
array | Estimated variance parameters |
cov_params_func() |
DataFrame | Covariance matrix of beta |
summary() |
str | Formatted results table |
converged |
bool | Optimization convergence status |
n_iter |
int | Number of iterations |
method |
str | "ML" or "REML" |
How It Works
The Statistical Model
GLS models the response as:
y = X*beta + epsilon, where Var(epsilon) = sigma^2 * Omega
The covariance matrix Omega is block-diagonal by group:
Omega_g = A_g^{1/2} R_g A_g^{1/2}
where:
- R_g is the correlation matrix (from the correlation structure)
- A_g is a diagonal matrix of variance weights (from the variance function)
Estimation
- OLS initial fit to get starting residuals
- Initialize correlation and variance parameters from residuals
- Optimize profile log-likelihood over correlation/variance parameters using L-BFGS-B. At each step, beta and sigma^2 are profiled out analytically.
- Compute final GLS estimates at the converged parameters:
- beta = (X' Omega^{-1} X)^{-1} X' Omega^{-1} y
- Cov(beta) = sigma^2 (X' Omega^{-1} X)^{-1}
Key Design Decisions
Spherical parametrization for CorSymm: The unstructured correlation matrix is parametrized via angles that map to a Cholesky factor, guaranteeing positive-definiteness without constrained optimization. Based on Pinheiro & Bates (1996).
Analytic inverses: AR(1) and compound symmetry correlation matrices have O(m) analytic inverses and O(1) log-determinants, avoiding dense O(m^3) solves.
Batched computation: Balanced panels (equal group sizes) use batched NumPy operations — a single matrix multiply across all groups — instead of per-group loops.
Block-diagonal inversion: Omega is inverted per-group (O(n*m^3)) rather than as a full matrix (O(N^3)), where n = number of groups and m = group size.
Thread-level parallelism: Pass n_jobs=-1 to .fit() to distribute group-level computations across CPU cores via a thread pool. Useful for large unbalanced panels.
REML: Restricted maximum likelihood integrates out the fixed effects from the likelihood, giving unbiased variance estimates. This is the default, matching R's nlme::gls().
Formula Syntax
Powered by formulaic, supporting:
# Simple linear
"y ~ x1 + x2"
# Categorical variables
"y ~ C(treatment)"
# Interactions
"y ~ x1 * x2" # x1 + x2 + x1:x2
"y ~ x1 : x2" # just the interaction
# Transformations
"y ~ np.log(x1) + x2"
# Remove intercept
"y ~ x1 + x2 - 1"
ML vs. REML
| ML | REML | |
|---|---|---|
| Variance estimate | Biased (divides by N) | Unbiased (divides by N-k) |
| Default in R's gls | No | Yes |
| Default here | No | Yes |
| Use for model comparison | AIC/BIC of nested & non-nested models | Only models with same fixed effects |
method= |
"ML" |
"REML" |
Translating from R
R code → Python equivalent
# R
library(nlme)
m <- gls(y ~ x1 + x2,
data = df,
correlation = corAR1(form = ~1|subject),
weights = varIdent(form = ~1|group),
method = "REML")
summary(m)
intervals(m)
# Python
from python_gls import GLS
from python_gls.correlation import CorAR1
from python_gls.variance import VarIdent
r = GLS.from_formula(
"y ~ x1 + x2",
data=df,
correlation=CorAR1(),
variance=VarIdent("group"),
groups="subject",
method="REML",
).fit()
print(r.summary())
print(r.conf_int())
Parameter name mapping
| R | Python | Notes |
|---|---|---|
corAR1(form=~1|subject) |
CorAR1(), groups="subject" |
Groups specified at model level |
corCompSymm(form=~1|id) |
CorCompSymm(), groups="id" |
|
corSymm(form=~1|id) |
CorSymm(), groups="id" |
|
corExp(form=~x+y|id) |
CorExp(); cor.set_coordinates(...) |
Coordinates set per group |
varIdent(form=~1|group) |
VarIdent("group") |
Group variable as string |
varPower(form=~fitted) |
VarPower("fitted") |
Covariate name as string |
method="REML" |
method="REML" |
Same |
Dependencies
- numpy >= 1.24
- scipy >= 1.10
- pandas >= 2.0
- formulaic >= 1.0
Testing
pip install -e ".[dev]"
pytest
License
MIT
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file python_gls-0.2.0.tar.gz.
File metadata
- Download URL: python_gls-0.2.0.tar.gz
- Upload date:
- Size: 46.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
cd5acb27a72d78c43cc99bd0922adcb91bad5e359e1db5b7885b5bdfcd2ad200
|
|
| MD5 |
bcd4b9f47f99c5888aea5186e60453e7
|
|
| BLAKE2b-256 |
0516db0ccfaade88a003e36f09556874a2e8f974fe29e83197b941f5f4774a47
|
Provenance
The following attestation bundles were made for python_gls-0.2.0.tar.gz:
Publisher:
publish.yml on brunoabrahao/python-gls
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
python_gls-0.2.0.tar.gz -
Subject digest:
cd5acb27a72d78c43cc99bd0922adcb91bad5e359e1db5b7885b5bdfcd2ad200 - Sigstore transparency entry: 972856752
- Sigstore integration time:
-
Permalink:
brunoabrahao/python-gls@98387ddf5451449add8a9d27f9b79eea6f265a36 -
Branch / Tag:
refs/tags/v0.2.0 - Owner: https://github.com/brunoabrahao
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@98387ddf5451449add8a9d27f9b79eea6f265a36 -
Trigger Event:
release
-
Statement type:
File details
Details for the file python_gls-0.2.0-py3-none-any.whl.
File metadata
- Download URL: python_gls-0.2.0-py3-none-any.whl
- Upload date:
- Size: 37.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
bb0235c692f3e91c4cb85232fb6fe6ea1f2681f6ff1c83968fdb461d1d8f17b6
|
|
| MD5 |
8fe968dc7232ee1808859f7af88bf125
|
|
| BLAKE2b-256 |
586a754f877192e2836e019d71e8ef341d1184e8b7fef8357e0276287c241009
|
Provenance
The following attestation bundles were made for python_gls-0.2.0-py3-none-any.whl:
Publisher:
publish.yml on brunoabrahao/python-gls
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
python_gls-0.2.0-py3-none-any.whl -
Subject digest:
bb0235c692f3e91c4cb85232fb6fe6ea1f2681f6ff1c83968fdb461d1d8f17b6 - Sigstore transparency entry: 972856755
- Sigstore integration time:
-
Permalink:
brunoabrahao/python-gls@98387ddf5451449add8a9d27f9b79eea6f265a36 -
Branch / Tag:
refs/tags/v0.2.0 - Owner: https://github.com/brunoabrahao
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@98387ddf5451449add8a9d27f9b79eea6f265a36 -
Trigger Event:
release
-
Statement type: