{varguid}: Variance-Guided Regression Improving Upon OLS and ANOVA for Python

varguid is a Python implementation of the stage-1 variance-guided regression method described in:

Liu S. and Lu M. (2026). Variance-Guided Regression for Heteroscedastic Data with a Grouping-Based Extension for Nonlinear Prediction. Statistics in Medicine 45(13-14):e70632. DOI: 10.1002/sim.70632.

The package implements the global linear mean-variance model from Section 2 of the paper and mirrors the stage-1 scope of the attached R package varGuid 0.1.5. The grouping-based nonlinear extension from Section 3 is not included.

Features

  • Iteratively reweighted least squares for non-sparse fits.
  • Iteratively reweighted lasso using scikit-learn.
  • Baseline and variance-guided prediction from the same fitted result.
  • Weighted least-squares and HC0-HC3 coefficient summaries for non-lasso fits.
  • Matrix, pandas DataFrame, Patsy formula, and scikit-learn-style interfaces.
  • Packaged cobra2d data and a reproducible synthetic-data generator.
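
For orientation, the HC0-HC3 labels in the list above refer to the standard heteroscedasticity-consistent "sandwich" covariance estimators. The block below is a minimal NumPy sketch of the textbook formulas for an unweighted least-squares fit. It is not taken from the package: the helper name hc_covariance and the synthetic data are hypothetical. A weighted fit can be reduced to this case by scaling each row of X and y by the square root of its weight, though whether varguid does exactly that internally is an assumption.

```python
import numpy as np


def hc_covariance(X, y, kind="HC3"):
    """Textbook HC0-HC3 sandwich covariance for an OLS fit (illustrative only)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    # Leverages: diagonal of the hat matrix X (X'X)^-1 X'.
    h = np.einsum("ij,jk,ik->i", X, xtx_inv, X)
    if kind == "HC0":
        omega = resid**2
    elif kind == "HC1":
        omega = resid**2 * n / (n - k)
    elif kind == "HC2":
        omega = resid**2 / (1.0 - h)
    elif kind == "HC3":
        omega = resid**2 / (1.0 - h) ** 2
    else:
        raise ValueError(f"unknown covariance type: {kind!r}")
    # Sandwich: bread @ meat @ bread.
    meat = X.T @ (omega[:, None] * X)
    return xtx_inv @ meat @ xtx_inv


# Hypothetical heteroscedastic data: the noise scale grows with x.
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(1.0, 3.0, size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(scale=x)

covs = {kind: hc_covariance(X, y, kind) for kind in ("HC0", "HC1", "HC2", "HC3")}
for kind, cov in covs.items():
    print(kind, np.sqrt(np.diag(cov)))
```

HC1 is HC0 with a degrees-of-freedom correction, while HC2 and HC3 inflate each squared residual according to its leverage, so HC3 gives the most conservative standard errors of the four.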

Installation

varguid 0.1.8 requires Python 3.12 or newer.

python -m pip install varguid

For local development:

python -m pip install -e ".[dev]"

Quick start

from varguid import lmv, load_cobra2d

# Packaged data from the attached R release
data = load_cobra2d()
train = data.iloc[:-200].copy()
y = train["y"]
X = train.drop(columns="y")

fit = lmv(X, y, M=10, lasso=False)

baseline_pred = fit.predict(X, model="baseline")
varguid_pred = fit.predict(X, model="varGuid")

print(fit.beta[:5])
print(baseline_pred[:3])
print(varguid_pred[:3])
print(fit.summary_frame(cov_type="HC3").head())

The top-level helper has the same behavior:

from varguid import predict

varguid_pred = predict(fit, X, model="varGuid")

The R-compatible prd() name remains available as a deprecated alias.
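
As a rough illustration of what a deprecated alias usually looks like, the sketch below forwards an old name to a new one and emits a DeprecationWarning. All names here (predict_sketch, prd_sketch, DummyFit) are hypothetical stand-ins, and whether the real prd() warns in exactly this way is an assumption.

```python
import warnings


def predict_sketch(fit, X, model="varGuid"):
    """Stand-in for a top-level helper that delegates to the fitted result."""
    return fit.predict(X, model=model)


def prd_sketch(fit, X, model="varGuid"):
    """Deprecated alias: warn, then forward to the preferred name."""
    warnings.warn(
        "prd_sketch() is deprecated; use predict_sketch() instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    return predict_sketch(fit, X, model=model)


class DummyFit:
    """Hypothetical fitted object that just echoes the requested model name."""

    def predict(self, X, model="varGuid"):
        return [model for _ in X]


# Capture warnings so the deprecation notice can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = prd_sketch(DummyFit(), [1, 2, 3], model="baseline")

print(result)
print([w.category.__name__ for w in caught])
```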

Sparse fit

from varguid import lmv, load_cobra2d

data = load_cobra2d().iloc[:120]
X = data.drop(columns="y")
y = data["y"]

fit = lmv(
    X,
    y,
    M=3,
    lasso=True,
    cv_folds=5,
    rng=42,
)
pred = fit.predict(X, model="varGuid")
print(fit.beta)
print(pred[:3])

The lasso implementation standardizes predictors and uses shuffled, reproducible cross-validation folds. It follows the R package's high-level procedure but uses scikit-learn rather than glmnet, so sparse coefficients are not expected to be bit-for-bit identical across languages.
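
The two ingredients named above, standardized predictors and shuffled but reproducible folds, can be sketched with plain scikit-learn as follows. This is a single unweighted lasso fit on hypothetical synthetic data, not the package's iteratively reweighted procedure; the helper name fit_lasso is an assumption made for illustration.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data: only the first two of five predictors matter.
rng = np.random.default_rng(42)
X = rng.normal(size=(120, 5))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=120)


def fit_lasso(X, y, seed):
    """Standardize predictors, then pick the penalty on shuffled, seeded folds."""
    folds = KFold(n_splits=5, shuffle=True, random_state=seed)
    model = make_pipeline(StandardScaler(), LassoCV(cv=folds))
    return model.fit(X, y)


first = fit_lasso(X, y, seed=42)
second = fit_lasso(X, y, seed=42)

# The last pipeline step is the fitted LassoCV estimator.
coef_first = first[-1].coef_
coef_second = second[-1].coef_
print(coef_first)
print(np.array_equal(coef_first, coef_second))
```

Fixing the fold seed makes repeated fits in the same environment select the same penalty, which is what makes the result reproducible even though the folds are shuffled.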

The functional API follows Scientific Python SPEC 7: pass an integer or a numpy.random.Generator through rng. The older random_state keyword is accepted with a deprecation warning. VarGuidRegressor retains random_state, as expected by scikit-learn estimators.
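
A minimal sketch of the SPEC 7 pattern described above: normalize whatever the caller passes through rng with numpy.random.default_rng, and accept the legacy random_state keyword with a warning. The helper name resolve_rng is hypothetical and this is not the package's actual code.

```python
import warnings

import numpy as np


def resolve_rng(rng=None, random_state=None):
    """Return a numpy.random.Generator from an int, a Generator, or None."""
    if random_state is not None:
        warnings.warn(
            "random_state is deprecated; pass rng instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        if rng is None:
            rng = random_state
    # default_rng seeds a new Generator from an int and returns an
    # existing Generator unaltered.
    return np.random.default_rng(rng)


# The same integer seed gives the same stream.
draws_a = resolve_rng(42).integers(0, 1000, size=5).tolist()
draws_b = resolve_rng(42).integers(0, 1000, size=5).tolist()

# An existing Generator is passed through, so its state is shared with the caller.
existing = np.random.default_rng(7)
passthrough = resolve_rng(existing) is existing

# The legacy keyword still works but warns.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    draws_legacy = resolve_rng(random_state=42).integers(0, 1000, size=5).tolist()

print(draws_a == draws_b, passthrough, draws_legacy == draws_a)
```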

Formula interface

from varguid import lmv_formula, load_cobra2d

data = load_cobra2d()
fit = lmv_formula("y ~ x1 + x2 + x3 + x4 + x5", data=data, M=5)
pred = fit.predict(data.iloc[:5])
print(fit.summary())

No-intercept formulas are honored:

fit_no_intercept = lmv_formula("y ~ 0 + x1 + x2 + x3", data=data, M=5)

Formula-based prediction expects a DataFrame containing the original formula variables. Patsy reconstructs transformations and categorical encodings from the fitted design information.
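
The mechanism can be illustrated with Patsy alone. In the sketch below, the design information captured at fit time is reused to build the design matrix for new data, so the stateful center() transform keeps the training mean and the categorical column keeps its training levels. The variable names and toy data are hypothetical. Plain dictionaries are used to keep the example small; a DataFrame with the same columns is handled the same way.

```python
import numpy as np
from patsy import build_design_matrices, dmatrix

# Toy training data: one numeric predictor and one categorical predictor.
train = {"x1": np.array([1.0, 2.0, 3.0, 6.0]), "g": ["a", "b", "a", "c"]}
X_train = dmatrix("center(x1) + g", train)
design_info = X_train.design_info  # remembers the mean of x1 and the levels of g

# New data only needs the original formula variables.
new = {"x1": np.array([3.0, 5.0]), "g": ["c", "a"]}
(X_new,) = build_design_matrices([design_info], new)

names = design_info.column_names
centered = np.asarray(X_new)[:, names.index("center(x1)")]
level_c = np.asarray(X_new)[:, names.index("g[T.c]")]

print(names)
print(centered)  # new x1 minus the training mean of 3.0
print(level_c)   # indicator for level "c"
```

Level "b" does not occur in the new data, yet its column is still present, because the encoding comes from the fitted design information rather than from the new rows.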

Scikit-learn estimator

from varguid import VarGuidRegressor, load_cobra2d

data = load_cobra2d()
X = data.drop(columns="y")
y = data["y"]

model = VarGuidRegressor(max_iter=5, use_lasso=False)
model.fit(X, y)
pred = model.predict(X.iloc[:5])
print(model.summary_frame(cov_type="HC1").head())

The estimator records n_features_in_, raises NotFittedError before fitting, and passes scikit-learn's estimator checks. For DataFrame fits, prediction columns are reordered to the fitted order; missing or unexpected columns are rejected rather than silently producing incorrect predictions.
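
The column handling described above can be pictured with a small pandas helper. This is a sketch under the assumption that the fitted column order is stored as a list of names; align_columns is a hypothetical function, not part of the varguid API.

```python
import pandas as pd


def align_columns(X, fitted_columns):
    """Reorder X to the fitted column order, rejecting any schema mismatch."""
    fitted = list(fitted_columns)
    missing = [c for c in fitted if c not in X.columns]
    unexpected = [c for c in X.columns if c not in fitted]
    if missing or unexpected:
        raise ValueError(
            f"column mismatch: missing={missing}, unexpected={unexpected}"
        )
    return X.loc[:, fitted]


fitted_columns = ["x1", "x2", "x3"]

# Same columns in a different order: reordered, not rejected.
shuffled = pd.DataFrame({"x3": [3.0], "x1": [1.0], "x2": [2.0]})
aligned = align_columns(shuffled, fitted_columns)
print(list(aligned.columns))

# A missing column is an error rather than a silent misalignment.
try:
    align_columns(shuffled.drop(columns="x2"), fitted_columns)
    rejected = False
except ValueError as exc:
    rejected = True
    print(exc)
```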

Reproducibility and R compatibility

The non-lasso update order follows the R/irls.R implementation in the attached R package, including the model update that occurs before the convergence branch. Fitted variance values are not clipped. If a fit yields a numerically exact zero variance, the package falls back to uniform weights instead of producing non-finite values.
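
One way to picture the zero-variance fallback is the sketch below, which turns fitted variances into inverse-variance weights. It assumes the fallback is triggered only when every fitted variance is exactly zero; that trigger condition and the helper name weights_from_variance are assumptions, and the package's actual rule may differ.

```python
import numpy as np


def weights_from_variance(fitted_var):
    """Inverse-variance weights with a uniform fallback (illustrative only)."""
    v = np.asarray(fitted_var, dtype=float)
    # Assumed trigger: all fitted variances are exactly zero, so 1 / v
    # would be non-finite everywhere. Use equal weights instead.
    if np.all(v == 0.0):
        return np.ones_like(v)
    # Otherwise use the unclipped fitted variances as they are.
    return 1.0 / v


regular = weights_from_variance([0.5, 2.0, 4.0])
fallback = weights_from_variance([0.0, 0.0, 0.0])
print(regular)
print(fallback)
```

With uniform weights the weighted least-squares step reduces to an ordinary least-squares fit, which is why this kind of fallback is considered safe.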

The automated tests include an independent translation of the attached R non-lasso algorithm, direct statsmodels comparisons, lasso reproducibility and scale-invariance checks, formula tests, DataFrame schema tests, README example execution, and the full scikit-learn estimator contract.

Development checks

python -m pytest
ruff check src tests
ruff format --check src tests
mypy src tests
check-manifest
python -m build
python -m twine check dist/*

Equivalent nox sessions are available:

nox -s tests lint build

See FIX_LOG_0.1.8.md and VALIDATION_LOG_0.1.8.txt for the release audit.

Publishing

The release workflow builds distributions in a separate job and publishes from a protected pypi GitHub environment through PyPI Trusted Publishing. See RELEASING.md for the one-time repository configuration and release steps.

Citation

If you use varguid, please cite both the Python software and the method paper. The root-level CITATION.cff file is the authoritative machine-readable software citation. The version-specific DOI for varguid 0.1.8 is https://doi.org/10.5281/zenodo.20816141.

Python software

@software{wang_lu_2026_varguid_python,
  author    = {Wang, Zihao and Lu, Min},
  title     = {{varguid}: Variance-Guided Regression Improving Upon OLS and ANOVA for Python},
  version   = {0.1.8},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.20816141},
  url       = {https://doi.org/10.5281/zenodo.20816141}
}

Method paper

@article{liu_lu_2026_varguid,
  author  = {Liu, Sibei and Lu, Min},
  title   = {Variance-Guided Regression for Heteroscedastic Data With a Grouping-Based Extension for Nonlinear Prediction},
  journal = {Statistics in Medicine},
  volume  = {45},
  number  = {13-14},
  pages   = {e70632},
  year    = {2026},
  doi     = {10.1002/sim.70632}
}
