Skip to main content

Python port of the R gamlss package: Generalised Additive Models for Location Scale and Shape

Project description

gamlss-python

PyPI Python License: GPL-3.0

R's gamlss in pure Python, numerically verified against the original.

A Python port of the R packages gamlss (5.5-0) and gamlss.dist (6.1-1): Generalised Additive Models for Location, Scale and Shape (Rigby & Stasinopoulos, 2005).

The port reproduces the original R implementation numerically: the RS/CG fitting algorithms, link functions, distribution derivatives, deviance computations, standard errors and predictions are transcribed from the R sources and verified against reference results generated by the original R package (see tests/). Coefficients typically agree with R to ~1e-9 relative, deviances to 1e-9, iteration counts exactly.

Pure Python

The package is implemented entirely in Python — no C extensions, no Cython, no build step, and no dependency on R at runtime. The only requirements are numpy, scipy, pandas and patsy (whose own compiled internals do the numerical heavy lifting, as with any scientific Python package). Routines that the R version implements in C (e.g. the PIG/SICHEL-family recursions in gamlss.dist/src/*.c) were reimplemented in Python/numpy rather than wrapped.

Status

This is a young port — treat it as beta. The parametric fitting core is verified against R by the automated suite described below, but beyond that only some of the core functions have been manually tested; less-travelled code paths may still contain porting bugs. If a result matters, cross-check it against the original R package, and please report discrepancies via the issue tracker.

Install

pip install gamlss        # from PyPI

pip install -e .          # from the repo root
pip install -e .[test]    # with test dependencies

Dependencies: numpy, scipy, pandas, patsy.

Quick start

import gamlss as gl
from gamlss import gamlss, fitted, coef, deviance, GAIC

ab = gl.load_data("abdom")

# R: gamlss(y ~ x, sigma.formula = ~x, family = BCT, data = abdom)
m = gamlss("y ~ x", sigma_formula="~x", family=gl.BCT(), data=ab)

m.summary()                  # R: summary(m)
coef(m, "mu")                # R: coef(m, "mu")
fitted(m, "sigma")           # R: fitted(m, "sigma")
deviance(m)                  # R: deviance(m)
GAIC(m, k=2)                 # R: GAIC(m, k=2)
m.predict(what="mu", newdata=ab.head(5), type="response")

API mapping (R -> Python)

R uses . inside argument and component names; Python uses _.

R Python
gamlss(y~x, sigma.formula=~z, family=GA, data=d) gamlss("y ~ x", sigma_formula="~z", family=GA(), data=d)
gamlss.control(n.cyc=50) gamlss_control(n_cyc=50) or gamlss(..., n_cyc=50)
glim.control(cc=1e-4) glim_control(cc=1e-4) or gamlss(..., cc=1e-4)
method=RS() / CG() / mixed(2, 20) method=RS() / CG() / mixed(2, 20)
mu.start=, sigma.fix=TRUE mu_start=, sigma_fix=True
weights=w weights=w (array or column name)
m$mu.fv, m$mu.lp, m$mu.coefficients m.mu_fv, m.mu_lp, m.mu_coefficients
m$G.deviance, m$aic, m$sbc, m$df.fit m.G_deviance, m.aic, m.sbc, m.df_fit
fitted(m, "mu") fitted(m, "mu") or m.fitted("mu")
coef(m, what="sigma") coef(m, what="sigma")
coefAll(m) coefAll(m)
deviance(m, "G") deviance(m, "G")
logLik(m) logLik(m)
AIC(m1, m2, k=2) / GAIC(m, k=log(n)) / IC AIC(m1, m2, k=2) / GAIC(m, k=...) / IC
summary(m, type="vcov"/"qr") m.summary(type="vcov"/"qr") or summary(m, ...)
vcov(m, type="se") vcov(m, type="se") or m.vcov("se")
predict(m, what="mu", newdata=nd, type="response") m.predict(what="mu", newdata=nd, type="response")
predictAll(m, newdata=nd) m.predictAll(newdata=nd)
lpred(m, what="mu", type="terms", se.fit=TRUE) lpred(m, what="mu", type="terms", se_fit=True)
residuals(m) (normalised quantile residuals) residuals(m) / m.get_residuals()
residuals(m, "mu", type="weighted") residuals(m, "mu", type="weighted")
Rsq(m) Rsq(m)
dNO/pNO/qNO/rNO(..., lower.tail=, log.p=) dNO/pNO/qNO/rNO(..., lower_tail=, log_p=)

Formulas

Formulas are strings interpreted by patsy with R-compatible behaviour:

  • "y ~ x + C(g)" — factors via C() (R: y ~ x + g with a factor g); design-matrix term order follows R's model.matrix.
  • "y ~ poly(x, 3)" — exact port of R's orthogonal poly() (including prediction via the stored recurrence coefficients).
  • "cbind(y, n - y) ~ x" — binomial responses with denominators.
  • "y ~ x + offset(log(t))" — in-formula offsets.
  • ^ is translated to R semantics ((a+b)^2 crossing, power inside I()).

Families implemented

Continuous: NO, NO2, EXP, GA, IG, IGAMMA, GU, RG, LO, LOGNO, WEI, WEI3, TF, PE, GG, SN1, JSU, JSUo, SHASHo, BCCG, BCCGo, BCT, BCTo, BCPE, BCPEo, BE, BEo Discrete: PO, NBI, NBII, GEOM, PIG, LG, ZIP, ZIP2, ZINBI, ZANBI Binomial-type: BI, BB, ZABI, ZIBI

Each family ships its d/p/q/r functions (dGA, pGA, ...) matching the R parameterisations exactly.

Verification against R

r-scripts/gen_reference.R fits 43 reference models with the original R gamlss (plus dense d/p/q grids for every family) and exports the results to JSON. The pytest suite re-fits every model in Python and compares:

  • coefficients (rtol 1e-6, typically agree to 1e-9),
  • global deviance / AIC / SBC (rtol 1e-9),
  • effective df, residual df, noObs and the exact RS/CG iteration count,
  • fitted values for every distribution parameter,
  • quantile residuals (continuous families; discrete ones are randomised by construction),
  • qr-type standard errors (chol2inv of the working-WLS R matrix),
  • vcov-type standard errors (numerical Hessian, optimHess with R's ndeps=1e-3, falling back to HessianPB exactly as R does),
  • predictions for new data (including R's "safe prediction" refit semantics, factor levels and poly() terms).

Run them:

# regenerate the reference (needs R with gamlss + jsonlite):
Rscript r-scripts/gen_reference.R
# verify the Python port:
python -m pytest tests/ -q

Known, documented differences

  • R quirks are reproduced where they affect results; e.g. the ifelse(x<0, 0, fy) shape-truncation in dBI that makes ZIBI/ZABI use the first observation's bd/mu in the zero-mass correction is replicated (gamlss/dist/BI.py:_dBI0).
  • qIG/pSN1-style quantiles use root finding: R's uniroot (tol ~1.2e-4) vs scipy's brentq (1e-12) — Python values are more precise; agreement is at R's tolerance.
  • vcov() on near-singular Hessians (e.g. BCTo on abdom where tau -> inf): R produces NaN standard errors and its summary() falls back to type="qr"; knife-edge sign differences in the fallback path can occur. The deviance/coefficients still agree.
  • Randomised quantile residuals for discrete families use NumPy's RNG; they match R distributionally, not bitwise (pass rng= for reproducibility).
  • Smoothing/additive terms (pb(), cs(), ...) are not ported yet — the parametric GAMLSS core (formulas, all four parameters, RS/CG, weights, offsets, factors) is complete.

Releasing to PyPI

Releases are published by GitHub Actions (.github/workflows/publish.yml) via PyPI trusted publishing — no API tokens needed. One-time setup: on pypi.org, add a "pending publisher" for the project gamlss pointing at this repo (fzhao70/gamlss-python, workflow publish.yml, environment pypi), and create a GitHub environment named pypi in the repo settings. Then to release:

  1. Bump version in pyproject.toml.
  2. Tag and create a GitHub release (git tag v0.1.0 && git push --tags, then "Releases → Draft a new release").
  3. The workflow builds the sdist/wheel and uploads to PyPI.

License and attribution

This package is a Python translation of the R packages gamlss and gamlss.dist by Mikis Stasinopoulos, Robert Rigby and co-authors (licensed GPL-2 | GPL-3), and bundles datasets from gamlss.data. As a derivative work it is distributed under the GNU GPL v3 (see LICENSE). It is not affiliated with or endorsed by the original authors.

If you use GAMLSS in published work, please cite:

Rigby, R. A. and Stasinopoulos, D. M. (2005). Generalized additive models for location, scale and shape (with discussion). Applied Statistics, 54, 507-554.

Layout

gamlss/            the package
  engine.py        gamlss(), RS/CG/mixed, glim.fit, lm.wfit, controls
  family.py        gamlss.family infrastructure
  links.py         make.link.gamlss link functions
  formula.py       R formulas on patsy (poly, cbind, offset, naming)
  model.py         the fitted-model object + all S3-method equivalents
  methods.py       functional API (fitted, coef, GAIC, ...)
  rqres.py         normalised (randomised) quantile residuals
  dist/            one module per gamlss.dist family
  data/            datasets from gamlss.data as CSV
r-scripts/         R reference generation + fetch_r_sources.sh
tests/             verification suite (R reference JSONs included)

The original R sources used for the transcription are not committed; run r-scripts/fetch_r_sources.sh to download the exact CRAN versions into r-src/.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gamlss-0.1.0.tar.gz (90.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

gamlss-0.1.0-py3-none-any.whl (119.6 kB view details)

Uploaded Python 3

File details

Details for the file gamlss-0.1.0.tar.gz.

File metadata

  • Download URL: gamlss-0.1.0.tar.gz
  • Upload date:
  • Size: 90.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for gamlss-0.1.0.tar.gz
Algorithm Hash digest
SHA256 3d42d021940ab042be78a2ff5e078b83794f1c5c6e899598b6bc4e766fd70999
MD5 28a9e4c80dd1c303f435f88694202409
BLAKE2b-256 85c396b6f353314c5e66f3c464b8badd54f41ea55c3b3d6cf373a54686842603

See more details on using hashes here.

Provenance

The following attestation bundles were made for gamlss-0.1.0.tar.gz:

Publisher: publish.yml on fzhao70/gamlss-python

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file gamlss-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: gamlss-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 119.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for gamlss-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 a43a84090fb9847d87fdb814a0634f3d9ee010804b9323d37f9cfc157eca3ee4
MD5 d2b457374abf09d4836de9dd2c997766
BLAKE2b-256 aeba712fd1cba69425312eebc2e28ec5c34406fa300dbf01b948e454337b381d

See more details on using hashes here.

Provenance

The following attestation bundles were made for gamlss-0.1.0-py3-none-any.whl:

Publisher: publish.yml on fzhao70/gamlss-python

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page