Firth-penalized models in Python
Project description
firthmodels
Firth-penalized models in Python:
FirthLogisticRegression: scikit-learn–compatible Firth logistic regressionFirthCoxPH: scikit-survival-style Firth Cox proportional hazards
Firth penalization reduces small-sample bias and produces finite estimates even when standard MLE fails due to (quasi-)complete separation or monotone likelihood.
See benchmarking results here comparing firthmodels, logistf, brglm2, and coxphf.
Why Firth penalization?
Standard maximum-likelihood logistic regression fails when your data has complete or quasi-complete separation: when a predictor (or combination of predictors) perfectly separates the outcome classes. In these cases, MLE produces infinite coefficient estimates.
In Cox proportional hazards, an analogous failure mode is monotone likelihood, where the partial likelihood becomes unbounded (often due to small samples, rare events, or near-perfect risk separation).
These problems are common in:
- Case-control studies with rare exposures
- Small clinical trials
- Genome-wide or Phenome-wide association studies (GWAS/PheWAS)
- Any dataset where events are rare relative to predictors
Firth's method adds a penalty term that:
- Produces finite, well-defined estimates even with separated data
- Reduces small-sample bias in coefficient estimates
Kosmidis and Firth (2021) formally proved that bias reduction for logistic regression models guarantees finite estimates as long as the model matrix has full rank.
Detecting separation
You can use detect_separation to check if your data has separation before fitting.
This implements the linear programming method from Konis (2007), as used in the
R detectseparation package by Kosmidis et al (2022).
The following example is based on the endometrial dataset used in Heinze and Schemper (2002),
where the NV feature causes quasi-complete separation.
from firthmodels import detect_separation
result = detect_separation(X, y)
result.separation # True
result.is_finite # array([False, True, True, True])
result.directions # array([1, 0, 0, 0]) # where 1 = +Inf, -1 = -Inf, 0 = finite
print(result.summary())
# Separation: True
# NV +Inf
# PI finite
# EH finite
# intercept finite
Installation
Pip
pip install firthmodels
Requires Python 3.10+ and depends on NumPy, SciPy, and scikit-learn.
Optional dependencies:
-
Numba acceleration:
pip install firthmodels[numba]- The first run with the Numba backend after installing or updating firthmodels may take 10-30 seconds due to JIT compilation. Subsequent runs are fast thanks to caching.
-
Formula interface for the statsmodels adapter:
pip install firthmodels[formula](or simply install formulaic).
Note:
Performance is significantly improved when NumPy/SciPy are built against a well-optimized BLAS/LAPACK library. You can check which library yours is using with np.show_config(). As a rule of thumb, MKL offers the best performance for Intel CPUs, while OpenBLAS is also a good choice for Intel and generally the best option for AMD. On macOS, ensure NumPy/SciPy are linked to Apple Accelerate.
The most straightforward way to control the BLAS/LAPACK library is to install firthmodels in a conda environment:
conda
conda install -c conda-forge firthmodels # usually defaults to OpenBLAS
conda install -c conda-forge firthmodels "libblas=*=*_newaccelerate" # Apple Accelerate
conda install -c conda-forge firthmodels "libblas=*=*mkl" # Intel MKL
conda install -c conda-forge firthmodels "libblas=*=*openblas" # OpenBLAS
Add numba to the conda install command to enable Numba acceleration.
Quick start
Firth logistic regression
import numpy as np
from firthmodels import FirthLogisticRegression
# Separated data: x=1 perfectly predicts y=1
X = np.array([[0], [0], [0], [1], [1], [1]])
y = np.array([0, 0, 0, 1, 1, 1])
# Standard logistic regression would fail here
model = FirthLogisticRegression().fit(X, y)
print(model.coef_) # array([3.89181893])
print(model.intercept_) # -2.725...
print(model.pvalues_) # Wald p-values
print(model.bse_) # Standard errors
Firth Cox proportional hazards
import numpy as np
from firthmodels import FirthCoxPH
X = np.array([[1.0], [0.0]])
event = np.array([True, False])
time = np.array([1.0, 2.0])
model = FirthCoxPH().fit(X, (event, time))
print(model.coef_) # log hazard ratios
print(model.pvalues_) # Wald p-values
# Survival curves evaluated at the training event times
S = model.predict_survival_function(X) # shape: (n_samples, n_event_times)
FirthCoxPH also accepts y as a structured array with boolean event and float time
fields (scikit-survival style).
Both estimators take a backend parameter that can be one of 'auto' (default), 'numba', or 'numpy'. If 'auto', firthmodels auto-detects Numba availability and uses
it if installed, otherwise numpy/scipy.
Estimators
scikit-learn compatible API
FirthLogisticRegression follows the scikit-learn estimator API
(fit, predict, predict_proba, get_params, set_params, etc.), and can be used
with pipelines, cross-validation, and other sklearn tools:
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
pipe = make_pipeline(StandardScaler(), FirthLogisticRegression())
scores = cross_val_score(pipe, X, y, cv=5)
FirthCoxPH also follows the sklearn estimator API (fit, predict, score, etc.).
It also has a scikit-survival-like interface:
- methods:
fit(X, y),predict(X)(linear predictor),score(X, y)(C-index),predict_survival_function(X, return_array=True),predict_cumulative_hazard_function(X, return_array=True). - attributes:
unique_times_,cum_baseline_hazard_,baseline_survival_(Breslow-style baseline).
Inference
Likelihood ratio tests (LRT)
Both estimators support penalized likelihood ratio tests for individual coefficients. These are often more reliable than Wald p-values in small samples.
lrt() populates lrt_pvalues_ and lrt_bse_ (a back-corrected standard error such
that (beta / lrt_bse_)**2 matches the 1-df chi-squared statistic), which can be
useful for meta-analysis weighting:
model.fit(X, y).lrt() # Compute LRT for all coefficients
model.lrt_pvalues_ # LRT p-values
model.lrt_bse_ # Back-corrected standard errors (separate from Wald bse_)
Each feature requires a separate constrained model fit, so you can test selectively to
avoid unnecessary computation. By default, LRT uses a warm start based on the full-model
covariance to reduce Newton-Raphson iterations; pass warm_start=False to disable it.
model.lrt(0) # Single feature by index
model.lrt([0, 2]) # Multiple features
model.lrt(['snp', 'age']) # By name (if fitted with DataFrame)
model.lrt(['snp', 2]) # Mixed
model.lrt(warm_start=False) # Disable warm start
Confidence intervals
model.conf_int() # 95% Wald CIs
model.conf_int(alpha=0.1) # 90% CIs
model.conf_int(method='pl') # Profile likelihood CIs (more accurate)
model.conf_int(method='pl', features=['snp', 'age']) # can selectively compute as with LRT
Sample weights and offsets
# currently for FirthLogisticRegression only
model.fit(X, y, sample_weight=weights)
model.fit(X, y, offset=offset)
Statsmodels adapter (FirthLogit)
The statsmodels adapter wraps FirthLogisticRegression behind a statsmodels-like API
and returns a results object with common statsmodels attributes/methods
(params, bse, pvalues, summary(), cov_params(), etc.).
Notes:
- Unlike sklearn,
FirthLogitdoes not add an intercept automatically; usesm.add_constant(X). fit(pl=True)(default) computes likelihood ratio p-values and uses profile likelihood confidence intervals by default, matching R logistf convention. Standard errors (bse) remain Wald standard errors.
import numpy as np
import statsmodels.api as sm
from firthmodels.adapters.statsmodels import FirthLogit
X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([0, 0, 0, 1, 1, 1])
X = sm.add_constant(X)
res = FirthLogit(y, X).fit()
print(res.params)
print(res.pvalues) # LRT p-values when pl=True
print(res.conf_int()) # profile likelihood CIs when pl=True
print(res.summary())
Formula interface
If you install firthmodels[formula] (or pip install formulaic), you can fit from
a formula and a pandas DataFrame:
import pandas as pd
from firthmodels.adapters.statsmodels import FirthLogit
df = pd.DataFrame({"y": [0, 1, 0, 1], "age": [20, 30, 40, 50], "treatment": [0, 1, 0, 1]})
res = FirthLogit.from_formula("y ~ age + treatment", df).fit()
API notes
FirthLogisticRegression parameters
| Parameter | Default | Description |
|---|---|---|
backend |
'auto' |
'auto', 'numba', or 'numpy'. 'auto' uses numba if available. |
fit_intercept |
True |
Whether to add an intercept term |
max_iter |
25 |
Maximum Newton-Raphson iterations |
gtol |
1e-4 |
Gradient convergence tolerance (converged when max|gradient| < gtol) |
xtol |
1e-4 |
Parameter convergence tolerance (converged when max|delta| < xtol) |
max_step |
5.0 |
Maximum step size per coefficient |
max_halfstep |
25 |
Maximum step-halvings per iteration |
penalty_weight |
0.5 |
Weight of the Firth penalty term. The default 0.5 corresponds to the standard Firth bias reduction method (Firth, 1993), equivalent to using Jeffreys' invariant prior. Set to 0.0 for unpenalized maximum likelihood estimation. |
FirthLogisticRegression attributes (after fitting)
| Attribute | Description |
|---|---|
coef_ |
Coefficient estimates |
intercept_ |
Intercept (0.0 if fit_intercept=False) |
bse_ |
Wald standard errors; includes intercept if fit_intercept=True |
pvalues_ |
Wald p-values; includes intercept if fit_intercept=True |
loglik_ |
Penalized log-likelihood |
n_iter_ |
Number of iterations |
converged_ |
Whether the solver converged |
lrt_pvalues_ |
LRT p-values (after calling lrt()); includes intercept if fit_intercept=True |
lrt_bse_ |
Back-corrected SEs (after calling lrt()); includes intercept if fit_intercept=True |
classes_ |
Class labels (shape (2,)) |
n_features_in_ |
Number of features seen during fit |
feature_names_in_ |
Feature names (if X had string column names) |
FirthCoxPH parameters
| Parameter | Default | Description |
|---|---|---|
backend |
'auto' |
'auto', 'numba', or 'numpy'. 'auto' uses numba if available. |
max_iter |
50 |
Maximum Newton-Raphson iterations |
gtol |
1e-4 |
Gradient convergence tolerance (converged when max|gradient| < gtol) |
xtol |
1e-6 |
Parameter convergence tolerance (converged when max|delta| < xtol) |
max_step |
5.0 |
Maximum step size per coefficient |
max_halfstep |
5 |
Maximum step-halvings per iteration |
penalty_weight |
0.5 |
Weight of the Firth penalty term. The default 0.5 corresponds to the standard Firth bias reduction method (Heinze and Schemper, 2001), equivalent to using Jeffreys' invariant prior. Set to 0.0 for unpenalized Cox partial likelihood estimation. |
FirthCoxPH attributes (after fitting)
| Attribute | Description |
|---|---|
coef_ |
Coefficient estimates (log hazard ratios) |
bse_ |
Wald standard errors |
pvalues_ |
Wald p-values |
loglik_ |
Penalized log partial likelihood |
n_iter_ |
Number of iterations |
converged_ |
Whether the solver converged |
lrt_pvalues_ |
LRT p-values (after calling lrt()) |
lrt_bse_ |
Back-corrected SEs (after calling lrt()) |
unique_times_ |
Unique event times (ascending order) |
cum_baseline_hazard_ |
Breslow cumulative baseline hazard at unique_times_ |
baseline_survival_ |
Baseline survival function at unique_times_ |
n_features_in_ |
Number of features seen during fit |
feature_names_in_ |
Feature names (if X had string column names) |
predict(X) returns the linear predictor X @ coef_ (log partial hazard).
predict_cumulative_hazard_function(X) and predict_survival_function(X) return arrays
evaluated at unique_times_.
References
Firth D (1993). Bias reduction of maximum likelihood estimates. Biometrika 80, 27-38.
Heinze G, Schemper M (2001). A solution to the problem of monotone likelihood in Cox regression. Biometrics 57, 114-119.
Heinze G, Schemper M (2002). A solution to the problem of separation in logistic regression. Statistics in Medicine 21, 2409-2419.
Konis, K. (2007). Linear Programming Algorithms for Detecting Separated Data in Binary Logistic Regression Models. DPhil thesis, University of Oxford.
Kosmidis I, Firth D (2021). Jeffreys-prior penalty, finiteness and shrinkage in binomial-response generalized linear models. Biometrika 108, 71-82.
Kosmidis I, Schumacher D, Schwendinger F (2022). detectseparation: Detect and Check for Separation and Infinite Maximum Likelihood Estimates. doi:10.32614/CRAN.package.detectseparation https://doi.org/10.32614/CRAN.package.detectseparation, R package version 0.3, https://CRAN.R-project.org/package=detectseparation.
Mbatchou J et al. (2021). Computationally efficient whole-genome regression for quantitative and binary traits. Nature Genetics 53, 1097-1103.
Venzon DJ, Moolgavkar SH (1988). A method for computing profile-likelihood-based confidence intervals. Applied Statistics 37, 87-94.
License
MIT
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file firthmodels-0.7.2.tar.gz.
File metadata
- Download URL: firthmodels-0.7.2.tar.gz
- Upload date:
- Size: 636.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
28e9b06b2698a6ca14f217ab56d7af2e405744c657eeaa55db42e824ba5ae7fa
|
|
| MD5 |
bcc45a587166ae044b97d19586a2585f
|
|
| BLAKE2b-256 |
65ec406a5939ac57f17d5b278f34a495c4a318383b1352dee2d85230188bba53
|
Provenance
The following attestation bundles were made for firthmodels-0.7.2.tar.gz:
Publisher:
publish.yml on jzluo/firthmodels
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
firthmodels-0.7.2.tar.gz -
Subject digest:
28e9b06b2698a6ca14f217ab56d7af2e405744c657eeaa55db42e824ba5ae7fa - Sigstore transparency entry: 975027835
- Sigstore integration time:
-
Permalink:
jzluo/firthmodels@25d467451c43161051fecb149664eb7d107929dc -
Branch / Tag:
refs/tags/v0.7.2 - Owner: https://github.com/jzluo
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@25d467451c43161051fecb149664eb7d107929dc -
Trigger Event:
push
-
Statement type:
File details
Details for the file firthmodels-0.7.2-py3-none-any.whl.
File metadata
- Download URL: firthmodels-0.7.2-py3-none-any.whl
- Upload date:
- Size: 59.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
398e11c0b5522f348dea7feec399bfd18d2a16127cd0b916123b40721b584bf0
|
|
| MD5 |
0e52c1049aac21bc0f2e4b69aaee3bab
|
|
| BLAKE2b-256 |
7518a2a2dce2bdd1c546bc93bc67a8b6657e8560cc99c40f3d5d714a331a4627
|
Provenance
The following attestation bundles were made for firthmodels-0.7.2-py3-none-any.whl:
Publisher:
publish.yml on jzluo/firthmodels
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
firthmodels-0.7.2-py3-none-any.whl -
Subject digest:
398e11c0b5522f348dea7feec399bfd18d2a16127cd0b916123b40721b584bf0 - Sigstore transparency entry: 975027868
- Sigstore integration time:
-
Permalink:
jzluo/firthmodels@25d467451c43161051fecb149664eb7d107929dc -
Branch / Tag:
refs/tags/v0.7.2 - Owner: https://github.com/jzluo
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@25d467451c43161051fecb149664eb7d107929dc -
Trigger Event:
push
-
Statement type: