Python port of R's cmprsk: Estimation, testing and regression modeling of subdistribution functions in competing risks.
pycmprsk
A Python port of the R package cmprsk. Estimation, testing and regression modeling of subdistribution functions in competing risks:
- cuminc - non-parametric cumulative incidence functions, with Gray's k-sample test across groups and stratification.
- crr - Fine-Gray subdistribution-hazards regression, with time-fixed covariates (cov1), time-varying covariates (cov2 + tf), per-group censoring weights (cengroup), and the full Huber/White sandwich variance including the q(u) correction for the estimated censoring distribution.
- predict_crr, summary_crr, timepoints, plot_cuminc, plot_predict - the same downstream API as R.
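To make "non-parametric cumulative incidence" concrete, here is a minimal NumPy sketch of the textbook estimator for a single group: at each distinct event time, the overall Kaplan-Meier survival just before that time is multiplied by the fraction of at-risk subjects failing from the cause of interest. The function name cumulative_incidence and its arguments are illustrative assumptions, not part of pycmprsk; the package's own cuminc additionally handles groups, strata, variances, and Gray's test.

```python
import numpy as np


def cumulative_incidence(ftime, fstatus, cause, cencode=0):
    """Illustrative single-group cumulative incidence estimate (not pycmprsk's API)."""
    ftime = np.asarray(ftime, dtype=np.float64)
    fstatus = np.asarray(fstatus)
    # Sorted distinct times at which any (non-censored) event occurs.
    times = np.unique(ftime[fstatus != cencode])
    surv = 1.0  # overall event-free survival just before the current time
    cif = 0.0   # running cumulative incidence for `cause`
    est = np.empty(times.size)
    for i, t in enumerate(times):
        at_risk = np.sum(ftime >= t)
        d_cause = np.sum((ftime == t) & (fstatus == cause))
        d_any = np.sum((ftime == t) & (fstatus != cencode))
        cif += surv * d_cause / at_risk
        surv *= 1.0 - d_any / at_risk
        est[i] = cif
    return times, est


# Four subjects: failures from cause 1 at t=1 and t=4, cause 2 at t=2, censored at t=3.
times, f1 = cumulative_incidence([1, 2, 3, 4], [1, 2, 0, 1], cause=1)
_, f2 = cumulative_incidence([1, 2, 3, 4], [1, 2, 0, 1], cause=2)
```

In this toy data the two cause-specific curves sum to one minus the overall survival at every event time, which is the property that distinguishes cumulative incidence from a naive one-minus-Kaplan-Meier estimate per cause.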
This package's functionality is numerically validated against R's cmprsk. See
Parity testing below.
Install
pip install pycmprsk
The package is pure Python; the hot loops are JIT-compiled with numba.
Quick example
The arrays below were dumped from R using RNGversion("1.6.2"); set.seed(2)
and the same data setup as cmprsk/tests/test.R. The resulting plots
visually match R's plot(cuminc(...)) and plot(predict(crr(...))).
import matplotlib.pyplot as plt
import numpy as np
from pycmprsk import crr, cuminc, plot_cuminc, plot_predict, predict_crr
def tf_quad(uft):
    """Match R's ``function(uft) cbind(uft, uft^2)``."""
    uft = np.asarray(uft, dtype=np.float64)
    return np.column_stack([uft, uft**2])
Data arrays (sourced from R's test suite):
ftime = np.array([
0.686305, 0.149818, 1.611875, 1.077275, 1.553027, 0.286783, 0.234919, 0.255626, 0.536215, 1.420936,
1.979941, 0.816767, 0.970783, 3.376077, 1.407218, 0.229477, 2.821243, 1.598966, 0.661166, 0.291716,
2.421805, 0.264711, 0.419970, 0.994872, 5.248650, 0.493777, 0.036222, 0.039556, 2.225511, 1.896816,
1.562481, 2.080967, 0.062462, 0.308574, 0.854363, 1.086975, 0.183905, 0.877297, 0.166353, 1.346992,
3.303843, 0.723761, 0.043173, 1.635107, 1.022373, 1.565542, 0.734400, 1.705071, 1.527256, 1.921497,
1.854679, 0.310276, 2.424571, 0.515172, 1.251790, 1.054940, 0.010267, 1.079949, 0.136024, 0.466943,
1.348637, 0.113960, 2.535242, 0.762922, 0.432438, 0.666299, 0.862624, 0.479771, 0.397440, 1.493170,
0.661091, 0.540539, 1.355944, 0.773167, 3.902563, 0.117417, 1.786273, 0.072698, 0.259388, 2.092709,
0.229584, 0.490496, 0.425987, 0.335195, 0.697602, 0.097860, 0.917998, 0.174528, 0.680717, 1.835194,
2.997399, 1.937913, 0.520418, 1.653625, 2.238665, 0.149357, 0.720766, 0.096726, 0.831950, 1.003850,
])
fstatus = np.array([
1, 2, 0, 2, 2, 1, 0, 2, 2, 0, 0, 2, 2, 0, 0, 0, 0, 0, 1, 0,
2, 1, 1, 1, 1, 2, 0, 0, 1, 0, 1, 1, 1, 2, 0, 0, 1, 2, 0, 1,
2, 1, 2, 0, 2, 0, 0, 2, 1, 2, 1, 1, 2, 1, 0, 2, 0, 2, 1, 2,
0, 2, 2, 1, 2, 1, 2, 2, 1, 0, 0, 0, 0, 2, 0, 2, 2, 2, 2, 0,
0, 0, 1, 1, 2, 1, 2, 2, 1, 0, 1, 1, 1, 2, 1, 2, 2, 0, 1, 1,
])
group_code = np.array([
3, 1, 3, 1, 2, 2, 2, 3, 2, 2, 1, 2, 2, 3, 2, 3, 1, 1, 2, 1,
1, 3, 2, 3, 1, 1, 1, 1, 3, 1, 1, 2, 3, 2, 3, 1, 3, 2, 1, 3,
3, 3, 3, 2, 2, 1, 1, 2, 2, 2, 3, 1, 1, 3, 2, 3, 3, 1, 1, 2,
1, 3, 2, 1, 2, 3, 1, 1, 3, 3, 3, 1, 2, 3, 1, 1, 2, 2, 3, 3,
2, 3, 3, 2, 2, 2, 2, 3, 2, 1, 2, 1, 1, 2, 1, 1, 1, 1, 2, 1,
])
group = np.array(["a", "b", "c"])[group_code - 1]
strata = np.array([
1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 2, 1, 2, 2, 2, 2, 1, 1, 1, 1,
2, 1, 2, 2, 1, 2, 1, 1, 2, 2, 2, 1, 1, 1, 2, 1, 1, 2, 1, 2,
2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 2, 2, 1, 1, 2, 1, 2, 1,
1, 1, 1, 2, 1, 1, 1, 2, 2, 2, 1, 2, 1, 1, 2, 1, 2, 1, 1, 1,
1, 2, 2, 1, 1, 2, 1, 2, 2, 2, 1, 2, 1, 1, 2, 1, 2, 2, 2, 2,
])
cov1 = np.array([
np.nan, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0,
1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0,
1.0, 1.0, 1.0, np.nan, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0,
0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0,
1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, np.nan, 1.0, 1.0,
1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0,
1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0,
0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0,
0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0,
1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0,
0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0,
1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0,
0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0,
1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0,
1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0,
0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0,
0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0,
1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0,
1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0,
1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0,
1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0,
1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0,
1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0,
0.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0,
0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0,
]).reshape(100, 3)
cov2 = np.column_stack([cov1[:, 0], cov1[:, 0]])
cengroup = cov1[:, 2]
ci = cuminc(ftime, fstatus, group=group, strata=strata)
print("cuminc curve keys:", list(ci.curves.keys()))
print("Gray's k-sample tests (stat, pv, df):\n", ci.tests)
fit = crr(ftime, fstatus, cov1=cov1, cov2=cov2, tf=tf_quad, cengroup=cengroup)
print("crr coefs:", fit.coef)
print("crr converged:", fit.converged)
pred = predict_crr(
fit,
cov1=np.array([[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]),
cov2=np.array([[1.0, 1.0], [0.0, 0.0]]),
)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
plot_cuminc(ci, ax=ax1)
ax1.set_title("cuminc(ss, cc, gg, strt)")
plot_predict(pred, ax=ax2)
ax2.set_title("predict(crr(ss, cc, cv, cov2, tf, cengroup=cv[,3]))")
plt.tight_layout()
plt.show()
What's different from R's cmprsk
pycmprsk is a port of the functionality, not a verbatim translation of the API:
- Returns dataclasses (CRRResult, CumincResult, SummaryCRR) rather than R's named lists. Field names use Python snake_case (n_missing vs. R's n.missing, loglik_null vs. loglik.null).
- tf (the time-varying covariate function) takes and returns NumPy arrays; Python's contract is that it returns shape (ndf, p2) (R's cmprsk wraps 1D output via as.matrix; do the equivalent with .reshape(-1, 1)).
- na.action is fixed to "omit rows with any NA," matching R's default.
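The tf shape contract and the NA rule can be illustrated with a short sketch. The helper names below (tf_linear, check_tf_output, complete_case_mask) are hypothetical and are not part of pycmprsk; they only show what a 2D tf return value and "omit rows with any NA" look like in NumPy, assuming one row of tf output per input time.

```python
import numpy as np


def tf_linear(uft):
    """Single time function; reshape so the result is 2D with one column."""
    uft = np.asarray(uft, dtype=np.float64)
    return uft.reshape(-1, 1)


def check_tf_output(tf, uft, p2):
    """Hypothetical check: tf(uft) must have shape (len(uft), p2)."""
    out = np.asarray(tf(uft))
    expected = (len(uft), p2)
    if out.shape != expected:
        raise ValueError(f"tf returned shape {out.shape}, expected {expected}")
    return out


def complete_case_mask(*arrays):
    """Hypothetical helper: True for rows with no NaN in any input array."""
    n = len(arrays[0])
    mask = np.ones(n, dtype=bool)
    for a in arrays:
        a2d = np.asarray(a, dtype=np.float64).reshape(n, -1)
        mask &= ~np.isnan(a2d).any(axis=1)
    return mask


uft = np.array([0.5, 1.0, 2.0])
tf_values = check_tf_output(tf_linear, uft, p2=1)  # shape (3, 1)

cov = np.array([[1.0, 0.0], [np.nan, 1.0], [0.0, 1.0]])
keep = complete_case_mask(uft, cov)  # the second row is dropped
```

A tf that returns a 1D array (for example lambda u: u) would fail this check, which is the case the .reshape(-1, 1) advice addresses.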
Behavioral parity is the explicit design goal - see src/tests/test_parity.py.
Parity testing
The test suite is 1:1 with R's cmprsk/tests/test.R: every scenario in
that file has a corresponding .npz fixture (data + R reference outputs)
under src/tests/fixtures/, and one matching Python test per scenario.
To regenerate the fixtures (requires R with the cmprsk and reticulate
packages installed):
Rscript src/tests/r_fixtures.R
To run the parity tests:
pytest src/tests
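As a rough sketch of what a fixture-based parity check can look like, the example below stores reference values in an in-memory .npz archive and compares a Python result against them within a tolerance. The key names (ftime, r_est), the tolerances, and the helper matches_reference are assumptions made for illustration; the actual fixtures and assertions live in src/tests/ and may be organized differently.

```python
import io

import numpy as np


def matches_reference(actual, reference, rtol=1e-6, atol=1e-8):
    """Return True when `actual` agrees with the stored reference values."""
    return bool(np.allclose(actual, reference, rtol=rtol, atol=atol))


# Stand-in for a fixture file: input data plus reference output, saved as .npz.
buf = io.BytesIO()
np.savez(
    buf,
    ftime=np.array([1.0, 2.0, 3.0]),
    r_est=np.array([0.25, 0.50, 0.75]),  # placeholder reference values
)
buf.seek(0)
fixture = np.load(buf)

# Stand-in for the Python result; tiny floating-point noise should still pass.
python_est = np.array([0.25, 0.50, 0.75]) + 1e-12
assert matches_reference(python_est, fixture["r_est"])
```

Comparing within a tolerance rather than for exact equality is the usual choice for cross-language numerical checks, since summation order and compiler differences can change the last few bits.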
License
pycmprsk is distributed under the GNU General Public License v3.0 or
later (GPL-3.0-or-later).
This package is a derivative work of R's cmprsk (Bob Gray), which is
licensed under GPL (>= 2). The Fortran sources from cmprsk/src/*.f have
been re-implemented in Python while preserving the original
algorithms. As a derivative work, pycmprsk must remain GPL-compatible.
File details
Details for the file pycmprsk-1.0.0.tar.gz.
File metadata
- Download URL: pycmprsk-1.0.0.tar.gz
- Upload date:
- Size: 65.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 42907d120bb38ae50ab60fbe05f7e0546ffc6622a1f39f265be10552ad53f229 |
| MD5 | 8a5a86cb6dc269a15484662bbd4884e3 |
| BLAKE2b-256 | c0f69bb93d9c41019071d666e89e93e9eb3ed73e35fa4b035c2a19caf222ca4a |
Provenance
The following attestation bundles were made for pycmprsk-1.0.0.tar.gz:
Publisher: release.yml on covertcast/pycmprsk
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pycmprsk-1.0.0.tar.gz
- Subject digest: 42907d120bb38ae50ab60fbe05f7e0546ffc6622a1f39f265be10552ad53f229
- Sigstore transparency entry: 1614610305
- Sigstore integration time:
- Permalink: covertcast/pycmprsk@c20eb6e25ea72f9a42216a44c03152965aa3cfbd
- Branch / Tag: refs/tags/v1.0.0
- Owner: https://github.com/covertcast
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@c20eb6e25ea72f9a42216a44c03152965aa3cfbd
- Trigger Event: push
File details
Details for the file pycmprsk-1.0.0-py3-none-any.whl.
File metadata
- Download URL: pycmprsk-1.0.0-py3-none-any.whl
- Upload date:
- Size: 32.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c0ef906f8047cb5a8b6bb009bf7982dfc66702058a1c721181cf3b1dc7b68563 |
| MD5 | 8d6fd07e3c0ae9fc2e927ee67df603eb |
| BLAKE2b-256 | 54420657f731bf4126188f287a0210df8555a8d8a134912ec8e90d57d5d837e0 |
Provenance
The following attestation bundles were made for pycmprsk-1.0.0-py3-none-any.whl:
Publisher: release.yml on covertcast/pycmprsk
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pycmprsk-1.0.0-py3-none-any.whl
- Subject digest: c0ef906f8047cb5a8b6bb009bf7982dfc66702058a1c721181cf3b1dc7b68563
- Sigstore transparency entry: 1614610313
- Sigstore integration time:
- Permalink: covertcast/pycmprsk@c20eb6e25ea72f9a42216a44c03152965aa3cfbd
- Branch / Tag: refs/tags/v1.0.0
- Owner: https://github.com/covertcast
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@c20eb6e25ea72f9a42216a44c03152965aa3cfbd
- Trigger Event: push