Python port of R's cmprsk: Estimation, testing and regression modeling of subdistribution functions in competing risks.

pycmprsk

A Python port of the R package cmprsk. Estimation, testing and regression modeling of subdistribution functions in competing risks:

  • cuminc - non-parametric cumulative incidence functions, with Gray's k-sample test across groups and stratification.
  • crr - Fine-Gray subdistribution-hazards regression, with time-fixed covariates (cov1), time-varying covariates (cov2 + tf), per-group censoring weights (cengroup), and the full Huber/White sandwich variance including the q(u) correction for the estimated censoring distribution.
  • predict_crr, summary_crr, timepoints, plot_cuminc, plot_predict - the same downstream API as R.
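To make concrete what a non-parametric cumulative incidence function estimates, here is a minimal NumPy sketch of the standard estimator for a single cause with no groups or strata. It is an illustration only, not pycmprsk's implementation: the function name `cumulative_incidence`, the `cencode` argument, and the toy data are assumptions made for this example, and variance estimates and Gray's test are left out.

```python
import numpy as np


def cumulative_incidence(ftime, fstatus, cause, cencode=0):
    """Illustrative estimator: F_k(t) = sum_{t_j <= t} S(t_j-) * d_kj / n_j.

    S is the all-cause Kaplan-Meier survival, d_kj the number of failures
    from `cause` at t_j, and n_j the number still at risk at t_j.
    """
    ftime = np.asarray(ftime, dtype=np.float64)
    fstatus = np.asarray(fstatus)
    # Distinct times at which any (non-censored) event occurs, sorted.
    times = np.unique(ftime[fstatus != cencode])
    surv = 1.0  # all-cause survival just before the current time
    cif = 0.0
    est = np.empty(times.size)
    for j, t in enumerate(times):
        at_risk = np.sum(ftime >= t)
        at_t = ftime == t
        d_cause = np.sum(at_t & (fstatus == cause))
        d_any = np.sum(at_t & (fstatus != cencode))
        cif += surv * d_cause / at_risk
        surv *= 1.0 - d_any / at_risk
        est[j] = cif
    return times, est


# Toy data (made up for this sketch): 0 = censored, 1 and 2 = competing causes.
toy_time = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_status = [1, 2, 0, 1, 2]
times1, cif1 = cumulative_incidence(toy_time, toy_status, cause=1)
times2, cif2 = cumulative_incidence(toy_time, toy_status, cause=2)
print(times1, cif1)
print(times2, cif2)
```

In this toy data the last observation is an event, so the two cause-specific curves sum to 1 at the final time. That is a quick sanity check on the estimator.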

This package's functionality is numerically validated against R's cmprsk. See Parity testing below.

Install

pip install pycmprsk

The package is pure Python; the hot loops are JIT-compiled with numba.

Quick example

The arrays below were dumped from R using RNGversion("1.6.2"); set.seed(2) and the same data setup as cmprsk/tests/test.R. The resulting plots visually match R's plot(cuminc(...)) and plot(predict(crr(...))).

import matplotlib.pyplot as plt
import numpy as np

from pycmprsk import crr, cuminc, plot_cuminc, plot_predict, predict_crr


def tf_quad(uft):
    """Match R's ``function(uft) cbind(uft, uft^2)``."""
    uft = np.asarray(uft, dtype=np.float64)
    return np.column_stack([uft, uft**2])

# Data arrays, sourced from R's test suite.
ftime = np.array([
    0.686305, 0.149818, 1.611875, 1.077275, 1.553027, 0.286783, 0.234919, 0.255626, 0.536215, 1.420936,
    1.979941, 0.816767, 0.970783, 3.376077, 1.407218, 0.229477, 2.821243, 1.598966, 0.661166, 0.291716,
    2.421805, 0.264711, 0.419970, 0.994872, 5.248650, 0.493777, 0.036222, 0.039556, 2.225511, 1.896816,
    1.562481, 2.080967, 0.062462, 0.308574, 0.854363, 1.086975, 0.183905, 0.877297, 0.166353, 1.346992,
    3.303843, 0.723761, 0.043173, 1.635107, 1.022373, 1.565542, 0.734400, 1.705071, 1.527256, 1.921497,
    1.854679, 0.310276, 2.424571, 0.515172, 1.251790, 1.054940, 0.010267, 1.079949, 0.136024, 0.466943,
    1.348637, 0.113960, 2.535242, 0.762922, 0.432438, 0.666299, 0.862624, 0.479771, 0.397440, 1.493170,
    0.661091, 0.540539, 1.355944, 0.773167, 3.902563, 0.117417, 1.786273, 0.072698, 0.259388, 2.092709,
    0.229584, 0.490496, 0.425987, 0.335195, 0.697602, 0.097860, 0.917998, 0.174528, 0.680717, 1.835194,
    2.997399, 1.937913, 0.520418, 1.653625, 2.238665, 0.149357, 0.720766, 0.096726, 0.831950, 1.003850,
])

fstatus = np.array([
    1, 2, 0, 2, 2, 1, 0, 2, 2, 0, 0, 2, 2, 0, 0, 0, 0, 0, 1, 0,
    2, 1, 1, 1, 1, 2, 0, 0, 1, 0, 1, 1, 1, 2, 0, 0, 1, 2, 0, 1,
    2, 1, 2, 0, 2, 0, 0, 2, 1, 2, 1, 1, 2, 1, 0, 2, 0, 2, 1, 2,
    0, 2, 2, 1, 2, 1, 2, 2, 1, 0, 0, 0, 0, 2, 0, 2, 2, 2, 2, 0,
    0, 0, 1, 1, 2, 1, 2, 2, 1, 0, 1, 1, 1, 2, 1, 2, 2, 0, 1, 1,
])

group_code = np.array([
    3, 1, 3, 1, 2, 2, 2, 3, 2, 2, 1, 2, 2, 3, 2, 3, 1, 1, 2, 1,
    1, 3, 2, 3, 1, 1, 1, 1, 3, 1, 1, 2, 3, 2, 3, 1, 3, 2, 1, 3,
    3, 3, 3, 2, 2, 1, 1, 2, 2, 2, 3, 1, 1, 3, 2, 3, 3, 1, 1, 2,
    1, 3, 2, 1, 2, 3, 1, 1, 3, 3, 3, 1, 2, 3, 1, 1, 2, 2, 3, 3,
    2, 3, 3, 2, 2, 2, 2, 3, 2, 1, 2, 1, 1, 2, 1, 1, 1, 1, 2, 1,
])
group = np.array(["a", "b", "c"])[group_code - 1]

strata = np.array([
    1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 2, 1, 2, 2, 2, 2, 1, 1, 1, 1,
    2, 1, 2, 2, 1, 2, 1, 1, 2, 2, 2, 1, 1, 1, 2, 1, 1, 2, 1, 2,
    2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 2, 2, 1, 1, 2, 1, 2, 1,
    1, 1, 1, 2, 1, 1, 1, 2, 2, 2, 1, 2, 1, 1, 2, 1, 2, 1, 1, 1,
    1, 2, 2, 1, 1, 2, 1, 2, 2, 2, 1, 2, 1, 1, 2, 1, 2, 2, 2, 2,
])

cov1 = np.array([
    np.nan, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0,
    1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0,
    1.0, 1.0, 1.0, np.nan, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0,
    0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0,
    1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, np.nan, 1.0, 1.0,
    1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0,
    1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0,
    0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0,
    0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0,
    1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0,
    0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0,
    1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0,
    0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0,
    1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0,
    1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0,
    0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0,
    0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0,
    1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0,
    1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0,
    1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0,
    1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0,
    1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0,
    1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0,
    0.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0,
    0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0,
]).reshape(100, 3)

cov2 = np.column_stack([cov1[:, 0], cov1[:, 0]])
cengroup = cov1[:, 2]
ci = cuminc(ftime, fstatus, group=group, strata=strata)
print("cuminc curve keys:", list(ci.curves.keys()))
print("Gray's k-sample tests (stat, pv, df):\n", ci.tests)

fit = crr(ftime, fstatus, cov1=cov1, cov2=cov2, tf=tf_quad, cengroup=cengroup)
print("crr coefs:", fit.coef)
print("crr converged:", fit.converged)

pred = predict_crr(
    fit,
    cov1=np.array([[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]),
    cov2=np.array([[1.0, 1.0], [0.0, 0.0]]),
)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
plot_cuminc(ci, ax=ax1)
ax1.set_title("cuminc(ss, cc, gg, strt)")
plot_predict(pred, ax=ax2)
ax2.set_title("predict(crr(ss, cc, cv, cov2, tf, cengroup=cv[,3]))")
plt.tight_layout()
plt.show()

What's different from R's cmprsk

pycmprsk is a port of the functionality, not a verbatim translation of the API:

  • Returns dataclasses (CRRResult, CumincResult, SummaryCRR) rather than R's named lists. Field names use Python snake_case (n_missing vs. R's n.missing, loglik_null vs. loglik.null).
  • tf (the time-varying covariate function) takes and returns NumPy arrays. It must return a 2D array of shape (ndf, p2): one row per input time and one column per column of cov2. R's cmprsk wraps 1D output via as.matrix; in Python, do the equivalent with .reshape(-1, 1).
  • na.action is fixed to "omit rows with any NA," matching R's default.
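The shape requirement on tf noted in the list above is easy to miss when there is only one time function. The sketch below uses plain NumPy and two hypothetical helper names (`tf_linear`, `tf_flat`) to contrast a 2D column matrix with a bare 1D array. It does not call pycmprsk.

```python
import numpy as np


def tf_linear(uft):
    """One time function, returned as a column matrix of shape (len(uft), 1)."""
    uft = np.asarray(uft, dtype=np.float64)
    return uft.reshape(-1, 1)


def tf_flat(uft):
    """Same values, but 1D: the shape that R's as.matrix would silently fix."""
    return np.asarray(uft, dtype=np.float64)


uft = np.array([0.5, 1.0, 2.0])  # stand-in for a vector of failure times
good = tf_linear(uft)
flat = tf_flat(uft)
print(good.shape)  # (3, 1)
print(flat.shape)  # (3,)
```

With a single cov2 column, p2 is 1, so the `(3, 1)` result matches the `(ndf, p2)` contract and the `(3,)` result does not.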

Behavioral parity is the explicit design goal - see src/tests/test_parity.py.

Parity testing

The test suite is 1:1 with R's cmprsk/tests/test.R: every scenario in that file has a corresponding .npz fixture (data + R reference outputs) under src/tests/fixtures/, and one matching Python test per scenario.

To regenerate the fixtures (requires R with the cmprsk and reticulate packages installed):

Rscript src/tests/r_fixtures.R

To run the parity tests:

pytest src/tests
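The general pattern of a fixture-based parity check, as described in this section, can be sketched as follows. The fixture layout is not documented here, so the file name `toy_scenario.npz`, the keys `x` and `r_cumsum`, the tolerance, and the cumulative-sum stand-in computation are all hypothetical. The real tests call the pycmprsk functions and compare against outputs produced by R.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_scenario.npz")  # hypothetical fixture name

    # Stand-in for the R side: store input data alongside reference output.
    x = np.array([0.1, 0.4, 0.2, 0.3])
    np.savez(path, x=x, r_cumsum=np.cumsum(x))

    # Python side: load the fixture, recompute, and compare numerically.
    with np.load(path) as fixture:
        data = fixture["x"]
        reference = fixture["r_cumsum"]

    recomputed = np.cumsum(data)  # stand-in for calling the port
    np.testing.assert_allclose(recomputed, reference, rtol=1e-10)
    parity_ok = True

print("parity check passed:", parity_ok)
```

Storing inputs and reference outputs in the same file keeps each scenario self-contained, so a test needs nothing beyond the fixture and the function under test.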

License

pycmprsk is distributed under the GNU General Public License v3.0 or later (GPL-3.0-or-later).

This package is a derivative work of R's cmprsk (Bob Gray), which is licensed under GPL (>= 2). The Fortran sources from cmprsk/src/*.f have been re-implemented in Python while preserving the original algorithms. As a derivative work, pycmprsk must remain GPL-compatible.
