The most comprehensive Python package for evaluating survival analysis models.

SurvivalEVAL

This Python package contains the most complete set of evaluation methods for survival algorithms (see paper). These evaluation metrics can be divided into 3 categories:

Visualization of the evaluation metrics

Concordance index

The concordance index identifies the “comparable” pairs of patients and calculates the percentage of correctly ranked pairs to assess a survival model’s performance. Given the predicted survival curves for a pair of patients, it compares the predicted median/mean survival times and marks the pair as correct if the model's prediction about who died first matches reality.
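
The pair-counting idea can be sketched in a few lines of plain Python. This is a minimal illustration, not the package's implementation: the function name `concordance_index` and the toy data are hypothetical, the predicted times stand in for the median/mean of each curve, and tied predictions are assumed to count as half-correct.

```python
def concordance_index(pred_times, event_times, event_indicators):
    """Fraction of comparable pairs whose predicted order matches the observed order."""
    concordant = 0.0
    comparable = 0
    n = len(event_times)
    for i in range(n):
        if not event_indicators[i]:
            continue  # the earlier subject of a pair must have an observed event
        for j in range(n):
            # Comparable: subject i died strictly before j's event or censoring time.
            if event_times[i] < event_times[j]:
                comparable += 1
                if pred_times[i] < pred_times[j]:
                    concordant += 1.0
                elif pred_times[i] == pred_times[j]:
                    concordant += 0.5  # assumption: ties count as half-correct
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): subject 2 is censored at time 7.
pred = [5.0, 8.0, 3.0, 12.0]
times = [4, 9, 7, 10]
events = [1, 1, 0, 1]
print(concordance_index(pred, times, events))  # 0.75
```

Here four pairs are comparable, and only the pair (subject 0, subject 2) is ranked incorrectly, giving 3/4.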

Mean Absolute Error

One straightforward metric is “MAE”: the absolute difference between the actual and predicted survival times (e.g., the median of a curve). This requires the “actual survival time”, which is trivial for uncensored instances but problematic for censored individuals. This Python package implements MAE loss metrics with different ways of handling censored instances. Here we list three of them:

  1. Uncensored simply discards all the censored individuals and computes the MAE over the uncensored instances.
  2. Hinge calculates the early prediction error. For a censored instance, if the predicted survival time is smaller than the censoring time, then MAE = censor_time - predict_time. If the predicted survival time is equal to or larger than the censoring time, then MAE = 0. Note that the standard Hinge method requires the Weighted parameter to be set to False.
  3. Pseudo_obs “de-censors” the censored patients using the pseudo-observation method (by estimating the contribution of a censored subject to the whole Kaplan-Meier distribution). It then calculates the MAE between the de-censored time and the predicted survival time in the usual way. Note that the standard Pseudo_obs method requires the Weighted parameter to be set to True.
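
The first two methods can be sketched as follows. This is a simplified illustration with hypothetical function names and toy data, not the package's code; it assumes an unweighted average over instances, and it leaves out Pseudo_obs because that method needs a Kaplan-Meier estimate.

```python
def mae_uncensored(pred_times, event_times, event_indicators):
    """MAE over uncensored instances only; censored ones are discarded."""
    errors = [
        abs(t - p)
        for p, t, e in zip(pred_times, event_times, event_indicators)
        if e
    ]
    return sum(errors) / len(errors)


def mae_hinge(pred_times, event_times, event_indicators):
    """MAE where a censored instance is penalised only for predicting too early."""
    errors = []
    for p, t, e in zip(pred_times, event_times, event_indicators):
        if e:
            errors.append(abs(t - p))
        else:
            # Censored: error is censor_time - predict_time if positive, else 0.
            errors.append(max(t - p, 0.0))
    return sum(errors) / len(errors)


# Toy data (hypothetical): the last two subjects are censored.
pred = [10.0, 20.0, 15.0, 30.0]
times = [12.0, 18.0, 25.0, 20.0]
events = [1, 1, 0, 0]

print(mae_uncensored(pred, times, events))  # 2.0
print(mae_hinge(pred, times, events))       # 3.5
```

The third subject is predicted to die at 15 but was still alive at 25, so Hinge charges an error of 10; the fourth is predicted beyond its censoring time and is charged nothing.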

Mean Squared Error and Root Mean Squared Error

Mean squared error (MSE) is another metric to measure the difference between the actual and predicted survival times. Like MAE, MSE has multiple ways to handle censored instances:

  1. Uncensored
  2. Hinge
  3. Pseudo_obs

Root mean squared error (RMSE) is also available for each of these methods.

Integrated Brier Score (IBS)

IBS measures the squared difference between the predicted survival curve and the Heaviside step function of the observed event. IBS can be viewed as the integration of the (single-time) Brier score across all time points. A smaller IBS value is preferred. This Python implementation uses IPCW weighting to handle the censored instances. Please refer to Assessment and Comparison of Prognostic Classification Schemes for Survival Data for the details of IPCW weighting. Please also note that IBS is similar to the Continuous Ranked Probability Score (CRPS), except for (1) the IPCW weighting, and (2) the use of squared error instead of absolute error.
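
As a rough sketch of this computation, the block below evaluates an IPCW-weighted Brier score on a time grid and integrates it with the trapezoidal rule. Several things here are assumptions rather than the package's API: the function names, the toy data, passing the censoring survival function as a plain callable `censor_surv` (in practice it would be a Kaplan-Meier estimate of the censoring distribution), and normalising the integral by the span of the grid.

```python
import math


def ipcw_brier_score(surv_at_t, times, events, t, censor_surv):
    """IPCW-weighted Brier score at a single time t.

    censor_surv(u) estimates P(censoring time > u); here it is supplied
    by the caller (an assumption of this sketch).
    """
    total = 0.0
    for s, obs_time, event in zip(surv_at_t, times, events):
        if obs_time <= t and event:
            # Died by t: the Heaviside target is 0.
            total += (0.0 - s) ** 2 / censor_surv(obs_time)
        elif obs_time > t:
            # Known to be alive at t: the target is 1.
            total += (1.0 - s) ** 2 / censor_surv(t)
        # Censored before t: status unknown, so no term is added.
    return total / len(times)


def integrated_brier_score(surv_funcs, times, events, time_grid, censor_surv):
    """Trapezoidal integral of the Brier score over time_grid, divided by its span."""
    scores = [
        ipcw_brier_score([f(t) for f in surv_funcs], times, events, t, censor_surv)
        for t in time_grid
    ]
    area = sum(
        0.5 * (scores[k] + scores[k + 1]) * (time_grid[k + 1] - time_grid[k])
        for k in range(len(time_grid) - 1)
    )
    return area / (time_grid[-1] - time_grid[0])


# Toy data (hypothetical): exponential curves S(t) = exp(-rate * t), no censoring.
rates = [0.30, 0.10, 0.05]
surv_funcs = [lambda t, r=r: math.exp(-r * t) for r in rates]
times = [3.0, 9.0, 20.0]
events = [1, 1, 1]
no_censoring = lambda u: 1.0  # G(u) = 1 when nobody is censored

ibs = integrated_brier_score(surv_funcs, times, events, list(range(0, 21)), no_censoring)
print(f"IBS = {ibs:.4f}")
```

With no censoring, every weight equals 1 and the score reduces to the plain mean squared difference between each curve and its step function.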

Distribution Calibration (D-calibration)

Haider et al. proposed the distribution calibration (D-calibration) test for determining whether a model that produces individual survival distributions (ISDs) is meaningful. D-calibration splits the time axis into a fixed number of intervals and compares the actual number of events with the predicted number of events within each interval. A well D-calibrated model is one where the predicted number of events within each time interval is statistically similar to the observed number. Models with a p-value higher than 0.05 can be considered well calibrated across the survival distribution.

D-calibration quantifies this comparison of predicted and actual events within each time interval. The details of D-calibration calculations and ways to incorporate censored instances into D-calibration computation appear in Appendix B and in Effective Ways to Build and Evaluate Individual Survival Distributions.
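
A minimal sketch of the bin-and-compare step, under several assumptions that are not taken from the package: all events are observed (no censoring), the intervals are taken to be equal-width bins of each subject's predicted survival probability at their own event time, and the helper names are hypothetical. It stops at the Pearson chi-squared statistic; converting that into the p-value needs a chi-squared distribution function, which the standard library does not provide.

```python
def d_calibration_counts(surv_at_event, num_bins=10):
    """Count subjects per equal-width bin of S_i(T_i), the predicted survival
    probability of subject i evaluated at their own event time."""
    counts = [0] * num_bins
    for s in surv_at_event:
        idx = min(int(s * num_bins), num_bins - 1)  # s == 1.0 goes in the last bin
        counts[idx] += 1
    return counts


def chi_square_statistic(counts):
    """Pearson statistic against a uniform expectation over the bins."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


# Toy data (hypothetical): one subject per decile, i.e. perfectly uniform.
uniform = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]
print(chi_square_statistic(d_calibration_counts(uniform)))  # 0.0

# Toy data (hypothetical): every subject lands in the same bin.
skewed = [0.55] * 20
print(chi_square_statistic(d_calibration_counts(skewed)))  # 180.0
```

A statistic near zero means the observed counts match the uniform expectation; the larger the statistic, the smaller the resulting p-value.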

Area Under the Receiver Operating Characteristic (AUROC)

AUROC measures the performance of a single-time probability prediction. It is the area under the receiver operating characteristic curve, which plots the true positive rate against the false positive rate at various threshold settings. In survival analysis, the single-time probability prediction is the predicted survival probability at a specific time point, and the true label is whether the patient died at that time point. AUROC excludes the censored instances whose censoring time is earlier than the target time point.
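
The labelling and exclusion rule can be sketched as below. This is an illustration with a hypothetical function name and toy data, not the package's implementation; it assumes that "died at that time point" means the event was observed at or before the target time, and it computes the area by directly comparing every positive/negative pair.

```python
def auroc_at_time(surv_at_t, times, events, target_time):
    """AUROC of the predicted event probability 1 - S(target_time)."""
    pos, neg = [], []
    for s, obs_time, event in zip(surv_at_t, times, events):
        if not event and obs_time < target_time:
            continue  # censored before the target time: status unknown, excluded
        risk = 1.0 - s
        if event and obs_time <= target_time:
            pos.append(risk)  # died by the target time
        else:
            neg.append(risk)  # still alive at the target time
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative instance")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): subject 2 is censored at 20, before the target time 25.
surv_at_25 = [0.3, 0.35, 0.5, 0.9, 0.4]
times = [10, 40, 20, 30, 15]
events = [1, 1, 0, 0, 1]
print(auroc_at_time(surv_at_25, times, events, target_time=25))  # 0.75
```

Subject 2 is dropped, leaving two positives and two negatives; three of the four positive/negative pairs are ordered correctly.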

Brier Score (BS)

The Brier score, at a specific time-point, is computed as the mean squared error between the observed event (binary indicator variable) and the predicted event probability at that time-point. It is meaningful in the sense that the square root of the Brier score is the distance between the observed and predicted event on the probability scale.

One-time Calibration (1-calibration)

Calibration measures the confidence of the model. A detailed explanation of the algorithm implementation can be found in Effective Ways to Build and Evaluate Individual Survival Distributions and A tutorial on calibration measurements and calibration models for clinical prediction models. The output is the p-value of the Hosmer-Lemeshow goodness-of-fit test at a target time. Models with a p-value higher than 0.05 can be considered well calibrated at that time.
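
To illustrate the test statistic behind that p-value, here is a simplified Hosmer-Lemeshow sketch. It assumes no censoring, so each subject's status at the target time is known; the function name, the grouping into equal-sized bins by sorted probability, and the toy data are hypothetical and are not taken from the package. The p-value itself is not computed here.

```python
def hosmer_lemeshow_statistic(event_probs, died_by_t, num_groups=2):
    """Sum over groups of (observed - expected)^2 / (n * p_bar * (1 - p_bar)).

    event_probs: predicted probability of having died by the target time.
    died_by_t:   1 if the subject actually died by the target time, else 0.
    Assumes each group's mean probability is strictly between 0 and 1.
    """
    order = sorted(range(len(event_probs)), key=lambda i: event_probs[i])
    n = len(order)
    stat = 0.0
    for g in range(num_groups):
        idx = order[g * n // num_groups:(g + 1) * n // num_groups]
        n_g = len(idx)
        mean_p = sum(event_probs[i] for i in idx) / n_g
        observed = sum(died_by_t[i] for i in idx)
        expected = n_g * mean_p
        stat += (observed - expected) ** 2 / (expected * (1.0 - mean_p))
    return stat


# Toy data (hypothetical): a low-risk and a high-risk group of four subjects each.
probs = [0.25, 0.25, 0.25, 0.25, 0.75, 0.75, 0.75, 0.75]

# Observed deaths match the predicted rates (1 of 4 and 3 of 4).
print(hosmer_lemeshow_statistic(probs, [0, 0, 0, 1, 1, 1, 1, 0]))  # 0.0

# Observed deaths are more extreme than predicted (0 of 4 and 4 of 4).
print(hosmer_lemeshow_statistic(probs, [0, 0, 0, 0, 1, 1, 1, 1]))
```

The second call gives a larger statistic because the observed counts deviate from the expected ones in both groups, which would translate into a smaller p-value.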

Installation

You can install the package via pip.

pip install SurvivalEVAL

If you want to modify the package yourself, clone the repo, cd into it, and install it in editable mode (-e option). That way, there is no need to re-install the package after each modification.

git clone https://github.com/shi-ang/SurvivalEVAL.git
cd SurvivalEVAL
pip install -r requirements.txt
pip install -e . 

Quickstart Example

Install a survival analysis package, such as lifelines, and load the data. Then, you can use the following code to evaluate the model.

from lifelines import CoxPHFitter
from lifelines.datasets import load_rossi

from SurvivalEVAL.Evaluator import LifelinesEvaluator

# Load the data
rossi = load_rossi()
rossi = rossi.sample(frac=1.0)

# Split train/test set
train = rossi.iloc[:300, :]
test = rossi.iloc[300:, :]
train_event_times = train.week.values
train_event_indicators = train.arrest.values
test_event_times = test.week.values
test_event_indicators = test.arrest.values

# Fit the model
cph = CoxPHFitter()
cph.fit(train, duration_col='week', event_col='arrest')

survival_curves = cph.predict_survival_function(test)

# Make the evaluation
# (named `evaluator` rather than `eval` to avoid shadowing Python's built-in eval)
evaluator = LifelinesEvaluator(survival_curves, test_event_times, test_event_indicators,
                               train_event_times, train_event_indicators)

# Concordance index (the two discarded return values are not used here)
cindex, _, _ = evaluator.concordance()

mae_score = evaluator.mae(method="Pseudo_obs")

mse_score = evaluator.mse(method="Hinge")

# The largest event time is 52. So we use 53 time points (0, 1, ..., 52) to calculate the IBS
ibs = evaluator.integrated_brier_score(num_points=53, draw_figure=True)

d_cal = evaluator.d_calibration()

# The target time for the single time probability prediction is set to 25
auc_score = evaluator.auc(target_time=25)
bs_score = evaluator.brier_score(target_time=25)
one_cal = evaluator.one_calibration(target_time=25)

See the Examples for more usage examples.

Expected Deliveries in the Future

  1. Time-dependent c-index (by Antolini)
  2. IPCW AUC

Please create an issue if you want me to implement any other evaluation metrics.

Citing this work

We recommend you use the following to cite SurvivalEVAL in your publications:

@article{qi2024survivaleval,
year = {2024},
month = {01},
pages = {453-457},
title = {{SurvivalEVAL}: A Comprehensive Open-Source Python Package for Evaluating Individual Survival Distributions},
author={Qi, Shi-ang and Sun, Weijie and Greiner, Russell},
volume = {2},
journal = {Proceedings of the AAAI Symposium Series},
doi = {10.1609/aaaiss.v2i1.27713}
}
