Python survival analysis and survival-curve prediction, with a scikit-learn-like API.

SurvivalPredict

A Python package centered on statistical learning for survival analysis, aimed at predicting survival curves. The code in this repo is lovingly written without any stochastic generative processes.

See documentation

General walkthrough-demo

Demo for interfacing with scikit learn

Estimators

The following estimators are implemented in the survivalpredict.estimators sub-module.

| Estimator | Description | Stratifiable | Left-censorable |
| --- | --- | --- | --- |
| CoxProportionalHazard | The Cox Proportional Hazards model is a linear, semi-parametric relative risk model and a staple of survival analysis. Fast and efficient to train. Survivalpredict's implementation has many optimizations and is faster than other implementations available to Python. Both Breslow and Efron tie handling are supported. Currently, only the Breslow baseline hazard is available. | Yes | Yes |
| ParametricDiscreteTimePH | A fully parametric linear hazards model. Chen, weibull, log_normal, log_logistic, gompertz, gamma and 'additive chen weibull' baseline hazards are available as hyperparameters. Parameters are fit by maximum likelihood, using a discrete-time survival likelihood with censoring. Implemented with PyMC/PyTensor, with either a JAX or Numba backend. | Yes | Yes |
| CoxPHElasticNet | Cox Proportional Hazards model with an Elastic-Net/Lasso penalty for feature shrinkage/selection. Uses the 'Newton-Raphson-like' coordinate descent algorithm described in Simon, Noah et al., “Regularization Paths for Cox's Proportional Hazards Model via Coordinate Descent.” Assumes Breslow ties. The current literature is unclear on how to incorporate stratification support into this algorithm. | No | Yes |
| KaplanMeierSurvivalEstimator | Univariate non-parametric survival curve. Useful as a baseline/dummy estimator. Accepts strata, but builds a separate survival curve for each stratum. | Yes | |
| KNeighborsSurvival | K nearest neighbors for survival. An in-memory non-parametric model that builds a Kaplan-Meier survival curve from the neighbors of each query point. | No | Yes |
| CoxNeuralNetPH | A neural network model for estimating relative risk. The Cox proportional hazards 'negative log likelihood for Breslow ties' is used as the loss function. Breslow's baseline hazard for relative risk is used to estimate survival across time. Implemented using JAX. | Yes | Yes |
| AalenAdditiveHazard | Linear multivariate non-parametric estimation of the hazard. Each time interval and feature has its own coefficient, which allows the effects of features to change over time. | No | Yes |
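
To make the per-stratum behaviour of a Kaplan-Meier baseline concrete, here is a minimal pure-Python sketch of the product-limit estimator, fitted once per stratum. It is not survivalpredict's implementation; the function names `kaplan_meier` and `kaplan_meier_by_stratum` are hypothetical and exist only for illustration.

```python
from collections import defaultdict


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time, in increasing order."""
    deaths = defaultdict(int)  # number of events observed at each time
    exits = defaultdict(int)   # events plus censorings at each time
    for t, e in zip(times, events):
        exits[t] += 1
        if e:
            deaths[t] += 1

    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the share of at-risk rows surviving t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Everyone who failed or was censored at t leaves the risk set.
        at_risk -= exits[t]
    return curve


def kaplan_meier_by_stratum(times, events, strata):
    """Fit one independent Kaplan-Meier curve per stratum key (hypothetical helper)."""
    groups = defaultdict(lambda: ([], []))
    for t, e, s in zip(times, events, strata):
        groups[s][0].append(t)
        groups[s][1].append(e)
    return {s: kaplan_meier(ts, es) for s, (ts, es) in groups.items()}


times = [1, 2, 2, 3, 4]
events = [1, 1, 0, 1, 0]  # 1 = event observed, 0 = censored
pooled_curve = kaplan_meier(times, events)
stratified_curves = kaplan_meier_by_stratum(times, events, ["a", "a", "b", "b", "b"])
```

Each stratum gets its own risk set, so a stratum with few rows produces a coarse curve; that is the trade-off of stratifying a non-parametric baseline.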

Metrics

Survivalpredict focuses on metrics that directly measure prediction performance. Hence, the survivalpredict.metrics module intentionally excludes metrics based on ranking relative risk (e.g., the C-index).

| Metric | Description |
| --- | --- |
| brier_scores_administrative | Squared error between the true survival status and the prediction at each time of interest. Censored intervals are ignored, and the error is averaged over the number of rows not censored at a given interval of time. Ideal in cases of 'administrative' censorship, where 'survival time' is measured from an individual's entry into the experiment rather than in calendar time. This metric suits cases such as churn, conversion and operational failure. See here. |
| integrated_brier_score_administrative | Integral of the administrative Brier scores, giving a single metric of performance. |
| integrated_brier_score_administrative_sklearn_metric | scikit-learn metric wrapper around the `integrated_brier_score_administrative` function, for using this metric with the SklearnSurvivalPipeline wrapper class when interfacing with scikit-learn. |
| integrated_brier_score_administrative_sklearn_scorer | scikit-learn scorer wrapper around the `integrated_brier_score_administrative` function, for using this metric with the SklearnSurvivalPipeline wrapper class when interfacing with scikit-learn. |
| brier_scores_ipcw | Brier scores with inverse probability of censoring weights (IPCW). The squared error between the true survival status and the prediction is weighted using a Kaplan-Meier curve fitted with inverted events, depending on censoring and failure at different points in time. This is a common metric in biostatistics and is used in clinical trials. See here. |
| integrated_brier_score_ipcw | Integral of the Brier scores with inverse probability of censoring weights, giving a single metric of performance. |
| integrated_brier_score_ipcw_sklearn_metric | scikit-learn metric wrapper around the `integrated_brier_score_ipcw` function. |
| integrated_brier_score_ipcw_sklearn_scorer | scikit-learn scorer wrapper around the `integrated_brier_score_ipcw` function. |
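
The administrative Brier score described above can be sketched in a few lines of pure Python. This is an illustration of the idea, not the package's code: the function names are hypothetical, a censored row is assumed to be ignored at every evaluation time at or after its censoring time, and the integral is assumed to be a trapezoid rule normalised by the evaluated time span.

```python
def brier_scores_administrative_sketch(times, events, pred_survival, eval_times):
    """pred_survival[i][j] is the predicted S(eval_times[j]) for row i."""
    scores = []
    for j, t in enumerate(eval_times):
        total, n_known = 0.0, 0
        for time_i, event_i, preds in zip(times, events, pred_survival):
            if not event_i and time_i <= t:
                # Censored by time t: the true status is unknown, so skip the row.
                continue
            survived = 1.0 if time_i > t else 0.0
            total += (survived - preds[j]) ** 2
            n_known += 1
        # Average only over rows whose status at t is known.
        scores.append(total / n_known if n_known else float("nan"))
    return scores


def integrate_scores_sketch(eval_times, scores):
    """Trapezoid integral of the scores, divided by the time span (an assumption)."""
    area = 0.0
    for k in range(1, len(eval_times)):
        width = eval_times[k] - eval_times[k - 1]
        area += 0.5 * (scores[k] + scores[k - 1]) * width
    return area / (eval_times[-1] - eval_times[0])


times = [2, 3, 5]
events = [1, 0, 1]            # the second row is censored at time 3
eval_times = [1, 4]
pred_survival = [[1.0, 0.5]] * 3  # same predicted curve for every row
scores = brier_scores_administrative_sketch(times, events, pred_survival, eval_times)
integrated = integrate_scores_sketch(eval_times, scores)
```

At the second evaluation time the censored row drops out, so the average is taken over the two remaining rows only. An IPCW variant would instead keep every row and reweight the squared errors by a censoring survival curve.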

Strata Preprocessing

The survivalpredict.strata_preprocessing module allows for the creation of strata to be used by various estimators.

| Class | Description |
| --- | --- |
| StrataBuilderDiscretizer | Builds strata keys from numeric data. Allows various splitting strategies. |
| StrataBuilderEncoder | Builds strata keys from categorical data. |
| StrataColumnTransformer | Allows several StrataBuilders to be stacked and run simultaneously on different columns to build the strata. Modeled after scikit-learn's ColumnTransformer. |
| make_strata_column_transformer | Generates a StrataColumnTransformer without having to name each transformation directly, like scikit-learn's make_column_transformer. |
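
As a rough illustration of what building strata keys involves, the sketch below bins a numeric column at quantile cut points and combines the bin index with a categorical column into one key per row. The helper names, the quantile strategy and the `bin|category` key format are assumptions made for this example, not the behaviour of the classes above.

```python
from bisect import bisect_right


def quantile_edges(values, n_bins):
    """Interior cut points splitting the sorted values into n_bins similar-sized groups."""
    ordered = sorted(values)
    return [ordered[(len(ordered) * k) // n_bins] for k in range(1, n_bins)]


def discretize(values, edges):
    """Map each value to the index of the bin it falls into."""
    return [bisect_right(edges, v) for v in values]


def build_strata_keys(numeric_col, categorical_col, n_bins=2):
    """Combine a binned numeric column and a categorical column into one key per row."""
    edges = quantile_edges(numeric_col, n_bins)
    bins = discretize(numeric_col, edges)
    return [f"{b}|{c}" for b, c in zip(bins, categorical_col)]


ages = [30, 45, 60, 75]
sexes = ["f", "m", "f", "m"]
strata_keys = build_strata_keys(ages, sexes, n_bins=2)
```

Every distinct key defines one stratum, so the number of strata grows multiplicatively with the number of bins and categories; keeping both small helps each stratum retain enough rows.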

Pipeline

For various reasons, survivalpredict intentionally breaks with scikit-learn's API in several ways. The survivalpredict.pipeline module provides wrappers around survivalpredict classes so that survivalpredict can interoperate with the wider scikit-learn ecosystem (e.g., for feature selection or hyperparameter tuning), in addition to offering the usual utility of a conventional scikit-learn pipeline.

| Class | Description |
| --- | --- |
| build_sklearn_pipeline_target | Builds a single target array from the times and events arrays. Used as the 'y'/observed target in the scikit-learn ecosystem. |
| SklearnSurvivalPipeline | Stacks sklearn transformers and survivalpredict strata builders and estimators into a single class. It expects the output of the `build_sklearn_pipeline_target` function as the 'y'/observed target. |
| make_sklearn_survival_pipeline | Generates a SklearnSurvivalPipeline without having to directly name all the steps. |
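
The reason a single target array is needed is that scikit-learn tools pass one `y` around and subset it row by row. The sketch below shows the general idea with plain Python lists; `pack_target` and `unpack_target` are hypothetical names, and the actual layout produced by `build_sklearn_pipeline_target` may well differ.

```python
def pack_target(times, events):
    """Pair each time with its event flag so both travel together as one 'y'."""
    return [(float(t), bool(e)) for t, e in zip(times, events)]


def unpack_target(y):
    """Split the combined target back into separate times and events lists."""
    times = [t for t, _ in y]
    events = [e for _, e in y]
    return times, events


y = pack_target([5, 3, 8], [1, 0, 1])

# Row-wise subsetting, as a cross-validation splitter would do, keeps
# each time aligned with its event flag.
fold_rows = [2, 0]
y_fold = [y[i] for i in fold_rows]
fold_times, fold_events = unpack_target(y_fold)
```

Packing the two arrays avoids the risk of times and events being shuffled or split inconsistently by tooling that only knows about `X` and `y`.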

Validation

survivalpredict includes some native model validation utilities in survivalpredict.validation.

| Function | Description |
| --- | --- |
| sur_cross_val_score | survivalpredict's equivalent to scikit-learn's cross_val_score. |
| sur_cross_validate | survivalpredict's equivalent to scikit-learn's cross_validate. |

Model Selection

Scikit-learn's model_selection is also mimicked within survivalpredict.model_selection.

| Class | Description |
| --- | --- |
| Sur_GridSearchCV | survivalpredict's equivalent to scikit-learn's GridSearchCV. |
| Sur_RandomizedSearchCV | survivalpredict's equivalent to scikit-learn's RandomizedSearchCV. |