paranet package
Project description
paranet: Parametric survival models with elastic net regularization
The paranet
package allows for the fitting of parametric survival models with rightcensoring that have an L1 & L2 regularization penalty (elastic net) in python
with an sklearn
like syntax. There are three parametric distributions which currently supported:
 Exponential
 Weibull
 Gompertz
These distributions were chosen due to the frequency of use in empirical research and the range of hazard distributions they can model. Additionally, they samples can be drawn cheaply using the inverse method allowing quantile and random data generation methods to run in a trivial amount of time.
Parametric model support currently exists with the X and Y packages, but none of these packages support regularization. Elastic net survival models can be fit with Z, but this is only for the CoxPH model. While the Cox model is a very important tool for survival modelling, its key limitation for largescale datasets is that i) it is unable to easily do inference on individual survival times, and ii) its loss function is nonparametric and has a runtime that grows O(n^2) rather than O(n).
The paranet
package allows users to fit a highdimensional linear model on rightcensored data and then provide individualized or group predictions on timetoevent outcomes. Fitting a parametric model on customer churn data can allow a data science to answer interesting questions such as: "out of these 100 customers, when do we first expect that 10% of them will have churned?", or "for this new customer, at what time point are they are highest risk of leaving us (i.e. maximum hazard)?", or "for an existing customer, what is the probability they will have churned in 10 months from now?".
(0) Installation
The paranet
package can be installed with pip install paranet=0.1
. NOTE this package has been tested with python 3.9+. Using earlier versions of python may lead to errors.
(1) Basic syntax
The parametric
class is the workhouse model of this package. When initializing the model users will always need to the specify the dist
argument. This can be a list or a string of valid distribution types. There are no limits on how long this list can be, but if it is of length k
, then subsequent time measurements will either need to a vector or a matrix with k
columns.
Although each input argument is defined in the docstrings, several parameters will recur frequently throughout and are defined here for convenience.
x
: An(n,p)
array of covariates. Users can either manually add an intercept and scale the data, or setadd_int=True
andscale_x=True
.t
: An(n,k)
array of time measurements that should be nonzero. If $k\geq 0$ then model assumes each column corresponds to a (potentially) different distribution.d
: An(n,k)
array of censoring indicators whose values should be either zero or one. As is the usual convention, 0 corresponds to a censored observation and 1 to an uncensored one.gamma
: The strength of the regularization (see section (2) for a further description). If this variable is a vector or a matrix, it must correspond to the number of columns ofx
(including the intercept if it is given).rho
: The relative L1/L2 strength, where 0 corresponds to L2only (i.e. Ridge regression) and 1 corresponds to L1only (i.e. Lasso).
As a rule paranet
will try to broadcast where possible. For example, if the time measurement array is t.shape==(n,k)
, and dist=='weibull'
then it will assume that each column of t
is a Weibull distribution. In contrast, if t.shape==(n,)
and dist=['weibull','gompertz']
, it broadcast copies of t
for each distribution.
The parametric
class has X keys methods. If x
, t
, or d
are initialized then arguments can be left empty.
fit(x, t, d, gamma, rho)
: Will fit the elastic net model for a givengamma
/rho
penalty and enable methods likehazard
to be executed.find_lambda_max(x, t, d, gamma)
: Uses the KKT conditions of the subgradient to the determine the largestgamma
that is needed to zeroout all covariates except the scale and shape parameters.{hazard,survival,pdf}(t, x)
: Similar topredict
issklearn
, these methods provide individualized estimates of the hazard, survival, and density functions.quantile(percentile, x)
: Provides the quantile of the individualized survival distribution.rvs(n_sim, censoring)
: Generates a certain number of samples for a censoring target.
When initializing the parametric
class, users can include the data matrices which will be saved for later for methods that require them. However, specifying these arguments in later methods will always override (but not replace) these inherited attributes.
dist
: Required argument that is a string or a list whose elements must be one of: exponential, weibull, or gompertz.alpha
: The shape parameter can be manually defined in advance (needs to match the dimensionality ofdist
).beta
: The scale parameter can be defined in advance (needs to match the dimensionality ofdist
andx
).scale_x
: Will standardize the covariates to have a mean of zero and a variance of one. This is highly recommended when using any regularization. If this argument is set to True, always provide the raw form of the covariates as they will be scaled during inference.scale_t
: Will normalize the time vector be a maximum of one for model fitting which can help with overflow issues. During inference, the output will always be returned to the original scale. However, the coefficients will change as a result of this.
(2) Toy example
The code block below shows how to fit three parametric distributions to a single array of data generated by covariates. For other demonstrations of using paranet
with a jupyter notebook see the examples folder.
# Load modules
import numpy as np
import pandas as pd
import plotnine as pn
from scipy import stats
from paranet.models import parametric
# (i) Create a toy dataset
n, p, seed = 100, 5, 3
x = stats.norm().rvs([n,p],seed)
shape = 2
b0 = 0.25
beta = stats.norm(scale=0.5).rvs([p,1],seed)
eta = x.dot(beta).flatten() + b0
scale = np.exp(eta)
t = (np.log(stats.uniform().rvs(n,seed))/scale)**(1/shape)
d = np.ones(n)
# (ii) Fit the (unregularized) model
mdl = parametric(dist=['exponential', 'weibull', 'gompertz'], x=x, t=t, d=d, scale_x=False, scale_t=False)
mdl.fit()
# (iii) Plot the individual survival, hazard, and density functions for five "new" observations
n_points = 100
n_new = 4
t_range = np.exp(np.linspace(np.log(0.25), np.log(t.max()), n_points))
x_new = stats.norm().rvs([n_new,p],seed)
# We can at look at the hazard for first outofsample individual
# Notice that for the exponential distribution (first column) the hazard is independent of time which is as expected
print(np.log(mdl.hazard(t_range, np.tile(x_new[[0]],[n_points,1]))).round(2))
# We can then comprehensively calculate this for each method
methods = ['hazard', 'survival', 'pdf']
holder = []
for j in range(n_new):
x_j = np.tile(x_new[[j]],[n_points,1])
for method in methods:
res_j = getattr(mdl, method)(t_range, x_j)
if method == 'hazard':
res_j = np.log(res_j)
res_j = pd.DataFrame(res_j, columns = mdl.dist).assign(time=t_range,method=method, sample=j+1)
holder.append(res_j)
# Plot the results
res = pd.concat(holder).melt(['sample','time','method'],None,'dist')
gg_res = (pn.ggplot(res, pn.aes(x='time', y='value', color='dist')) +
pn.theme_bw() + pn.geom_line() +
pn.scale_color_discrete(name='Distribution') +
pn.facet_grid('method~sample',scales='free',labeller=pn.labeller(sample=pn.label_both)))
(3) Probability distribution parameterization
Each parametric survival distribution is defined by a scale $\lambda$ and, except for the Exponential distribution, a shape $\alpha$ parameter. Each distribution has been parameterized so that a higher value of the scale parameter indicates a higher "risk". The density functions are shown below. The scale and shape parameters must also be positive, except for the case of the Gompertz distribution where the shape parameter can be positive or negative.
$$ \begin{align*} f(t;\lambda, \alpha) &= \begin{cases} \lambda \exp{ \lambda t } & \text{ if Exponential} \ \alpha \lambda t^{\alpha1} \exp{ \lambda t^{\alpha} } & \text{ if Weibull} \ \lambda \exp{ \alpha t } \exp{ \frac{\lambda}{\alpha}(e^{\alpha t}  1) } & \text{ if Gompertz} \ \end{cases} \end{align*} $$
When moving from the univariate to the multivariate distribution, we assume that scale parameter takes is an exponential transform (to ensure positivity) of a linear combination of parameters ($\eta$). Optimization occurs by balancing the data likelihood with the magnitude of the coefficients ($R$), as is shown below.
$$ \begin{align*} \lambda_i &= \exp\Big( \beta_0 + \sum_{j=1}^p x_{ij}\beta_j \Big) \ R(\beta;\gamma,\rho) &= \gamma\big(\rho  \beta_{1:} 1 + 0.5(1\rho)\beta{1:}2^2\big) \ \ell(\alpha,\beta,\gamma,\rho) &= \begin{cases} n^{1}\sum{i=1}^n\delta_i\log\lambda_i  \lambda_i t_i + R(\beta;\gamma,\rho) & \text{ if Exponential} \ n^{1}\sum_{i=1}^n\delta_i[\log(\alpha\lambda_i)+(\alpha1)\log t_i]  \lambda t_i^\alpha + R(\beta;\gamma,\rho) & \text{ if Weibull} \ n^{1}\sum_{i=1}^n\delta_i[\log\lambda + \alpha t]  \frac{\lambda}{\alpha}(\exp{\alpha t_i } 1) + R(\beta;\gamma,\rho) & \text{ if Gompertz} \ \end{cases} \end{align*} $$
(4) How is censoring calculated?
When calling the parametric.rvs
method, the user can specify the censoring value. In paranet
, censoring is generated by an exponential distribution taking on a value that is smaller than the actual value. Formally:
$$ \begin{align*} T^{\text{obs}} &= \begin{cases} T^{\text{act}} & \text{ if } T^{\text{act}} < C \ C & \text{ otherwise} \end{cases} \ C &\sim \text{Exp}(\lambda_C) \ \end{align*} $$
There are of course other processes that could generate censoring (such as EXAMPLE). The reason an exponential distribution is used in the censoring process is to allow for a (relatively) simple optimization problem of finding a single scale parameter ($\lambda_C$), which obtains an (asymptotic) censoring probability of $\phi$:
$$ \begin{align*} \phi(\lambda_C) &= P(C \leq T_i) = \int_0^\infty f_T(u) F_C(u) du, \ \lambda_C^* &= \arg\min_\lambda  \phi(\lambda)  \phi^* _2^2, \end{align*} $$
Where $F_C(u)$ is the CDF of an exponential distribution with $\lambda_C$ as the scale parameter, and $f_T(u)$ is the density of the target distribution (e.g. a Weibullpdf). Finding the scale parameter amounts to a rootfinding problem that can be carried out with scipy
. Finding a single scale parameter is more complicated for the multivariate case because an assumption needs to be made about the distribution of $\lambda_i$ itself, which is random. While it is tempting to generate a censoringspecific distribution (i.e. $C_i$) this would break the noninformative censoring assumption since the censoring random variable is now a function of the realized risk scores. The paranet
package assumes that the covariates come from a standard normal distribution: $x_{ij} \sim N(0,1)$ so that $\eta_i \sim N(0, \beta^2_2)$, and $\lambda_i \sim \text{Lognormal}(0, \beta^2_2)$. It is important that the data be at least normalized for this assumption to be credible.
$$ \begin{align*} P(C \leq T) &= \int_0^\infty \Bigg( \int_0^\infty P(C \leq T_i) di \Bigg) F_C(u) du \ &= \int_0^\infty\int_0^\infty F_C(u)f_{i}(u) f_\lambda(i) du di , \end{align*} $$
Where $f_{i}(u)$ is the density of the target distribution evaluated at $u$, whereas $f_\lambda(i)$ is the pdf of a lognormal distribution evaluated at $i$. This is a much more complicated integral to solve and paranet
currently uses a bruteforce approach at integrating over a grid of values rather than using double quadrature as the latter approach was shown to be prohibitively expensive in terms of runtime.
(5) How does optimization happen?
Unlike glmnet
, the paranet
packages does not use coordinate descent (CD). Instead, this packages uses a smooth approximation of the L1norm to allow for direct optimization with scipy
as shown below. Parametric survival models are not easily amenable to the iterativelyreweighted least squares (IRLS) approach used by glmnet
, because of the presence of the shape parameter. In contract, an exponential model can be easily fit leveraging existing CDbased elastic net solvers. Moving to proximal gradient descent would enable for direct optimization of the L1norm loss and represents a possible future release.
$$ \begin{align*} R(\beta;\gamma,\rho) &= \gamma\big(\rho  \beta_{1:} 1 + 0.5(1\rho)\beta{1:}_2^2\big) \ \beta &\approx \sqrt{\beta^2 + \epsilon} \ \frac{\partial R}{\partial \beta} &\approx \gamma\Bigg(\rho \frac{\beta}{\sqrt{\beta^2+\epsilon}} + (1\rho)\beta\Bigg) \end{align*} $$
(6) Making contributions
If you are interested in making contributions to this package feel free to email me or make a pull request. The main classes and functions of this package have received significant unit testing and to ensure that changes do not break the package, it is recommended running source tests/run_pytest_{univariate,multivariate,elnet}.sh
before making any final merges. This package was built with a specific conda environment and developers can use conda env create f paranet.yml
and then conda activate paranet
.
Project details
Release history Release notifications  RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Built Distribution
Hashes for paranet0.1.4py2.py3noneany.whl
Algorithm  Hash digest  

SHA256  53b219b8a7b3ddab5d05c70c472ae9d7af941aad7356505c6460042564d109ed 

MD5  7453776edcfd9ad36c54866816ab5d2a 

BLAKE2256  d1576e617c0937ff251f54af47333215f8c589a1502705e7cea774c9242bc92a 