Improving XGBoost survival analysis with embeddings and debiased estimators
xgbse: XGBoost Survival Embeddings
"There are two cultures in the use of statistical modeling to reach conclusions from data. One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown." - Leo Breiman, Statistical Modeling: The Two Cultures
Survival Analysis is a powerful statistical technique with a wide range of applications such as predictive maintenance, customer churn, credit risk, asset liquidity risk, and others.
However, it has not yet seen widespread adoption in industry, with most implementations embracing one of two cultures:
- models with sound statistical properties, but lacking in expressiveness and computational efficiency
- highly efficient and expressive models, but lacking in statistical rigor
xgbse aims to unite the two cultures in a single package, adding a layer of statistical rigor to the highly expressive and computationally efficient xgboost survival analysis implementation.
The package offers:
- calibrated and unbiased survival curves with confidence intervals (instead of point predictions)
- great predictive power, competitive with vanilla xgboost
- efficient, easy-to-use implementation
- explainability through prototypes
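To make "survival curves instead of point predictions" concrete, here is a minimal sketch of a survival curve estimated from censored data. It is a plain Kaplan-Meier estimator written with the Python standard library only; it is not the xgbse API and does not show how xgbse computes its curves. The function name `kaplan_meier` and the toy churn data are assumptions made up for illustration.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: observed time per subject (event time or censoring time).
    events: 1 if the event was observed, 0 if the subject was censored.
    """
    # Number of observed events at each time.
    deaths = Counter(t for t, e in zip(durations, events) if e)
    # Every subject (event or censored) leaves the risk set at its own time.
    exits = Counter(durations)

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the share of at-risk subjects surviving past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical toy data: months until churn; event 0 means still active.
durations = [2, 3, 3, 5, 8, 8, 12]
events = [1, 1, 0, 1, 0, 1, 0]

curve = kaplan_meier(durations, events)
for t, s in curve:
    print(f"t={t:>2}  S(t)={s:.3f}")
```

The result is a whole curve S(t), the estimated probability of surviving past each time t, rather than a single risk score per subject. Censored subjects still count toward the risk set until they drop out, which is what distinguishes this from simply averaging observed event times.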
This is a research project by the Loft Data Science Team, and we invite the community to contribute. Please help by trying it out, reporting bugs, and letting us know what you think!
Installation
pip install xgbse