
Create sparse and accurate risk scoring systems!


fasterrisk


This repository contains the source code for our NeurIPS 2022 paper:

FasterRisk: Fast and Accurate Interpretable Risk Scores

Table of Contents

  • Introduction
  • Installation
  • Python Usage
  • R tutorial
  • License
  • Contributing
  • Credits
  • Citing Our Work

Introduction

Over the last century, risk scores have been the most popular form of predictive model used in healthcare and criminal justice. Risk scores are sparse linear models with integer coefficients; often these models can be memorized or placed on an index card. Below is a risk score example created on the 3rd fold of the adult dataset by FasterRisk, predicting salary > 50K.

| # | Feature                 | Points    |       |
|---|-------------------------|-----------|-------|
| 1 | No High School Diploma  | -4 points | ...   |
| 2 | High School Diploma     | -2 points | + ... |
| 3 | Age 22 to 29            | -2 points | + ... |
| 4 | Any Capital Gains       |  3 points | + ... |
| 5 | Married                 |  4 points | + ... |
|   |                         | SCORE     | =     |

| SCORE | -8   | -6   | -5   | -4   | -3   | -2   | -1   |
|-------|------|------|------|------|------|------|------|
| RISK  | 0.1% | 0.4% | 0.7% | 1.2% | 2.3% | 4.2% | 7.6% |

| SCORE | 0     | 1     | 2     | 3     | 4     | 5     | 7     |
|-------|-------|-------|-------|-------|-------|-------|-------|
| RISK  | 13.3% | 22.3% | 34.9% | 50.0% | 65.1% | 77.7% | 92.4% |
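The RISK row is a logistic (sigmoid) transform of the integer SCORE. As a sanity check, the table above is roughly reproduced by a sigmoid with a multiplier of about 1.6 and a score offset of -3; both numbers are reverse-engineered from the printed table for illustration only and are not taken from the released model.

```python
import numpy as np

# Hypothetical values reverse-engineered from the table above (for illustration only).
multiplier = 1.603   # scales the integer points back toward the continuous coefficients
offset = -3          # integer score at which the predicted risk crosses 50%

scores = np.array([-8, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 7])
risk = 1.0 / (1.0 + np.exp(-(scores + offset) / multiplier))  # sigmoid of the scaled score

for s, r in zip(scores, risk):
    print(f"SCORE {s:+d} -> RISK {100 * r:.1f}%")
```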

Typically, risk scores have been created either without data or by rounding logistic regression coefficients, but these methods do not reliably produce high-quality risk scores. Recent work used mathematical programming, which is computationally slow.

We introduce an approach for efficiently producing a collection of high-quality risk scores learned from data. Specifically, our approach produces a pool of almost-optimal sparse continuous solutions, each with a different support set, using a beam-search algorithm. Each of these continuous solutions is transformed into a separate risk score through a "star ray" search, where a range of multipliers are considered before rounding the coefficients sequentially to maintain low logistic loss. Our algorithm returns all of these high-quality risk scores for the user to consider. This method completes within minutes and can be valuable in a broad variety of applications.
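The rounding step can be sketched in a few lines of NumPy. The sketch below is a simplified illustration of the multiplier scan plus sequential rounding, not the actual FasterRisk implementation: the function names, the greedy floor/ceil rule, and the omission of the beam search over support sets are all simplifications.

```python
import numpy as np

def logistic_loss(w, X, y):
    """Logistic loss with labels y in {-1, +1} (assumed label convention)."""
    return np.sum(np.log1p(np.exp(-y * (X @ w))))

def round_with_multiplier(w_cont, X, y, multipliers):
    """Simplified 'star ray' rounding: for each candidate multiplier m, scale the
    continuous solution by m, then round coefficients one at a time, keeping
    whichever of floor/ceil gives the lower logistic loss of the rescaled model."""
    best_w, best_m, best_loss = None, None, np.inf
    for m in multipliers:
        w = m * np.asarray(w_cont, dtype=float)
        for j in range(len(w)):
            candidates = [np.floor(w[j]), np.ceil(w[j])]
            losses = []
            for c in candidates:
                w_try = w.copy()
                w_try[j] = c
                # evaluate the partially rounded solution, rescaled back by 1/m
                losses.append(logistic_loss(w_try / m, X, y))
            w[j] = candidates[int(np.argmin(losses))]
        loss = logistic_loss(w / m, X, y)
        if loss < best_loss:
            best_w, best_m, best_loss = w.astype(int), m, loss
    return best_w, best_m, best_loss
```

In the real algorithm, the support sets come from the beam search, the rounding order and bounds are chosen more carefully, and the continuous pool contains many almost-optimal solutions rather than a single one; this sketch only conveys the multiplier-scan-plus-sequential-rounding idea.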

Installation

```bash
conda create -n FasterRisk python=3.9  # create a virtual environment
conda activate FasterRisk              # activate the virtual environment
python -m pip install fasterrisk       # pip install the fasterrisk package
```
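To verify the installation, you can check that the package imports cleanly and query the installed version via the standard importlib.metadata lookup (a generic Python check, not part of the fasterrisk API):

```python
from importlib.metadata import version

import fasterrisk  # should import without errors once the package is installed

print("fasterrisk version:", version("fasterrisk"))  # e.g. 0.1.7
```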

Python Usage

Please see the example.ipynb Jupyter notebook on GitHub or the Example Usage page on Read the Docs for a detailed tutorial on how to use FasterRisk in a Python environment. Detailed descriptions of the key functions can be found in the API Reference on Read the Docs.

There are two major classes for users to interact with (a compact end-to-end sketch follows the two snippets below):

  • RiskScoreOptimizer

```python
from fasterrisk.fasterrisk import RiskScoreOptimizer

sparsity = 5  # produce a risk score model with 5 nonzero coefficients

# import data
X_train, y_train = ...

# initialize a risk score optimizer
m = RiskScoreOptimizer(X = X_train, y = y_train, k = sparsity)

# perform optimization
m.optimize()

# get all top m solutions from the final diverse pool
arr_multiplier, arr_intercept, arr_coefficients = m.get_models()
# arr_multiplier.shape = (m, ), arr_intercept.shape = (m, ), arr_coefficients.shape = (m, p)

# get the first solution from the final diverse pool by passing an optional model_index;
# models are ranked in order of increasing logistic loss
multiplier, intercept, coefficients = m.get_models(model_index = 0)
# multiplier.shape = (1, ), intercept.shape = (1, ), coefficients.shape = (p, )
```
  • RiskScoreClassifier

```python
from fasterrisk.fasterrisk import RiskScoreClassifier

# import data
X_featureNames = ...  # X_featureNames is a list of strings, each of which is a feature name

# create a classifier
clf = RiskScoreClassifier(multiplier = multiplier, intercept = intercept, coefficients = coefficients, featureNames = X_featureNames)

# get the predicted label
y_pred = clf.predict(X = X_train)

# get the probability of predicting y[i] with label +1
y_pred_prob = clf.predict_prob(X = X_train)

# compute the logistic loss
logisticLoss_train = clf.compute_logisticLoss(X = X_train, y = y_train)

# get accuracy and area under the ROC curve (AUC)
acc_train, auc_train = clf.get_acc_and_auc(X = X_train, y = y_train)

# print the risk score model card
clf.print_model_card()
```
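Putting the two classes together, the end-to-end workflow looks roughly like the sketch below. The CSV file name and column names are hypothetical placeholders, the labels are assumed to be encoded as -1/+1 with features already binarized, and the import path follows the project's example notebook.

```python
import pandas as pd
from fasterrisk.fasterrisk import RiskScoreOptimizer, RiskScoreClassifier

# hypothetical file of binarized features plus a "label" column with -1/+1 labels
df = pd.read_csv("adult_fold3_binarized.csv")
y_train = df["label"].to_numpy()
X_train = df.drop(columns=["label"]).to_numpy()
featureNames = list(df.drop(columns=["label"]).columns)

# learn a pool of sparse risk scores with 5 nonzero integer coefficients
opt = RiskScoreOptimizer(X = X_train, y = y_train, k = 5)
opt.optimize()

# take the best model (smallest logistic loss) and wrap it in a classifier
multiplier, intercept, coefficients = opt.get_models(model_index = 0)
clf = RiskScoreClassifier(multiplier = multiplier, intercept = intercept,
                          coefficients = coefficients, featureNames = featureNames)

clf.print_model_card()  # human-readable score card
acc, auc = clf.get_acc_and_auc(X = X_train, y = y_train)
print(f"train accuracy = {acc:.3f}, train AUC = {auc:.3f}")
```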

R tutorial

FasterRisk can also be used from R. See the R tutorial for how to apply FasterRisk to an example dataset.

License

fasterrisk was created by Jiachang Liu. It is licensed under the terms of the BSD 3-Clause license.

Contributing

Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.

Credits

fasterrisk was created with cookiecutter and the py-pkgs-cookiecutter template.

Citing Our Work

If you find our work useful in your research, please consider citing the following paper:

```bibtex
@inproceedings{liu2022fasterrisk,
  title={FasterRisk: Fast and Accurate Interpretable Risk Scores},
  author={Liu, Jiachang and Zhong, Chudi and Li, Boxuan and Seltzer, Margo and Rudin, Cynthia},
  booktitle={Proceedings of Neural Information Processing Systems},
  year={2022}
}
```
