Create sparse and accurate risk scoring systems!

fasterrisk

This repository contains the source code for our NeurIPS 2022 paper:

FasterRisk: Fast and Accurate Interpretable Risk Scores

Introduction

Over the last century, risk scores have been the most popular form of predictive model used in healthcare and criminal justice. Risk scores are sparse linear models with integer coefficients; often these models can be memorized or placed on an index card. Below is an example risk score created by FasterRisk on the 3rd fold of the adult dataset, predicting salary > 50K.

1. No High School Diploma    -4 points        ...
2. High School Diploma       -2 points      + ...
3. Age 22 to 29              -2 points      + ...
4. Any Capital Gains          3 points      + ...
5. Married                    4 points      + ...
                                 SCORE      =

SCORE    -8      -6      -5      -4      -3      -2      -1
RISK     0.1%    0.4%    0.7%    1.2%    2.3%    4.2%    7.6%

SCORE     0       1       2       3       4       5       7
RISK    13.3%   22.3%   34.9%   50.0%   65.1%   77.7%   92.4%
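The SCORE and RISK rows are consistent with a logistic function of the total score. The sketch below assumes risk = 1 / (1 + exp(-(intercept + score) / multiplier)). The values intercept = -3 and multiplier = 1.6 are inferred here by matching the table; they are not reported by the package, and the function name is hypothetical.

```python
import math

# Assumed values, inferred by matching the table above (not output of fasterrisk).
INTERCEPT = -3
MULTIPLIER = 1.6


def risk_from_score(score, intercept=INTERCEPT, multiplier=MULTIPLIER):
    """Map an integer total score to a predicted risk in [0, 1]."""
    return 1.0 / (1.0 + math.exp(-(intercept + score) / multiplier))


# (score, listed risk in percent) pairs copied from the table above
table = [
    (-8, 0.1), (-6, 0.4), (-5, 0.7), (-4, 1.2), (-3, 2.3), (-2, 4.2), (-1, 7.6),
    (0, 13.3), (1, 22.3), (2, 34.9), (3, 50.0), (4, 65.1), (5, 77.7), (7, 92.4),
]

for score, listed in table:
    computed = 100 * risk_from_score(score)
    print(f"score {score:>2}: computed {computed:5.1f}%, listed {listed}%")
```

For example, a person who is married (4 points) and has any capital gains (3 points) has a total score of 7, which the table lists as a 92.4% risk.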

Typically, risk scores have been created either without data or by rounding logistic regression coefficients, but these methods do not reliably produce high-quality risk scores. Recent work used mathematical programming, which is computationally slow.

We introduce an approach for efficiently producing a collection of high-quality risk scores learned from data. Specifically, our approach uses a beam-search algorithm to produce a pool of almost-optimal sparse continuous solutions, each with a different support set. Each of these continuous solutions is transformed into a separate risk score through a "star ray" search, in which a range of multipliers is considered before the coefficients are rounded sequentially to maintain low logistic loss. Our algorithm returns all of these high-quality risk scores for the user to consider. This method completes within minutes and can be valuable in a broad variety of applications.
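The multiplier idea can be illustrated with a simplified sketch. It is not the paper's algorithm: it uses plain nearest-integer rounding instead of sequential rounding, skips the beam search, and all function names and the toy data are hypothetical. For each candidate multiplier it scales a continuous solution, rounds it to integers, and keeps the multiplier with the lowest logistic loss.

```python
import math


def logistic_loss(X, y, intercept, coefs, multiplier):
    """Sum of log(1 + exp(-y * (intercept + x.coefs) / multiplier)), y in {-1, +1}."""
    total = 0.0
    for row, label in zip(X, y):
        score = intercept + sum(c * v for c, v in zip(coefs, row))
        margin = label * score / multiplier
        # Numerically stable form of log(1 + exp(-margin))
        total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return total


def round_with_multipliers(X, y, intercept, coefs, multipliers):
    """Try each multiplier, round the scaled solution, return the best candidate."""
    best = None
    for m in multipliers:
        int_intercept = round(m * intercept)      # nearest-integer rounding (simplification)
        int_coefs = [round(m * c) for c in coefs]
        loss = logistic_loss(X, y, int_intercept, int_coefs, m)
        if best is None or loss < best[0]:
            best = (loss, m, int_intercept, int_coefs)
    return best


# Toy data and a made-up continuous solution, for illustration only
X = [[1, 0], [0, 1], [1, 1], [0, 0]]
y = [1, -1, 1, -1]
cont_intercept, cont_coefs = -0.4, [1.3, -0.7]
multipliers = [1.0, 1.5, 2.0, 2.5, 3.0]

loss, m, int_intercept, int_coefs = round_with_multipliers(
    X, y, cont_intercept, cont_coefs, multipliers
)
print(loss, m, int_intercept, int_coefs)
```

A larger multiplier gives the integers more resolution, so rounding loses less of the continuous solution. That is why searching over multipliers can beat rounding the raw coefficients directly.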

Installation

conda create -n FasterRisk python=3.9 # create a virtual environment
conda activate FasterRisk # activate the virtual environment
python -m pip install fasterrisk # pip install the fasterrisk package
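To confirm that the package is visible to the active environment, a small check with the standard library's importlib.metadata is one option. The helper name below is hypothetical.

```python
from importlib import metadata


def installed_version(package="fasterrisk"):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print(installed_version())  # a version string if the install succeeded, otherwise None
```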

Python Usage

Please see the example.ipynb Jupyter notebook on GitHub or Example Usage on Read the Docs for a detailed tutorial on how to use FasterRisk in a Python environment. Detailed descriptions of the key functions can be found in the API Reference on Read the Docs.

There are two major classes for users to interact with:

  • RiskScoreOptimizer
from fasterrisk.fasterrisk import RiskScoreOptimizer

sparsity = 5 # produce a risk score model with 5 nonzero coefficients

# import data
X_train, y_train = ...

# initialize a risk score optimizer
m = RiskScoreOptimizer(X = X_train, y = y_train, k = sparsity)

# perform optimization
m.optimize()

# get all solutions from the final diverse pool; with M models in the pool and p features,
# arr_multiplier.shape=(M, ), arr_intercept.shape=(M, ), arr_coefficients.shape=(M, p)
arr_multiplier, arr_intercept, arr_coefficients = m.get_models()

# get the first solution (smallest logistic loss) by passing the optional model_index;
# models are ranked in order of increasing logistic loss;
# multiplier.shape=(1, ), intercept.shape=(1, ), coefficients.shape=(p, )
multiplier, intercept, coefficients = m.get_models(model_index = 0)
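Because the pool is ranked by training logistic loss, a user may want to re-rank it by another criterion, such as accuracy on held-out data. The sketch below uses plain Python lists as stand-ins for the returned arrays and assumes the predicted probability is 1 / (1 + exp(-(intercept + x·coefficients) / multiplier)). The helper names, the toy pool, and the validation data are all hypothetical.

```python
import math


def predict_prob(row, multiplier, intercept, coefficients):
    """Assumed scoring rule: logistic function of the scaled integer score."""
    score = intercept + sum(c * v for c, v in zip(coefficients, row))
    return 1.0 / (1.0 + math.exp(-score / multiplier))


def accuracy(X, y, multiplier, intercept, coefficients):
    """Fraction of rows whose thresholded prediction matches the label in {-1, +1}."""
    correct = 0
    for row, label in zip(X, y):
        pred = 1 if predict_prob(row, multiplier, intercept, coefficients) >= 0.5 else -1
        correct += pred == label
    return correct / len(y)


def best_model_index(X_val, y_val, multipliers, intercepts, coefficient_rows):
    """Return the index of the pool model with the highest validation accuracy."""
    scores = [
        accuracy(X_val, y_val, m, b, w)
        for m, b, w in zip(multipliers, intercepts, coefficient_rows)
    ]
    return max(range(len(scores)), key=scores.__getitem__), scores


# Toy stand-ins for arr_multiplier, arr_intercept, arr_coefficients (2 models, 2 features)
pool_multipliers = [1.5, 2.0]
pool_intercepts = [-1, 0]
pool_coefficients = [[2, 0], [1, -1]]

X_val = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_val = [1, -1, 1, -1]

index, scores = best_model_index(
    X_val, y_val, pool_multipliers, pool_intercepts, pool_coefficients
)
print(index, scores)
```

The chosen index could then be passed as model_index to get_models to retrieve that model.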
  • RiskScoreClassifier
from fasterrisk.fasterrisk import RiskScoreClassifier

# import data
X_featureNames = ... # X_featureNames is a list of strings, each of which is a feature name

# create a classifier
clf = RiskScoreClassifier(multiplier = multiplier, intercept = intercept, coefficients = coefficients, featureNames = X_featureNames)

# get the predicted label
y_pred = clf.predict(X = X_train)

# get the probability of predicting y[i] with label +1
y_pred_prob = clf.predict_prob(X = X_train)

# compute the logistic loss
logisticLoss_train = clf.compute_logisticLoss(X = X_train, y = y_train)

# get accuracy and area under the ROC curve (AUC)
acc_train, auc_train = clf.get_acc_and_auc(X = X_train, y = y_train) 

# print the risk score model card
clf.print_model_card()
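For readers unfamiliar with the two metrics, the sketch below computes accuracy and AUC directly from predicted probabilities. It is a plain-Python illustration with hypothetical function names and toy values, not the library's implementation. It assumes labels in {-1, +1} and a 0.5 decision threshold.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the true label."""
    preds = [1 if p >= threshold else -1 for p in y_prob]
    return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)


def auc(y_true, y_prob):
    """Probability that a random positive is scored above a random negative (ties count 0.5)."""
    pos = [p for p, t in zip(y_prob, y_true) if t == 1]
    neg = [p for p, t in zip(y_prob, y_true) if t == -1]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels and probabilities, for illustration only
y_true = [1, -1, 1, -1]
y_prob = [0.9, 0.2, 0.4, 0.6]
print(accuracy(y_true, y_prob), auc(y_true, y_prob))
```

The pairwise AUC loop is quadratic in the number of samples, which is fine for a small illustration; sorting-based implementations are preferable on large datasets.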

R tutorial

FasterRisk can also be easily used inside R. See the R tutorial on how to apply FasterRisk on an example dataset.

License

fasterrisk was created by Jiachang Liu. It is licensed under the terms of the BSD 3-Clause license.

Contributing

Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.

Credits

fasterrisk was created with cookiecutter and the py-pkgs-cookiecutter template.

Citing Our Work

If you find our work useful in your research, please consider citing the following paper:

@inproceedings{liu2022fasterrisk,
  title={FasterRisk: Fast and Accurate Interpretable Risk Scores},
  author={Liu, Jiachang and Zhong, Chudi and Li, Boxuan and Seltzer, Margo and Rudin, Cynthia},
  booktitle={Proceedings of Neural Information Processing Systems},
  year={2022}
}
