Health-centered variation analysis
fairMLHealth
Tools and tutorials for variation analysis in healthcare machine learning.
New in Version 1.0.0!
We reorganized the library to make things more intuitive and added some useful capabilities:
- Evaluate measures for regression analysis
- Stack stratified tables by cohort groups
- Apply a single compare function to evaluate one or multiple models
Tool Contents
- Documentation and References
- Examples and Tutorials
  - Tutorials for measuring and analyzing fairness as it applies to machine learning
  - Examples for using the templates and tools
- FairMLHealth
  - Measure: healthcare-specific fairness metrics for classification and regression models.
  - Compare: protected attributes and other features of one or several models for potential biases in outcomes.
  - Visualize: biases across multiple models through helpful reports & customizable highlights.
- Templates
  - Quickstart notebooks that serve as skeletons for your model analysis
Installation
Installing with pip:
python -m pip install fairmlhealth
Installing directly from GitHub:
python -m pip install git+https://github.com/KenSciResearch/fairMLHealth
Installing from a local copy of the repo:
pip install <path_to_fairMLHealth_dir>
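To confirm that the installation succeeded, a simple import check is usually enough (a minimal sketch; use the same interpreter you installed into):
python -c "import fairmlhealth"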
Troubleshooting Installation Issues
Trouble with the installation?
- Step 1: Verify that you're using a compatible version of Python.
- Step 2: If Step 1 does not resolve your issue, verify that all required packages are properly installed (a quick check is sketched below).
- Step 3: For some metrics, FairMLHealth relies on AIF360, which has a few known installation gotchas. If you are still having trouble with your installation, check AIF360's Troubleshooting Tips.
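For Steps 1 and 2, a quick check from the command line might look like the following (a sketch using standard Python and pip commands, not fairMLHealth tooling):
python --version                  # Step 1: confirm you are on a supported Python version
python -m pip check               # Step 2: report any broken or missing dependencies
python -m pip show fairmlhealth   # Step 2: confirm the package is installed and list its requirements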
If you are not able to resolve your issue through the troubleshooting tips above, please let us know through the Discussion Board or by submitting an issue using the Issue Template found in our Documentation folder.
FairMLHealth Usage
Below are some quickstart examples of our most popular features. More information about these and other examples can be found in our examples_and_tutorials folder! These specific examples are based on our ToolUsage notebooks, for which we've provided online access in Jupyter's nbviewer.
Note that while the examples below use pandas, the library is designed to accept either pandas objects or numpy arrays.
Example Setup
from fairmlhealth import report, measure
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.tree import DecisionTreeClassifier

# First we'll create a semi-randomized dataframe with specific columns for our attributes of interest
rng = np.random.RandomState(506)
N = 240
X = pd.DataFrame({'col1': rng.randint(1, 4, N),
                  'col2': rng.randint(1, 75, N),
                  'col3': rng.randint(0, 2, N),
                  'gender': [0, 1]*int(N/2),
                  'ethnicity': [1, 1, 0, 0]*int(N/4),
                  'other': [1, 0, 1, 0, 1, 1, 0, 1]*int(N/8)
                  })
Generalized Reports - report.py
As in previous versions, the latest version of fairMLHealth has tools to create generalized reports of model bias and performance. An important update starting in Version 1.0.0 is that all of these features are now contained in the report.py module (previously named model_comparison.py).
Comparing Binary Classifications
The primary reporting tool is now the compare function, which can be used to generate side-by-side comparisons for any number of models, for either binary classification or regression problems. Model performance metrics such as accuracy and precision (or MAE and R-Squared for regression problems) are also provided to facilitate comparison. The example below trains two generic models on the data created above and generates a report for the first of them.
A flagging protocol is applied by default to highlight any cells with values that are out of range. This can be turned off by passing flag_oor = False to report.compare().
# Create a randomized target variable
y = pd.Series(X['col3'].values + rng.randint(0, 2, N), name='Example_Target').clip(upper=1)

# Next, we'll split the data and use it to train two generic models
splits = train_test_split(X, y, stratify=y, test_size=0.5, random_state=60)
X_train, X_test, y_train, y_test = splits
model_1 = BernoulliNB().fit(X_train, y_train)
model_2 = DecisionTreeClassifier().fit(X_train, y_train)

# Generate the report
fairness_measures = report.compare(X_test, y_test, X_test['gender'], model_1)
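As noted above, the flagging protocol is applied by default; the call below repeats the same comparison with flagging disabled (only the flag_oor keyword changes):
# The same comparison without out-of-range highlighting
report.compare(X_test, y_test, X_test['gender'], model_1, flag_oor=False)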
Note that the Equal Odds Ratio has been dropped from the example output. This is because the false positive rate is approximately zero for both the entire dataset and for the privileged class, leading to a zero in the denominator of the False Positive Rate Ratio. The result is therefore undefined and cannot be reported as part of the Equal Odds Ratio.
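If you want to verify this behavior on your own data, one way to inspect the overall and per-group false positive rates directly is with scikit-learn's confusion_matrix. This is a sketch outside of the fairMLHealth API, and it assumes the binary labels and 0/1 gender groups from the setup above:
# Check the overall and per-group false positive rates behind the dropped ratio
from sklearn.metrics import confusion_matrix

y_pred = model_1.predict(X_test)
tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
print(f"overall FPR = {fp / (fp + tn):.3f}")

for grp in (0, 1):
    mask = (X_test['gender'] == grp).values
    tn, fp, fn, tp = confusion_matrix(y_test[mask], y_pred[mask], labels=[0, 1]).ravel()
    fpr = fp / (fp + tn) if (fp + tn) else float('nan')
    print(f"gender == {grp}: FPR = {fpr:.3f}")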
The compare tool can also be used to measure two different models or two different protected attributes. Protected attributes are measured separately and cannot yet be combined together with the compare tool, although they can be grouped as cohorts in the stratified tables as shown below.
# Example with multiple models
report.compare(test_data=X_test,
               targets=y_test,
               protected_attr=X_test['gender'],
               models={'Any Name 1': model_1, 'Model 2': model_2})

# Example with different protected attributes.
# Note that the same model is passed with two different keys to clarify the column names.
report.compare(X_test, y_test,
               [X_test['gender'], X_test['ethnicity']],
               {'Gender': model_1, 'Ethnicity': model_1})
Comparing Regressions
Here is an example applying the same function to a regression model. Note that the "fair" range used to evaluate regression metrics requires judgment on the part of the user. Default ranges have been set to [0.8, 1.2] for ratios, 10% of the available target range for the Mean Prediction Difference, and 10% of the available MAE range for the MAE Difference. If the default flags do not meet your needs, they can be turned off by passing flag_oor = False to report.compare(). More information is available in our Evaluating Fairness documentation.
from sklearn.linear_model import LinearRegression
# Create a randomized target variable. In this case we'll add some correlation with existing variables
y = pd.Series((X['col3']+X['gender']).values + rng.uniform(0, 6, N), name='Example_Continuous_Target')
# Split the data and use it to train a regression model
splits = train_test_split(X, y, test_size=0.5, random_state=42)
X_train, X_test, y_train, y_test = splits
regression_model = LinearRegression().fit(X_train, y_train)
# Generate the report.
# Note that for regression models, the prediction type (pred_type) must be declared as such.
report.compare(X_test, y_test, X_test['gender'], regression_model, pred_type="regression")

# Display the same report with no flags and no model performance
report.compare(X_test, y_test, X_test['gender'], regression_model, pred_type="regression",
               flag_oor=False, skip_performance=True)
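For reference, the default tolerance for the Mean Prediction Difference described above is derived from the data itself; the arithmetic below sketches the stated 10%-of-target-range rule (plain pandas arithmetic, not a call into the library, and the exact way the library applies the band may differ):
# 10% of the observed target range -- the stated default tolerance for Mean Prediction Difference
target_range = y_test.max() - y_test.min()
print("10% of target range:", round(0.1 * target_range, 3))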
Detailed Analyses - measure.py
FairMLHealth also provides tools for detailed analysis of model variance by way of stratified data, performance, and bias tables. Beyond evaluating fairness, these tools are intended for flexible use in any generic assessment of model bias, evaluating multiple features at once. An important update starting in Version 1.0.0 is that all of these features are now contained in the measure.py module (previously named reports.py).
All tables display a summary row for "All Features, All Values". This summary can be turned off by passing add_overview=False to measure.data().
Stratified Data Tables
The stratified data table can be used to evaluate data against one or multiple targets. Two methods are available for identifying which features to assess, as shown in the first example below.
# The following two function calls will produce the same output table shown below
# Arguments Option 1: pass full set of data, subsetting with *features* argument
measure.data(X_test, y_test, features=['gender'])
# Arguments Option 2: pass the data subset of interest without using the *features* argument
measure.data(X_test[['gender']], y_test)
# Display a similar report for multiple targets, dropping the summary row
measure.data(X=X_test,                     # used to define rows
             Y=X_test,                     # used to define columns
             features=['gender', 'col1'],  # optional subset of X
             targets=['col2', 'col3'],     # optional subset of Y
             add_overview=False            # turns off "All Features, All Values" row
             )
Stratified Performance Tables
The stratified performance table evaluates model performance specific to each feature-value subset. These tables are compatible with both classification and regression models. For classification models with the predict_proba() method, additional ROC_AUC and PR_AUC values will be included if possible.
# Binary classification performance table with probabilities included
measure.performance(X_test[['gender']],
                    y_true=y_test,
                    y_pred=model_1.predict(X_test),
                    y_prob=model_1.predict_proba(X_test)[:, 1])

# Regression example
measure.performance(X_test[['gender']],
                    y_true=y_test,
                    y_pred=regression_model.predict(X_test),
                    pred_type="regression")
Stratified Bias Tables
The stratified bias analysis table applies fairness-related metrics to each feature-value pair. Each row treats the given feature value as the "privileged" group relative to all other possible values of that feature. For example, row 2 in the table below displays measures for "col1" with a value of "2". For this row, "2" is considered to be the privileged group, while all other non-null values (namely "1" and "3") are considered unprivileged.
To simplify the table, fairness measures have been reduced to their component parts. For example, the Equal Odds Ratio has been reduced to the True Positive Rate (TPR) Ratio and False Positive Rate (FPR) Ratio.
# Binary classification example
# Note that flag_oor is set to False by default for this feature
measure.bias(X_test[['gender', 'col1']], y_test, model_1.predict(X_test))
Note that the flag function is compatible with both measure.bias() and measure.summary() (demonstrated in the next section). However, to enable colored cells the tool returns a pandas Styler rather than a DataFrame. For this reason, flag_oor is set to False by default (as shown in the example above). Flagging can be turned on by passing flag_oor=True to either function. As an added feature, optional custom ranges can be passed to either measure.bias() or measure.summary() to facilitate regression evaluation, as shown in the example below.
# Custom "fair" ranges may be passed as dictionaries of tuples whose keys are case-insensitive measure names
my_ranges = {'MAE Difference': (-0.1, 0.1), 'mean prediction difference': (-2, 2)}

# flag_oor defaults to False for this feature, so it must be turned on explicitly
measure.bias(X_test[['gender', 'col1']],
             y_test,
             regression_model.predict(X_test),
             pred_type="regression",
             flag_oor=True,
             custom_ranges=my_ranges)
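If you need a plain DataFrame rather than the styled table (for example, to write the results to disk), you can either leave flag_oor at its default of False or pull the underlying frame from the returned object. The sketch below relies on the pandas Styler.data attribute rather than on any fairMLHealth-specific API:
# Retrieve the un-styled DataFrame behind the flagged output and save it
styled = measure.bias(X_test[['gender', 'col1']],
                      y_test,
                      regression_model.predict(X_test),
                      pred_type="regression",
                      flag_oor=True,
                      custom_ranges=my_ranges)
plain_df = styled.data  # underlying DataFrame of the pandas Styler
plain_df.to_csv("bias_table.csv", index=False)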
Summary Table
The measure module also contains a summary function that works similarly to report.compare(). While it can only be applied to one model at a time, it can accept custom "fair" ranges and cohort groups, as shown in the next section.
# Example summary output for the regression model with custom ranges
measure.summary(X_test[['gender', 'col1']],
                y_test,
                regression_model.predict(X_test),
                prtc_attr=X_test['gender'],
                pred_type="regression",
                flag_oor=True,
                custom_ranges={'mean prediction difference': (-0.5, 2)})
Analysis by Cohorts
All of the tables generated by measure.py can be passed an optional cohort_labels argument specifying additional labels for each observation by which the analysis should be grouped. Cohorts for which there is insufficient data to run the analysis are simply skipped.
# Example of cohorts applied to the bias function
cohort_labels = X_test['gender']
measure.bias(X_test['col3'], y_test, model_1.predict(X_test),
             flag_oor=True, cohort_labels=cohort_labels)

# Example of cohorts applied to the summary function
# Note that performance measures and flagging are turned off
measure.summary(X_test[['col2']],
                y_test,
                model_1.predict(X_test),
                prtc_attr=X_test['gender'],
                pred_type="classification",
                flag_oor=False,
                skip_performance=True,
                cohort_labels=X_test[['ethnicity', 'col3']]
                )
Additional Library Resources
For a deeper discussion of fairness evaluation, see Evaluating Fairness in our documentation and resources section. In the same folder you'll find a Measures QuickReference, plus additional References and Resources.
For active notebooks demonstrating the fairness evaluation process, see the Tutorial for Evaluating Fairness in Binary Classification and the Tutorial for Evaluating Fairness in Regression (nbviewer links), the notebooks for which are located in our examples_and_tutorials folder. These are best used with our ICHI2021 FairnessInHealthcareML Slides.pdf, which can be found in the [Publications](./docs/publications/) folder.
Templates are available in the templates folder.
Connect with Us!
This is a work in progress. By making this information as accessible as possible, we hope to promote an industry based on equity and empathy. But building that industry takes time, and it takes the support of the community. Please connect with us so that we can support each other to advance machine learning and healthcare!
- For problems with the source code or documentation, please submit inquiries using our Issue Template or Feature Request Template through GitHub's Issue Tracker.
- Other comments, ideas, inquiries, suggestions, feedback and requests are welcome through the Discussion Page.
- See the Contributing Guidelines for more information.
Citations
Repository
Allen, C., Ahmad, M.A., Eckert, C., Hu, J., Kumar, V., & Teredesai, A. (2020). fairMLHealth: Tools and tutorials for fairness evaluation in healthcare machine learning. https://github.com/KenSciResearch/fairMLHealth.
@misc{fairMLHealth,
title={fairMLHealth: Tools and tutorials for fairness evaluation in healthcare machine learning.},
author={Allen, C. and Ahmad, M.A. and Eckert, C. and Hu, J. and Kumar, V. and Teredesai, A.},
year={2020},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/KenSciResearch/fairMLHealth}}
}
KDD2020 Tutorial Presentation
Ahmad, M.A., Patel, A., Eckert, C., Kumar, V., Allen, C. & Teredesai, A. (2020, August). Fairness in Machine Learning for Healthcare. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 3529-3530).
See also: Publications
@incollection{FMLH_KDD2020,
title = {Fairness in Machine Learning for Healthcare},
author = {Ahmad, M.A. and Eckert, C. and Kumar, V. and Patel, A. and Allen, C. and Teredesai, A.},
year = 2020,
month = {August},
booktitle = {Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining},
pages = {3529--3530}
}
Courses that use FairMLHealth
- TCSS 593: Research Seminar In Data Science, Spring 2021, Department of Computer Science, University of Washington Tacoma
- EE 520: Predictive Learning From Data, Spring 2021, Department of Electrical Engineering, University of Washington Bothell
- CSS 581: Machine Learning, Autumn 2020, Department of Computer Science, University of Washington Bothell
Key Contributors
- Muhammad Aurangzeb Ahmad
- Christine Allen
- Carly Eckert
- Juhua Hu
- Vikas Kumar
- Arpit Patel
- Ankur Teredesai