Multicalibration post-processing in Python.
Multicalibration Post-Processing Python Package
Multicalibration is a Python package that implements a model post-processing method of the same name. The goal of multicalibration post-processing algorithms is to improve the calibration of a model not only overall, but also on specified subpopulations (or "groups"/"subgroups") given as input. Multicalibration originated in the field of algorithmic fairness, where it was proposed as a way to improve the performance of machine learning models on protected subpopulations of the data. This package provides implementations of two multicalibration algorithms: HKRR (from the original multicalibration paper) and HJZ.
The package can be installed via pip:
pip install multicalibration
The package can also be installed by cloning the git repository:
git clone https://github.com/sid-devic/multicalibration.git
cd multicalibration/
pip install .
Example Usage
Multicalibration post-processing takes as input a set of probabilistic predictions, true labels for those predictions, and a list of subgroups.
Importantly, datapoints may belong to multiple subgroups: that is, subgroups may be very complex and overlapping.
In examples/basic_usage.py, we give a short example of applying the HKRR algorithm to some synthetic data, summarized here.
# Imports omitted in this summary; see examples/basic_usage.py for the full script.
# Generate some synthetic data
probs, labels, subgroups = generate_correlated_subgroup_data(n_samples=1000)
n_groups = len(subgroups)
# Hyperparams for HKRR predictor
hkrr_params = {
    'alpha': 0.1,
    'lambda': 0.01,
    'max_iter': 100,
    'randomized': True,
    'use_oracle': False,
}
# Initialize and fit HKRR predictor
mcb = MulticalibrationPredictor('HKRR')
mcb.fit(probs, labels, subgroups, hkrr_params)
# Make predictions using HKRR
hkrr_probs = mcb.predict(probs, subgroups)
In the above code, probs is a length n array of a model's probabilistic predictions (e.g., probs[i] is the model's confidence, in [0,1], that datapoint i should be classified with label 1).
The labels array is a length n binary array with the true label of each datapoint.
Most importantly, subgroups is a length n_groups (number of subgroups) array, where each entry subgroups[j] lists the indices i of all datapoints that belong to subgroup j.
For example, if there were three datapoints and two groups, the following subgroups array would represent that the first two datapoints (i=0,1) belong to group one, and the last two datapoints (i=1,2) belong to group two:
subgroups = [[0, 1], [1, 2]]
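In practice, group membership is often available as one boolean mask per group rather than as index lists. The following is a minimal sketch of converting masks into the index-list format described above. The helper name masks_to_subgroups and the toy age feature are hypothetical and not part of the package, and it assumes the package accepts plain Python lists of indices, as in the example above.

```python
import numpy as np

def masks_to_subgroups(masks):
    """Convert boolean membership masks (one per group) into lists of datapoint indices."""
    # np.flatnonzero returns the positions where the mask is True.
    return [np.flatnonzero(np.asarray(m, dtype=bool)).tolist() for m in masks]

# Hypothetical feature for three datapoints.
age = np.array([25, 40, 70])

# Two overlapping groups: "under 65" and "30 or older".
masks = [age < 65, age >= 30]

subgroups = masks_to_subgroups(masks)
print(subgroups)  # [[0, 1], [1, 2]]
```

Datapoint 1 satisfies both masks, so its index appears in both lists, which is how overlapping subgroups are expressed in this format.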
Multicalibration post-processing algorithms usually also take in a number of hyperparameters. One may want to select these hyperparameters using a hold-out validation set. We demonstrate a potential way of doing this in the file examples/hyperparameter_search.py.
This file also serves as an example of applying multicalibration to real data from the US Census via the folktables package.
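The general shape of hold-out selection can be sketched independently of the package. The sketch below is not the contents of examples/hyperparameter_search.py. It scores each hyperparameter setting by the worst binned calibration error over all subgroups on a validation split, which is one possible selection criterion among several. The functions max_group_calibration_error, select_params, and shift_fit_predict are hypothetical names, and shift_fit_predict is a toy stand-in for a real post-processor so that the example runs on its own.

```python
import itertools
import numpy as np

def max_group_calibration_error(probs, labels, subgroups, n_bins=10):
    """Worst binned (ECE-style) calibration error over all non-empty subgroups."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    worst = 0.0
    for group in subgroups:
        idx = np.asarray(group, dtype=int)
        if idx.size == 0:
            continue
        p, y = probs[idx], labels[idx]
        # Assign each prediction to one of n_bins equal-width bins on [0, 1].
        bins = np.minimum((p * n_bins).astype(int), n_bins - 1)
        err = 0.0
        for b in np.unique(bins):
            in_bin = bins == b
            # Weight each bin's |mean prediction - mean label| gap by its share of the group.
            err += in_bin.mean() * abs(p[in_bin].mean() - y[in_bin].mean())
        worst = max(worst, err)
    return worst

def select_params(fit_predict, grid, train, val):
    """Grid search: return the setting with the lowest validation error, and that error.

    fit_predict(params, train, val) must return post-processed validation probabilities.
    train and val are (probs, labels, subgroups) tuples.
    """
    best_params, best_err = None, float("inf")
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        val_probs = fit_predict(params, train, val)
        err = max_group_calibration_error(val_probs, val[1], val[2])
        if err < best_err:
            best_params, best_err = params, err
    return best_params, best_err

def shift_fit_predict(params, train, val):
    """Hypothetical stand-in post-processor: remove a fraction of the training bias."""
    train_probs, train_labels, _ = train
    bias = train_probs.mean() - train_labels.mean()
    return np.clip(val[0] - params["strength"] * bias, 0.0, 1.0)

# Toy data: a model that systematically over-predicts, with two overlapping groups.
rng = np.random.default_rng(0)
n = 400
true_p = rng.uniform(0.2, 0.8, size=n)
labels = (rng.uniform(size=n) < true_p).astype(int)
probs = np.clip(true_p + 0.15, 0.0, 1.0)
subgroups = [list(range(0, 250)), list(range(150, 400))]

def make_split(indices):
    """Restrict the data to `indices` and re-index the subgroups relative to the split."""
    pos = {int(i): k for k, i in enumerate(indices)}
    groups = [[pos[i] for i in g if i in pos] for g in subgroups]
    return probs[indices], labels[indices], groups

train = make_split(np.arange(0, 200))
val = make_split(np.arange(200, 400))

grid = {"strength": [0.0, 0.5, 1.0]}
best_params, best_err = select_params(shift_fit_predict, grid, train, val)
print(best_params, round(best_err, 4))
```

To use this with the package, fit_predict could wrap the calls from the usage example: fit a MulticalibrationPredictor on the training split with the given hyperparameters, then call predict on the validation split. The grid would then range over hyperparameters such as 'alpha' and 'lambda'. Note that subgroup indices must be re-computed relative to each split, as make_split does here.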
Cite
This package was developed as part of our work on the empirical aspects of multicalibration post-processing algorithms. We ask that you consider citing our paper here:
@article{hansen2024multicalibration,
title={When is Multicalibration Post-Processing Necessary?},
author={Hansen, Dutch and Devic, Siddartha and Nakkiran, Preetum and Sharan, Vatsal},
journal={Advances in Neural Information Processing Systems (Neurips)},
year={2024}
}
If you use the HKRR algorithm, we encourage you to cite the original multicalibration paper:
@inproceedings{hebert2018multicalibration,
title={Multicalibration: Calibration for the (computationally-identifiable) masses},
author={H{\'e}bert-Johnson, Ursula and Kim, Michael and Reingold, Omer and Rothblum, Guy},
booktitle={International Conference on Machine Learning},
pages={1939--1948},
year={2018},
organization={PMLR}
}
Finally, if you use the HJZ algorithm, please cite the authors' work:
@article{haghtalab2024unifying,
title={A unifying perspective on multi-calibration: Game dynamics for multi-objective learning},
author={Haghtalab, Nika and Jordan, Michael and Zhao, Eric},
journal={Advances in Neural Information Processing Systems (Neurips)},
year={2023}
}
Acknowledgements and License
This repository is under the MIT license. Most of the original implementation work was done by Dutch Hansen as an undergraduate research assistant at the University of Southern California. This repository also uses Eric Zhao's implementation of HJZ from the original paper, found here (also under the MIT license). We sincerely thank Eric for help debugging and implementing the algorithm. The HKRR implementation is based in part on the implementation found here. We thank the original authors Saina Asani, Sana Tonekaboni, and Shuja Khalid. Unfortunately, we were not able to find a license for their work.
Contribute
Thank you for your interest!
We plan to slowly incorporate additional multicalibration-style algorithms into this package, as we believe that practitioners stand to benefit from accessible implementations of these algorithms.
If you would like to help with this effort, please contact devic[at]usc.edu. You may also want to install the package in editable mode:
git clone https://github.com/sid-devic/multicalibration.git
cd multicalibration/
pip install -e .
This will allow you to modify the source files without re-installing the package each time.