Tools to generate concise, high-quality summaries of a probability distribution
GoodPoints
A Python package for generating concise, high-quality summaries of a probability distribution
GoodPoints is a collection of tools for compressing a distribution more effectively than independent sampling:
- Given an initial summary of n input points, kernel thinning returns s << n output points with comparable integration error across a reproducing kernel Hilbert space
- Compress++ reduces the runtime of generic thinning algorithms with minimal loss in accuracy
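To make "integration error across a reproducing kernel Hilbert space" concrete, the sketch below computes the maximum mean discrepancy (MMD) between a point set and a smaller summary of it. The MMD is the worst-case difference in averages over the unit ball of the RKHS. The sketch uses only NumPy and is not part of the goodpoints API; the helper names, the Gaussian kernel, and the bandwidth `sigma` are assumptions chosen for illustration. The summary here is a plain uniform subsample, which is the kind of baseline a thinned coreset could be compared against.

```python
import numpy as np

def gaussian_kernel_matrix(A, B, sigma=1.0):
    """Gaussian kernel matrix with entries exp(-||a - b||^2 / (2 sigma^2))."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))

def mmd(X, Y, sigma=1.0):
    """MMD between the empirical distributions of the rows of X and of Y."""
    k_xx = gaussian_kernel_matrix(X, X, sigma).mean()
    k_yy = gaussian_kernel_matrix(Y, Y, sigma).mean()
    k_xy = gaussian_kernel_matrix(X, Y, sigma).mean()
    return np.sqrt(max(k_xx + k_yy - 2.0 * k_xy, 0.0))

rng = np.random.default_rng(0)
X = rng.standard_normal((1024, 2))  # n = 1024 input points in d = 2
# Baseline summary: s = 32 points drawn uniformly without replacement.
baseline = X[rng.choice(len(X), size=32, replace=False)]
print("MMD of uniform subsample:", mmd(X, baseline))
```

The same `mmd` helper can be applied to the rows selected by any thinning routine to compare it against this baseline.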
Installation
To install the goodpoints package, use the following pip command:
pip install goodpoints
Getting started
The primary kernel thinning function is thin in the kt module:
from goodpoints import kt
coreset = kt.thin(X, m, split_kernel, swap_kernel, delta=0.5, seed=123, store_K=False)
"""Returns kernel thinning coreset of size floor(n/2^m) as row indices into X
Args:
X: Input sequence of sample points with shape (n, d)
m: Number of halving rounds
split_kernel: Kernel function used by KT-SPLIT (typically a square-root kernel, krt);
split_kernel(y,X) returns array of kernel evaluations between y and each row of X
swap_kernel: Kernel function used by KT-SWAP (typically the target kernel, k);
swap_kernel(y,X) returns array of kernel evaluations between y and each row of X
delta: Run KT-SPLIT with constant failure probabilities delta_i = delta/n
seed: Random seed to set prior to generation; if None, no seed will be set
store_K: If False, runs O(nd) space version which does not store kernel
matrix; if True, stores n x n kernel matrix
"""
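As a minimal sketch of kernels matching the documented `kernel(y, X)` convention, the block below defines a Gaussian target kernel and a narrower Gaussian used as the split kernel. Using a Gaussian with half the variance as the square-root kernel follows the Kernel Thinning paper up to a normalizing constant, which is omitted here; treat that choice, the bandwidth `sigma`, and the wrapper `thin_gaussian` as assumptions rather than the package's own helpers. The `kt.thin` call uses only the arguments listed in the docstring above.

```python
import numpy as np

def gaussian_kernel(y, X, sigma=1.0):
    """Return exp(-||y - x_i||^2 / (2 sigma^2)) for each row x_i of X."""
    y = np.asarray(y, dtype=float).reshape(1, -1)
    X = np.atleast_2d(np.asarray(X, dtype=float))
    sq_dists = np.sum((X - y) ** 2, axis=1)
    return np.exp(-sq_dists / (2.0 * sigma**2))

def gaussian_sqrt_kernel(y, X, sigma=1.0):
    """Gaussian with half the variance (assumed square-root kernel, unnormalized)."""
    return gaussian_kernel(y, X, sigma=sigma / np.sqrt(2.0))

def thin_gaussian(X, m, seed=123):
    """Hypothetical wrapper: run kt.thin as documented and return the selected rows."""
    # Imported lazily so the kernels above can be used without goodpoints installed.
    from goodpoints import kt
    coreset = kt.thin(
        X, m, gaussian_sqrt_kernel, gaussian_kernel,
        delta=0.5, seed=seed, store_K=False,
    )
    # coreset holds row indices into X, so index X to get the points themselves.
    return X[coreset]
```

With `goodpoints` installed, `thin_gaussian(X, m)` would be expected to return floor(n/2^m) rows of `X`, per the docstring above.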
For example uses, please refer to the notebook examples/kt/run_kt_experiment.ipynb.
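When choosing m and store_K, a quick back-of-the-envelope calculation can help. The sketch below computes the coreset size floor(n/2^m) stated in the thin docstring and compares the memory needed for an n x n kernel matrix against the memory for the n x d input alone. It assumes 8-byte float64 entries and ignores any other overhead, so the figures are rough estimates rather than measurements of the package; the helper names are illustrative.

```python
def coreset_size(n, m):
    """Output size floor(n / 2^m) after m halving rounds."""
    return n // 2**m

def kernel_matrix_bytes(n, bytes_per_entry=8):
    """Approximate memory for a dense n x n kernel matrix (store_K=True)."""
    return n * n * bytes_per_entry

def points_bytes(n, d, bytes_per_entry=8):
    """Approximate memory for the n x d input array itself."""
    return n * d * bytes_per_entry

n, d, m = 2**14, 2, 7
print(coreset_size(n, m))              # 128 output points
print(kernel_matrix_bytes(n) / 2**30)  # 2.0 (GiB)
print(points_bytes(n, d) / 2**20)      # 0.25 (MiB)
```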
Examples
Code in the examples directory uses the goodpoints package to recreate the experiments of the following research papers.
Kernel Thinning
@article{dwivedi2021kernel,
title={Kernel Thinning},
author={Raaz Dwivedi and Lester Mackey},
journal={arXiv preprint arXiv:2105.05842},
year={2021}
}
- The script examples/kt/submit_jobs_run_kt.py reproduces the vignette experiments of Kernel Thinning on a Slurm cluster by executing examples/kt/run_kt_experiment.ipynb with appropriate parameters. For the MCMC examples, it assumes that necessary data was downloaded and pre-processed following the steps listed in examples/kt/preprocess_mcmc_data.ipynb.
- After all results have been generated, the notebook plot_results.ipynb can be used to reproduce the figures of Kernel Thinning.
Generalized Kernel Thinning
@article{dwivedi2021generalized,
title={Generalized Kernel Thinning},
author={Dwivedi, Raaz and Mackey, Lester},
journal={arXiv preprint arXiv:2110.01593},
year={2021}
}
- The script examples/gkt/ADDME reproduces the experiments of Generalized Kernel Thinning on a Slurm cluster by executing examples/gkt/ADDME with appropriate parameters. For the MCMC examples, it assumes that necessary data was downloaded and pre-processed following the steps listed in examples/kt/preprocess_mcmc_data.ipynb.
- After all results have been generated, the notebook examples/gkt/ADDME can be used to reproduce the figures of Generalized Kernel Thinning.
Distribution Compression in Near-linear Time
@article{shetti2021distribution,
title={Distribution Compression in Near-linear Time},
author={Abhishek Shetty and Raaz Dwivedi and Lester Mackey},
journal={arXiv preprint to appear},
year={2021}
}
- The script examples/compress/ADDME reproduces the experiments of Distribution Compression in Near-linear Time on a Slurm cluster by executing examples/compress/ADDME with appropriate parameters. For the MCMC examples, it assumes that necessary data was downloaded and pre-processed following the steps listed in examples/kt/preprocess_mcmc_data.ipynb.
- After all results have been generated, the notebook examples/compress/ADDME can be used to reproduce the figures of Distribution Compression in Near-linear Time.
Contributing
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
Trademarks
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.