
Optimally compress sampling algorithm outputs

Project description

Stein Thinning

This Python package implements an algorithm for optimally compressing sampling algorithm outputs by minimising a kernel Stein discrepancy. Please see the accompanying paper "Optimal Thinning of MCMC Output" (arXiv) for details of the algorithm.

Installing the package

The latest stable version can be installed via pip:

pip install stein-thinning

To install the current development version, use this command:

pip install git+https://github.com/wilson-ye-chen/stein_thinning
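
After installation, a quick way to check that the package is importable is to load the main entry point used throughout this page (a sanity check only, not a required step):

from stein_thinning.thinning import thin
print(thin)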

Getting Started

Suppose, for example, that correlated samples from a posterior distribution have been obtained using an MCMC algorithm and stored in a NumPy array smpl, and that the corresponding gradients of the log-posterior are stored in another NumPy array grad. One can then apply Stein Thinning to select a subset of 40 sample points by running the following code:

from stein_thinning.thinning import thin
idx = thin(smpl, grad, 40)

The thin function returns a NumPy array containing the row indices in smpl (and grad) of the selected points. Please refer to demo.py as a starting example.
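
If no MCMC output is at hand, a self-contained toy run can be put together as follows. The random-walk Metropolis sampler and the bivariate Gaussian target below are purely illustrative assumptions and are not part of the package; they simply provide a smpl array and a matching grad array in closed form:

import numpy as np
from stein_thinning.thinning import thin

rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
prec = np.linalg.inv(cov)

def log_p(x):
    # Log-density of the zero-mean bivariate Gaussian target (up to a constant)
    return -0.5 * x @ prec @ x

n = 1000
smpl = np.zeros((n, 2))
x = np.zeros(2)
for i in range(n):
    prop = x + 0.5 * rng.standard_normal(2)            # random-walk proposal
    if np.log(rng.uniform()) < log_p(prop) - log_p(x):
        x = prop
    smpl[i] = x

grad = -smpl @ prec          # gradient of the log-density at each sample point
idx = thin(smpl, grad, 40)   # row indices of the 40 selected points
print(idx)

Any sampler can be substituted here: all thin needs is the array of sample points and the matching array of log-posterior gradients.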

The default usage requires no additional user input and is based on the identity (id) preconditioning matrix and a standardised sample. Alternatively, the user can specify which heuristic to use for computing the preconditioning matrix by setting the pre option to one of the strings id, med, sclmed, or smpcov. Standardisation can be disabled by setting stnd=False. For example, the default setting corresponds to:

idx = thin(smpl, grad, 40, stnd=True, pre='id')
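
Likewise, any of the listed heuristics can be combined with the standardisation switch; for instance (an illustrative choice of options), the scaled median heuristic applied to the raw, non-standardised sample would be:

idx = thin(smpl, grad, 40, stnd=False, pre='sclmed')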

The details for each of the heuristics are documented in Section 2.3 of the accompanying paper.
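
For intuition only, the snippet below sketches the underlying idea of greedy kernel Stein discrepancy minimisation, assuming an inverse multiquadric base kernel combined with the Langevin Stein operator. The helper names stein_kernel and greedy_thin, as well as the constants c2 and beta, are made up for this sketch; it is not the package's implementation, and thin should be used in practice:

import numpy as np

def stein_kernel(x, y, sx, sy, c2=1.0, beta=-0.5):
    # Langevin Stein kernel built from the inverse multiquadric base
    # kernel k(x, y) = (c2 + ||x - y||^2)^beta; sx and sy are the
    # gradients of the log-posterior at x and y.
    d = x.size
    diff = x - y
    r2 = diff @ diff
    base = c2 + r2
    kxy = base ** beta
    gkx = 2 * beta * base ** (beta - 1) * diff       # grad_x k(x, y)
    gky = -gkx                                       # grad_y k(x, y)
    div = -2 * beta * (2 * (beta - 1) * base ** (beta - 2) * r2
                       + d * base ** (beta - 1))     # div_x div_y k(x, y)
    return div + gkx @ sy + gky @ sx + kxy * (sx @ sy)

def greedy_thin(smpl, grad, m):
    # At each step, pick the point (repeats allowed) whose addition gives
    # the smallest kernel Stein discrepancy for the selected set.
    n = smpl.shape[0]
    obj = np.array([stein_kernel(smpl[i], smpl[i], grad[i], grad[i])
                    for i in range(n)])
    idx = np.empty(m, dtype=int)
    for j in range(m):
        idx[j] = np.argmin(obj)
        s = idx[j]
        obj += 2 * np.array([stein_kernel(smpl[i], smpl[s], grad[i], grad[s])
                             for i in range(n)])
    return idx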

PyStan Example

As an illustration of how Stein Thinning can be used to post-process output from Stan, consider the following simple example, in which PyStan is used to draw correlated samples from a bivariate Gaussian model:

from pystan import StanModel
mc = """
parameters {vector[2] x;}
model {x ~ multi_normal([0, 0], [[1, 0.8], [0.8, 1]]);}
"""
sm = StanModel(model_code=mc)
fit = sm.sampling(iter=1000, seed=12345)

The bivariate Gaussian model is used here for illustration, but regardless of the complexity of the model being sampled, PyStan always returns a fit object (a StanFit instance). The sampled points and the log-posterior gradients can be extracted from the returned fit object:

import numpy as np
from stein_thinning.thinning import thin

sample = fit.extract()['x']
gradient = np.apply_along_axis(lambda x: fit.grad_log_prob(x.tolist()), 1, sample)
idx = thin(sample, gradient, 40)

The selected points can then be plotted:

import matplotlib.pyplot as plt

plt.figure()
plt.scatter(sample[:, 0], sample[:, 1], color='lightgray')
plt.scatter(sample[idx, 0], sample[idx, 1], color='red')
plt.show()

[Figure: Stein Thinning demo results, showing the full sample in light grey and the selected points in red]

The above example can be found in stein_thinning/demo/pystan.py.

Download files

Download the file for your platform.

Source Distribution

stein_thinning-0.2.0.tar.gz (27.9 kB)

Built Distribution

stein_thinning-0.2.0-py3-none-any.whl (27.4 kB)

File details

Details for the file stein_thinning-0.2.0.tar.gz.

File metadata

  • Download URL: stein_thinning-0.2.0.tar.gz
  • Upload date:
  • Size: 27.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for stein_thinning-0.2.0.tar.gz

  • SHA256: 226880b9e561aef2383030cbba5466276f29b65af4ce0adb277f244d983ae543
  • MD5: a4e88a4cbda44c34863adcd61ce0cb84
  • BLAKE2b-256: 1a3d8209c53f925a8c765891fa06c318b42f01f1995d1eee3e9f8a8198c7a1ea
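
As an illustration (not a required step), the SHA256 digest above can be checked locally with Python's hashlib, assuming the sdist has been downloaded to the working directory:

import hashlib

with open('stein_thinning-0.2.0.tar.gz', 'rb') as f:
    digest = hashlib.sha256(f.read()).hexdigest()

# True if the download matches the digest published above
print(digest == '226880b9e561aef2383030cbba5466276f29b65af4ce0adb277f244d983ae543')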


Provenance

The following attestation bundles were made for stein_thinning-0.2.0.tar.gz:

Publisher: publish_pypi.yml on wilson-ye-chen/stein_thinning


File details

Details for the file stein_thinning-0.2.0-py3-none-any.whl.

File metadata

File hashes

Hashes for stein_thinning-0.2.0-py3-none-any.whl

  • SHA256: f72f8a5f6369ec72d1df36a5fe505c1bee9e103c704d05a2d912f12cbb76d5bc
  • MD5: 8a225e516a7e4a410229025a394912c2
  • BLAKE2b-256: a26c7e6704ac8d2b1c30d3165164276f11003cc7988e2c62dee70c4685a3a5cd


Provenance

The following attestation bundles were made for stein_thinning-0.2.0-py3-none-any.whl:

Publisher: publish_pypi.yml on wilson-ye-chen/stein_thinning

