Skip to main content

Optimal Approximate Sampling from Discrete Probability Distributions

Project description

Optimal Approximate Sampling From Discrete Probability Distributions

This repository contains a prototype implementation of the optimal sampling algorithms from:

Feras A. Saad, Cameron E. Freer, Martin C. Rinard, and Vikash K. Mansinghka. Optimal Approximate Sampling From Discrete Probability Distributions. Proc. ACM Program. Lang. 4, POPL, Article 36 (January 2020), 33 pages.

Installing

The Python 3 library can be installed via pip:

pip install optas

The C code for the main sampler is in the c/ directory and the Python 3 libraries are in the src/ directory.

Only Python 3 is required to build and use the software from source.

$ git clone https://github.com/probcomp/optimal-approximate-sampling
$ cd optimal-approximate-sampling
$ python setup.py install

To build the C sampler

$ cd c && make all

Usage

Please refer to the examples in the examples directory. Given a fixed target distribution and error measure:

  1. ./examples/sampling.py shows an example of how to find an optimal distribution and sample from it, given a user-specified precision.

  2. ./examples/maxerror.py shows an example of how to find an optimal distribution that uses the least possible precision and obtains an error that is less than a user-specified maximum allowable error.

These examples can be run directly as follows:

$ ./pythenv.sh python examples/sampling.py
$ ./pythenv.sh python examples/maxerror.py

Tests

To test the Python library and run a crash test in C (requires pytest and scipy):

$ ./check.sh

Experiments

The code for experiments in the POPL publication is available in a tarball on the ACM Digital Library. Please refer to the online supplementary material at https://doi.org/10.1145/3371104.

Citing

Please use the following BibTeX to cite this work.

@article{saad2020sampling,
title          = {Optimal approximate sampling from discrete probability distributions},
author         = {Saad, Feras A. and Freer, Cameron E. and Rinard, Martin C. and Mansinghka, Vikash K.},
journal        = {Proc. ACM Program. Lang.},
volume         = 4,
number         = {POPL},
month          = jan,
year           = 2020,
pages          = {36:1--36:31},
numpages       = 31,
publisher      = {ACM},
doi            = {10.1145/3371104},
abstract       = {This paper addresses a fundamental problem in random variate generation: given access to a random source that emits a stream of independent fair bits, what is the most accurate and entropy-efficient algorithm for sampling from a discrete probability distribution $(p_1, \dots, p_n)$, where the output distribution $(\hat{p}_1, \dots, \hat{p}_n)$ of the sampling algorithm can be specified with a given level of bit precision? We present a theoretical framework for formulating this problem and provide new techniques for finding sampling algorithms that are optimal both statistically (in the sense of sampling accuracy) and information-theoretically (in the sense of entropy consumption). We leverage these results to build a system that, for a broad family of measures of statistical accuracy, delivers a sampling algorithm whose expected entropy usage is minimal among those that induce the same distribution (i.e., is ``entropy-optimal'') and whose output distribution $(\hat{p}_1, \dots, \hat{p}_n)$ is a closest approximation to the target distribution $(p_1, \dots, p_n)$ among all entropy-optimal sampling algorithms that operate within the specified precision budget. This optimal approximate sampler is also a closer approximation than any (possibly entropy-suboptimal) sampler that consumes a bounded amount of entropy with the specified precision, a class which includes floating-point implementations of inversion sampling and related methods found in many standard software libraries. We evaluate the accuracy, entropy consumption, precision requirements, and wall-clock runtime of our optimal approximate sampling algorithms on a broad set of probability distributions, demonstrating the ways that they are superior to existing approximate samplers and establishing that they often consume significantly fewer resources than are needed by exact samplers.},
}

Related Repositories

For a near-optimal exact dice rolling algorithm see https://github.com/probcomp/fast-loaded-dice-roller.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

optas-1.0.3.tar.gz (25.4 kB view details)

Uploaded Source

Built Distribution

optas-1.0.3-py3-none-any.whl (35.3 kB view details)

Uploaded Python 3

File details

Details for the file optas-1.0.3.tar.gz.

File metadata

  • Download URL: optas-1.0.3.tar.gz
  • Upload date:
  • Size: 25.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/39.0.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.6.9

File hashes

Hashes for optas-1.0.3.tar.gz
Algorithm Hash digest
SHA256 f7669a5bd671bd1cbf75f621c3b212fa57c300fd94b81b8984d0cd4d6ed79337
MD5 d3ee94398986e870208f212012e48b61
BLAKE2b-256 cd450ffde724688ac6190fa11f57dd5d95dc38362d49b4901cec36108b1d13ab

See more details on using hashes here.

File details

Details for the file optas-1.0.3-py3-none-any.whl.

File metadata

  • Download URL: optas-1.0.3-py3-none-any.whl
  • Upload date:
  • Size: 35.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/39.0.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.6.9

File hashes

Hashes for optas-1.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 2c2d3a41be4062bd1f4ea5b42da1ffa137176e04ac2ae050191295b5dfd73157
MD5 6c5cd805821102fa172deb5ed5f07055
BLAKE2b-256 8ea87828bfe4f81ee660bd949ba248e0296edd21fa13cb32efa618b66f738975

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page