Efficiently generate samples from the Polya-Gamma distribution using a NumPy/SciPy compatible interface.
polya-gamma
Features
- polyagamma is written in C and optimized for performance.
- Very light and easy to install (pre-built wheels).
- It is flexible and allows the user to sample using one of 4 available methods.
- Input parameters can be scalars, arrays or both; allowing for easy generation of multi-dimensional samples without specifying the size.
- Random number generation is thread safe.
- The functional API resembles that of common numpy/scipy functions, making it easy to plug into existing libraries.
Dependencies
- NumPy >= 1.17
Installation
To get the latest version of the package, one can install it by downloading the wheel/source distribution
from the releases page, or using pip with the following shell command:
$ pip install -U polyagamma
Alternatively, one can install from source by cloning the repo. This requires an installation of poetry and the following shell commands:
$ git clone https://github.com/zoj613/polya-gamma.git
$ cd polya-gamma/
$ poetry install
# add package to python's path
$ export PYTHONPATH=$PWD:$PYTHONPATH
Example
Python
import numpy as np
from polyagamma import polyagamma
# generate a PG(1, 0) sample
o = polyagamma()
# Get a 5 by 10 array of PG(1, 2) variates.
o = polyagamma(z=2, size=(5, 10))
# Pass sequences as input. Numpy's broadcasting rules apply here.
h = [[1.5, 2, 0.75, 4, 5],
[9.5, 8, 7, 6, 0.9]]
o = polyagamma(h, -2.5)
# Pass an output array
out = np.empty(5)
polyagamma(out=out)
print(out)
# one can choose a sampling method from {devroye, alternate, gamma, saddle}.
# If not given, the default behaviour is a hybrid sampler that picks a method
# based on the parameter values.
o = polyagamma(method="saddle")
# one can also use an existing instance of `numpy.random.Generator` as a parameter.
# This is useful to reproduce samples generated via a given seed.
rng = np.random.default_rng(12345)
o = polyagamma(random_state=rng)
# If one is using a `numpy.random.RandomState` instance instead of the `Generator`
# class, the object's underlying bitgenerator can be passed as the value of random_state
bit_gen = np.random.RandomState(12345)._bit_generator
o = polyagamma(random_state=bit_gen)
# When passing a large input array for the shape parameter `h`, parameter value
# validation checks can be disabled to avoid some overhead, which may boost performance.
large_h = np.ones(1000000)
o = polyagamma(large_h, disable_checks=True)
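The broadcasting comment in the example above can be checked without drawing any samples. The sketch below uses only NumPy and assumes, as that comment states, that the shape of the returned array follows NumPy's broadcasting rules for h and z; it does not call polyagamma itself.

```python
import numpy as np

# Same inputs as in the example above: a 2x5 nested list for h and a scalar z.
h = [[1.5, 2, 0.75, 4, 5],
     [9.5, 8, 7, 6, 0.9]]
z = -2.5

# np.broadcast works out the combined shape without allocating the result.
out_shape = np.broadcast(np.asarray(h), np.asarray(z)).shape
print(out_shape)
```

Under that assumption, polyagamma(h, -2.5) would return one variate per entry of h, i.e. an array with the shape printed here.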
C
For an example of how to use polyagamma in a C program, see here.
Benchmarks
Below are runtime plots of 20000 samples generated for various values of h and z, using each method.
We restrict h to integer values to accommodate the devroye method, which cannot be used for non-integer h.
Generally:
- The gamma method is slowest and should be avoided in cases where speed is paramount.
- For h > 20, the saddle method is the fastest for any value of z.
- For z < 2 and integer h < 20, the devroye method is the most efficient.
- For z > 2 and integer/non-integer h < 20, the alternate method is the most efficient.
- For h > 50 (or any value large enough), the normal approximation to the distribution is fastest (not reported in the above plot but it is around 10 times faster than the saddle method and also equally accurate).
Therefore, we devise a "hybrid/default" sampler that picks a sampler based on the above guidelines.
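As a rough illustration of how such a selection rule might look, here is a minimal sketch in pure Python. The helper name pick_method is hypothetical and is not part of the package; the thresholds are read off the guidelines above, the handling of boundary values (h exactly 20, z exactly 2) and of non-integer h with z < 2 is an assumption, and the h < 1 branch follows the later remark that the saddle method is used for very small h. The actual C implementation may choose differently.

```python
def pick_method(h, z):
    """Hypothetical helper: map (h, z) to a sampler name using the guidelines above."""
    # Assumption: the thresholds apply to |z|, since the Polya-Gamma density
    # depends on z only through even functions of z.
    z = abs(z)
    if h > 50:
        # Normal approximation; note this is not one of the four `method` names.
        return "normal"
    if h > 20 or h < 1:
        # Large h (up to 50), or very small h where the text says saddle is used.
        return "saddle"
    if float(h).is_integer() and z < 2:
        # devroye only supports integer h.
        return "devroye"
    # Assumed fallback for everything else with 1 <= h <= 20.
    return "alternate"


for h, z in [(60, 0), (30, 5), (0.5, 1), (5, 1), (5.5, 1), (5, 3)]:
    print(h, z, pick_method(h, z))
```

Ordering the checks from the largest h downwards keeps the overlapping ranges in the guidelines (h > 20 versus h > 50) unambiguous.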
We also benchmark the hybrid sampler runtime with the sampler found in the pypolyagamma package (version 1.2.3).
The version of NumPy we use is 1.19.4. We use the pgdrawv function which takes arrays as input.
Below are runtime plots of 20000 samples for each value of h and z. Values of h range from 0.1 to 60,
while z is set to 0, 2.5, 5, and 10.
It can be seen that when generating many samples at once for any given combination of
parameters, polyagamma outperforms the pypolyagamma package. The exception is when h < 1.
We rely on the saddle method to generate samples when the shape parameter is
very small, which is not very efficient for such values. For values of h larger than 50,
we use the normal approximation, which explains the large dip in runtime past h=50.
It is also worth noting that the pypolyagamma package is on average faster than ours at
generating exactly one sample from the distribution. This is mainly due to the
overhead introduced by creating the bitgenerator, acquiring/releasing the thread lock, and
doing parameter validation checks at every call to the function. This overhead can
be somewhat mitigated by passing in a random generator instance at every call to
the polyagamma function.
To generate the above plots locally, run python scripts/benchmark.py --size=<some size> --z=<z value>.
Note that the runtimes may differ from the ones reported here, depending on the machine the script
is run on.
Contributing
All contributions, bug reports, bug fixes, documentation improvements, enhancements, and ideas are welcome.
To submit a PR, follow the steps below:
- Fork the repo.
- Set up the dev environment with poetry install. All dependencies will be installed.
- Start writing your changes, including unit tests.
- Once finished, run make install to build the project with the new changes.
- Once the build is successful, run the tests with make test to make sure they all pass.
- Once finished, you can submit a PR for review.
References
- Luc Devroye. "On exact simulation algorithms for some distributions related to Jacobi theta functions." Statistics & Probability Letters, Volume 79, Issue 21, (2009): 2251-2259.
- Polson, Nicholas G., James G. Scott, and Jesse Windle. "Bayesian inference for logistic models using Pólya–Gamma latent variables." Journal of the American Statistical Association 108.504 (2013): 1339-1349.
- J. Windle, N. G. Polson, and J. G. Scott. "Improved Polya-gamma sampling." Technical Report, University of Texas at Austin, 2013b.
- Windle, Jesse, Nicholas G. Polson, and James G. Scott. "Sampling Polya-Gamma random variates: alternate and approximate techniques." arXiv preprint arXiv:1405.0506 (2014).