Skip to main content

Efficiently generate samples from the Polya-Gamma distribution using a NumPy/SciPy compatible interface.

Project description

Polya-Gamma

PyPI - Wheel CI Codecov PyPI - License PyPI Conda

Efficiently generate samples from the Polya-Gamma distribution using a NumPy/SciPy compatible interface.

Why?

If you are reading this, you probably have already used the pypolyagamma package before. It is a great package that I have also used in the past, however I encountered several issues:

  • Generating an array of samples is awkward because it requires using a list comprehension if parameter values are scalars or have pre-allocated arrays of a known size to pass for both the parameters and the output array. Moreover, broadcasting of input is not supported and thus requiring the user to write another layer to support it.
  • It requires extra effort to be used in multiprocessing because pickling of the sampler is not supported.
  • There is no parameter validation supported meaning it is easy to get the wrong samples if you do not check the inputs manually.
  • The sampling API is very different from the ones used by popular packages like numpy/scipy, making it harder to just "plug-n-play" in existing code bases.
  • It does not allow passing in an instance of a np.random.RandomState or np.random.Generator for seeding, requiring extra effort when changing the seed if used in a larger code base.
  • The C++ code wrapped by the package is GPLv3 licensed, making it difficult to use the source code in a project that prefers licenses like MIT/Apache/BSD.

The above issues are the reason why this package exists. And the aim of polyagamma is to "fix" them.

Features

  • Input parameters can be scalars, arrays or both; allowing for easy generation of multi-dimensional samples without specifying the size.
  • Input validation is done internally with clear error messages upon failure.
  • It is flexible and allows the user to sample using one of 4 available algorithms.
  • Implements functions to compute the CDF and density of the distribution as well as their logarithms.
  • Random number generation is thread safe.
  • The functional API resembles that of common numpy/scipy functions, therefore making it easy to plugin to existing libraries.
  • polyagamma is optimized for performance and tests show that it is faster than other implementations.
  • Pre-built wheels are provided for easy installation on Linux, MacOS and Windows.

Examples

Python

import array
import numpy as np
from polyagamma import random_polyagamma

# generate a PG(1, 0) sample
o = random_polyagamma()

# Get a 5 by 1 array of PG(1, 2) variates.
o = random_polyagamma(z=2, size=5)

# We can pass sequences as input. Numpy's broadcasting rules apply here.
# Get a 10 by 2 array where column 1 is PG(2, -10) and column 2 is PG(1, 10)
o = random_polyagamma([2, 1], [-10, 10], size=(10, 2))
z = [[1.5, 2, -0.75, 4, 5],
     [9.5, -8, 7, 6, -0.9]]
o = random_polyagamma(1, z)

# We can pass an output array using the `out` parameter. It does not have to be
# a numpy array. it can be any object that implements the array or buffer protocols.
# As long as its type is 64bit float, contiguous in memory and aligned (e.g. Python's array object).
numpy_out = np.empty(5)
array_out = array.array('d', [0] * 5)
random_polyagamma(out=numpy_out)
print(numpy_out)
random_polyagamma(out=array_out)
print(array_out)

# one can choose a sampling method from {devroye, alternate, gamma, saddle}.
# If not given, the default behaviour is a hybrid sampler that picks the most
# efficient method based on the input values.
o = random_polyagamma(method="saddle")

# one can also use an existing instance of `numpy.random.Generator` as a parameter.
# This is useful to reproduce samples generated via a given seed.
rng = np.random.default_rng(12345)
o = random_polyagamma(random_state=rng)

# If one is using a `numpy.random.RandomState` instance instead of the `Generator`
# class, the object's underlying bitgenerator can be passed as the value of random_state
bit_gen = np.random.RandomState(12345)._bit_generator
o = random_polyagamma(random_state=bit_gen)

# When passing a large input array for the shape parameter `h`, parameter value
# validation checks can be disabled if the values are guaranteed to be positive
# to avoid some overhead, which may boost performance.
large_h = np.ones(1000000)
o = random_polyagamma(large_h, disable_checks=True)

Functions to compute the density and CDF are available. Broadcasting of input is supported.

from polyagamma import polyagamma_pdf, polyagamma_cdf

>>> polyagamma_pdf(0.1)
# 3.613955566329298
>>> polyagamma_cdf([1, 2], h=2, z=1)
# array([0.95637847, 0.99963397])
>>> polyagamma_pdf([2, 0.1], h=[[1, 2], [3, 4]], return_log=True)
# array([[   -8.03172733,  -489.17101125]
#        [   -3.82023942, -1987.09156971]])
>>> polyagamma_cdf(4, z=[-100, 0, 2], return_log=True)
# array([ 3.72007598e-44, -3.40628215e-09, -1.25463528e-12])

Cython

The package also provides low-level functions that can be imported in cython modules. They are:

  • random_polyagamma
  • random_polyagamma_fill
  • random_polyagamma_fill2

Refer to the pgm_random.h header file for more info about the function signatures. Below is an example of how these functions can be used.

from cpython.pycapsule cimport PyCapsule_GetPointer
from polyagamma cimport random_polyagamma_fill, DEVROYE
from numpy.random cimport bitgen_t
import numpy as np

# assuming there exists an instance of the Generator class called `rng`.
bitgenerator = rng._bit_generator
# get pointer to the underlying bitgenerator struct
cdef bitgen_t* bitgen = <bitgen_t*>PyCapsule_GetPointer(bitgenerator.capsule, "BitGenerator")
# set distribution parameters
cdef double h = 1, z = 0
# get a memory view of the array to store samples in
cdef double[:] out = np.empty(300)
with bitgenerator.lock, nogil:
    random_polyagamma_fill(bitgen, h, z, DEVROYE, <size_t>out.shape[0], &out[0])
print(out.base)
...

PyMC

As of pymc>=4.0.0b1, this distribution can be accessed as a PyMC distribution object. See the pymc documentation for more details.

C

For an example of how to use polyagamma in a C program, see here.

Dependencies

  • Numpy >= 1.19.0

Installation

To get the latest version of the package, one can install it by downloading the wheel/source distribution from the releases page, or using pip with the following shell command:

$ pip install --pre -U polyagamma

or using conda with the following command:

$ conda install -c conda-forge polyagamma

Alternatively, once can install from source by cloning the repo. This requires an installation of poetry and the following shell commands:

$ git clone https://github.com/zoj613/polyagamma.git
$ cd polya-gamma/
# install dependencies (make sure pip is up to date before hand)
$ poetry install --no-root
$ make install
# add package to python's path
$ export PYTHONPATH=$PWD:$PYTHONPATH 

Benchmarks

Below are runtime plots of 20000 samples generated for various values of h and z, using each method. We restrict h to integer values to accomodate the devroye method, which cannot be used for non-integer h. The version of the package used to generate them is v1.3.1.

Generally:

  • The gamma method is slowest and should be avoided in cases where speed is paramount.
  • For h >= 8, the saddle method is the fastest for any value of z.
  • For 0 <= z <= 1 and integer h <= 4, the devroye method should be preferred.
  • For z > 1 and 1 < h < 8, the alternate method is the most efficient.
  • For h > 50 (or any value large enough), the normal approximation to the distribution is fastest (not reported in the above plot but it is around 10 times faster than the saddle method and also equally accurate).

Therefore, we devise a "hybrid/default" sampler that picks a sampler based on the above guidelines.

We also benchmark the hybrid sampler runtime with the sampler found in the pypolyagamma package (version 1.2.3). The version of NumPy we use is 1.19.0. We compare our sampler to the pgdrawv functions provided by the package. Below are runtime plots of 20000 samples for each value of h and z. Values of h range from 0.1 to 50, while z is set to 0, 2.5, 5, and 10.

It can be seen that when generating many samples at once for any given combination of parameters, polyagamma outperforms the pypolyagamma package by a large margin. The exception is when the scale parameter is very small (e.g h < 1). It is also worth noting that the pypolygamma package is on average faster than ours at generating exactly 1 sample value from the distribution. This is mainly due to the overhead introduced by creating the bitgenerator + acquiring/releasing the thread lock + doing parameter validation checks at every call to the function. This overhead can somewhat be mitigated by passing in a random generator instance at every call to the polyagamma function. To eliminate this overhead, it is best to use the Cython functions directly. Below is a timing example to demonstrate the benefit of passing a generator explicitly:

In [3]: rng = np.random.SFC64(1)

In [4]: %timeit random_polyagamma()
90 µs ± 1.65 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [5]: %timeit random_polyagamma(random_state=rng)
1.69 µs ± 6.96 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

To generate the above plots locally, run python scripts/benchmark.py --size=<some size> --z=<z value>. Note that the runtimes may differ than the ones reported here, depending on the machine this script is ran on.

Distribution Plots

Below is a visualization of the Cumulative distribution and density functions for various values of the parameters.

We can compare these plots to the Kernel density estimate and empirical CDF plots generated from 20000 random samples using each of the available methods.

Contributing

All contributions, bug reports, bug fixes, documentation improvements, enhancements, and ideas are welcome.

To submit a PR, follow the steps below:

  1. Fork the repo.
  2. Install the poetry package and setup the dev environment with poetry install --no-root. All dependencies will be installed.
  3. Start writing your changes, including unittests.
  4. Once finished, run make install to build the project with the new changes.
  5. Once build is successful, run tests to make sure they all pass with make test.
  6. Once finished, you can submit a PR for review.

References

  • Luc Devroye. "On exact simulation algorithms for some distributions related to Jacobi theta functions." Statistics & Probability Letters, Volume 79, Issue 21, (2009): 2251-2259.
  • Polson, Nicholas G., James G. Scott, and Jesse Windle. "Bayesian inference for logistic models using Pólya–Gamma latent variables." Journal of the American statistical Association 108.504 (2013): 1339-1349.
  • J. Windle, N. G. Polson, and J. G. Scott. "Improved Polya-gamma sampling". Technical Report, University of Texas at Austin, 2013b.
  • Windle, Jesse, Nicholas G. Polson, and James G. Scott. "Sampling Polya-Gamma random variates: alternate and approximate techniques." arXiv preprint arXiv:1405.0506 (2014)
  • Windle, J. (2013). Forecasting high-dimensional, time-varying variance-covariance matrices with high-frequency data and sampling Pólya-Gamma random variates for posterior distributions derived from logistic likelihoods.(PhD thesis). Retrieved from http://hdl.handle.net/2152/21842 .

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

polyagamma-1.3.3.tar.gz (168.0 kB view details)

Uploaded Source

Built Distributions

polyagamma-1.3.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (502.6 kB view details)

Uploaded CPython 3.9 manylinux: glibc 2.17+ x86-64

polyagamma-1.3.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (521.1 kB view details)

Uploaded CPython 3.8 manylinux: glibc 2.17+ x86-64

polyagamma-1.3.3-cp38-cp38-macosx_10_15_x86_64.whl (132.0 kB view details)

Uploaded CPython 3.8 macOS 10.15+ x86-64

polyagamma-1.3.3-cp37-cp37m-win_amd64.whl (124.6 kB view details)

Uploaded CPython 3.7m Windows x86-64

polyagamma-1.3.3-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (492.7 kB view details)

Uploaded CPython 3.7m manylinux: glibc 2.17+ x86-64

polyagamma-1.3.3-cp37-cp37m-macosx_10_15_x86_64.whl (131.0 kB view details)

Uploaded CPython 3.7m macOS 10.15+ x86-64

File details

Details for the file polyagamma-1.3.3.tar.gz.

File metadata

  • Download URL: polyagamma-1.3.3.tar.gz
  • Upload date:
  • Size: 168.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.9.12

File hashes

Hashes for polyagamma-1.3.3.tar.gz
Algorithm Hash digest
SHA256 3cc05854c51464e193d092db0235f8340f42a8b3808e8320e2a1910434efe27c
MD5 d697f82c83418664fbd9494b7c5222a2
BLAKE2b-256 8f35e15cc05a86c0301484dfbc642c1ad4d6b1bee3679725940841e338fe6b87

See more details on using hashes here.

File details

Details for the file polyagamma-1.3.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for polyagamma-1.3.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 3b21e13f4b5ad5dbacd3c766ad9b85b038c4c2776c3aaeec393d0b9104b07aa9
MD5 4e2debf2b0172fe270dd4ff320332e51
BLAKE2b-256 ffc7b98cc2d6150791c9c0c806eb8ccd7668240ca23d3a46f4d68f6c8e82f1f6

See more details on using hashes here.

File details

Details for the file polyagamma-1.3.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for polyagamma-1.3.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 c8c121d925080bfb95fc1608e175a79832d5958f19ad84e96acaa32969bbf868
MD5 0876dec38025eb7125fcdea4867a2938
BLAKE2b-256 e081824e1e2cdba0a0ed2f2af8d82bfafdcd833b20c9b0fd71b4167ec77c2b5a

See more details on using hashes here.

File details

Details for the file polyagamma-1.3.3-cp38-cp38-macosx_10_15_x86_64.whl.

File metadata

File hashes

Hashes for polyagamma-1.3.3-cp38-cp38-macosx_10_15_x86_64.whl
Algorithm Hash digest
SHA256 e2cae9c8347bc2972975a6ae860d2ca2a4eaf890bd5625a992a459900f81db2f
MD5 f2c0ec3fddc87376005202639bbea9c3
BLAKE2b-256 1fb4c278f1ab2a152a1f21fea9c4ea30e7dd4897722d700dd676b1a1472b7e15

See more details on using hashes here.

File details

Details for the file polyagamma-1.3.3-cp37-cp37m-win_amd64.whl.

File metadata

File hashes

Hashes for polyagamma-1.3.3-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 6bcd6759c3274a131f1c245894e6ef27e60f3b6bfc23e9064c66cd7c6ba0ebc7
MD5 a05f75444a69c5f7104d0e93149e3e55
BLAKE2b-256 92d2c8531b2ca7353f93d85944e4eb7e649ecb9e67097ab8f542dc0404e52269

See more details on using hashes here.

File details

Details for the file polyagamma-1.3.3-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for polyagamma-1.3.3-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 be6521df61d043eafbe90722c310e59b6cc1e094b2dc7f1d930e754cc7453633
MD5 8e68273df42c31f345a0d83171a4d82e
BLAKE2b-256 be1ba7e33852a00e25453068cbadb884896ddcbfe412e70f7e26f19825108dad

See more details on using hashes here.

File details

Details for the file polyagamma-1.3.3-cp37-cp37m-macosx_10_15_x86_64.whl.

File metadata

File hashes

Hashes for polyagamma-1.3.3-cp37-cp37m-macosx_10_15_x86_64.whl
Algorithm Hash digest
SHA256 29cbfb7bad8b3f5cdf4ab35bd11a7406f2f5fb11333e7d1bbe80f30fde73f6c0
MD5 8167b501ef51dfdb4f2b8c271f50f6d8
BLAKE2b-256 01a301b3f69aef18775be8e6cb1aaa437269d0d6c4e1303b7c5ff34ad253341a

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page