
Parallel random number generation that produces the same result, regardless of the number of threads

Project description

parallel-numpy-rng


A multi-threaded random number generator backed by NumPy RNG, with parallelism provided by Numba.

Overview

This package uses the "fast-forward" capability of the PCG family of RNGs, as exposed by the new-style NumPy RNG API, to generate arrays of random numbers in a multi-threaded manner. The result depends only on the random number seed, not the number of threads.

Only two types of random numbers are supported right now: uniform and normal. More could be added if there is demand, although some kinds, such as bounded random integers, are hard to parallelize with the approach used here.

The uniform randoms are identical to those NumPy produces for a given seed; the random normals are not.

Example + Performance

import numpy as np
import parallel_numpy_rng

seed = 123
parallel_rng = parallel_numpy_rng.default_rng(seed)
numpy_rng = np.random.default_rng(seed)

%timeit numpy_rng.random(size=10**9, dtype=np.float32)                           # 2.85 s
%timeit parallel_rng.random(size=10**9, dtype=np.float32, nthread=1)             # 3.34 s
%timeit parallel_rng.random(size=10**9, dtype=np.float32, nthread=128)           # 67.8 ms

%timeit numpy_rng.standard_normal(size=10**8, dtype=np.float32)                  # 1.12 s
%timeit parallel_rng.standard_normal(size=10**8, dtype=np.float32, nthread=1)    # 1.85 s
%timeit parallel_rng.standard_normal(size=10**8, dtype=np.float32, nthread=128)  # 43.5 ms

Note that the parallel_rng is slower than NumPy when using a single thread, because the parallel implementation requires a slower algorithm in some cases (this is especially noticeable for float64 and normals).

Installation

The code works and is reasonably well tested, so it's probably ready for use. It can be installed from PyPI:

$ pip install parallel-numpy-rng

Details

Random number generation can be slow, even with modern algorithms like PCG, so it's helpful to be able to use multiple threads. The easy way to do this is to give each thread a different seed, but then the RNG sequence will depend on how many threads you used and how you did the seed offset. It would be nice if the RNG sequence could be the output of a single logical sequence (i.e. the stream resulting from a single seed), and the number of threads could just be an implementation detail.
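To make the problem concrete, here is a minimal sketch of the naive per-thread seeding scheme (the helper name and seed-offset convention are hypothetical, not this package's API). The concatenated output changes when the thread count changes:

```python
import numpy as np

def naive_parallel_uniform(seed, n, nthread):
    # Hypothetical naive scheme: give each "thread" its own offset seed
    # (seed + thread index) and concatenate the per-thread outputs.
    chunks = np.array_split(np.arange(n), nthread)
    return np.concatenate(
        [np.random.default_rng(seed + i).random(len(c))
         for i, c in enumerate(chunks)]
    )

a = naive_parallel_uniform(seed=0, n=8, nthread=2)
b = naive_parallel_uniform(seed=0, n=8, nthread=4)
# a and b differ: the "same" logical stream depends on the thread count
```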

The key capability to enable this is cheap fast-forwarding of the underlying RNG. For example, if we want to generate N random numbers with 2 threads, we know the first thread will do N/2 calls to the RNG, thus advancing its internal state that many times. Therefore, we would like to start the second thread's RNG in a state that is already advanced N/2 times, but without actually making N/2 calls to get there.
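This can be demonstrated in plain NumPy with PCG64's advance() method. For float64 uniforms, each value consumes exactly one 64-bit draw, so fast-forwarding the second generator by N/2 steps reproduces the serial stream exactly (a sketch of the idea, not this package's implementation):

```python
import numpy as np

seed, N = 123, 10

# Serial reference: N uniforms from a single PCG64 stream
serial = np.random.Generator(np.random.PCG64(seed)).random(N)

# "Two threads": the second generator starts from a state fast-forwarded
# by N//2 steps, as if the first thread's draws had already happened.
g1 = np.random.Generator(np.random.PCG64(seed))
bg2 = np.random.PCG64(seed)
bg2.advance(N // 2)  # skip the first thread's N//2 draws without making them
g2 = np.random.Generator(bg2)

parallel = np.concatenate([g1.random(N // 2), g2.random(N - N // 2)])
# parallel == serial: the split is invisible in the output
```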

This is known as fast-forwarding, or jump-ahead. NumPy's new RNG API (introduced in NumPy 1.17) defaults to the PCG64 bit generator, which supports this feature through its advance() method. The new API also exposes CFFI bindings for drawing PCG random values, so the core loop, including the parallelism, can be implemented in a Numba-compiled function that calls the RNG through a low-level function pointer.

An interesting consequence of using fast-forwarding is that each random value must be generated with a known number of calls to the underlying RNG, so that we know how many steps to advance the RNG state by. This rules out rejection sampling, which makes a variable number of calls. Fortunately, inverse-transform sampling can usually substitute, as can more specific methods like Box-Muller. These can be slower than rejection sampling (e.g. NumPy's ziggurat method for normals) with one thread, but even just two threads more than makes up for the difference.
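As an illustration of a fixed-call method (not necessarily the exact algorithm this package uses): the Box-Muller transform turns every pair of uniforms into exactly two standard normals, so the number of underlying RNG calls is known in advance and fast-forwarding remains valid.

```python
import numpy as np

def box_muller(u1, u2):
    # Two uniforms in (0, 1] -> two independent standard normals.
    # Exactly one transform per pair, no rejection: the RNG call
    # count is fixed, which is what fast-forwarding requires.
    r = np.sqrt(-2.0 * np.log(u1))
    theta = 2.0 * np.pi * u2
    return r * np.cos(theta), r * np.sin(theta)

rng = np.random.Generator(np.random.PCG64(123))
u1 = 1.0 - rng.random(1000)  # map [0, 1) to (0, 1] to keep log() finite
u2 = rng.random(1000)
z1, z2 = box_muller(u1, u2)
```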

Project details


Download files

Download the file for your platform.

Source Distribution

parallel_numpy_rng-0.2.0.tar.gz (16.0 kB view details)

Uploaded Source

Built Distribution


parallel_numpy_rng-0.2.0-py3-none-any.whl (11.8 kB view details)

Uploaded Python 3

File details

Details for the file parallel_numpy_rng-0.2.0.tar.gz.

File metadata

  • Download URL: parallel_numpy_rng-0.2.0.tar.gz
  • Size: 16.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for parallel_numpy_rng-0.2.0.tar.gz
Algorithm Hash digest
SHA256 859f5ad437e4823c3ea769d8c31aec20c4def89a5a948ed0687821086ec11002
MD5 84916a5db50a48c7cd33dbbab7e7cf16
BLAKE2b-256 5bf9a4f071a17d00baa58fd01f939fb794483cef9b423324020773edf5798410


Provenance

The following attestation bundles were made for parallel_numpy_rng-0.2.0.tar.gz:

Publisher: python-publish.yml on lgarrison/parallel-numpy-rng

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file parallel_numpy_rng-0.2.0-py3-none-any.whl.

File metadata

File hashes

Hashes for parallel_numpy_rng-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 ba611aea482c823a128e1e9ce3ad1139ec8cc3f988e12a81fbca24a100e419be
MD5 02adc308679e6ccd61d78cf38df30b9c
BLAKE2b-256 b21497f874c4b09bbba5418f358856770a0ad6487a43f0f5481833b37f69e9de


Provenance

The following attestation bundles were made for parallel_numpy_rng-0.2.0-py3-none-any.whl:

Publisher: python-publish.yml on lgarrison/parallel-numpy-rng

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
