Parallel random number generation that produces the same result, regardless of the number of threads
parallel-numpy-rng
A multi-threaded random number generator backed by NumPy RNG, with parallelism provided by Numba.
Overview
This package uses the "fast-forward" capability of the PCG family of RNGs, as exposed by the new-style NumPy RNG API, to generate arrays of random numbers in a multi-threaded manner. The result depends only on the random number seed, not on the number of threads.
Only two types of random numbers are supported right now: uniform and normal. More could be added if there is demand, although some kinds, like bounded random ints, are hard to parallelize with the approach used here.
The uniform randoms are the same as NumPy produces for a given seed, although the random normals are not.
Example + Performance
```python
import numpy as np
import parallel_numpy_rng

seed = 123
parallel_rng = parallel_numpy_rng.default_rng(seed)
numpy_rng = np.random.default_rng(seed)

%timeit numpy_rng.random(size=10**9, dtype=np.float32)  # 2.85 s
%timeit parallel_rng.random(size=10**9, dtype=np.float32, nthread=1)  # 3.34 s
%timeit parallel_rng.random(size=10**9, dtype=np.float32, nthread=128)  # 67.8 ms

%timeit numpy_rng.standard_normal(size=10**8, dtype=np.float32)  # 1.12 s
%timeit parallel_rng.standard_normal(size=10**8, dtype=np.float32, nthread=1)  # 1.85 s
%timeit parallel_rng.standard_normal(size=10**8, dtype=np.float32, nthread=128)  # 43.5 ms
```
Note that the parallel_rng is slower than NumPy when using a single thread, because the parallel implementation requires a slower algorithm in some cases (this is especially noticeable for float64 and normals).
Installation
The code works and is reasonably well tested, so it's probably ready for use. It can be installed from PyPI:
$ pip install parallel-numpy-rng
Details
Random number generation can be slow, even with modern algorithms like PCG, so it's helpful to be able to use multiple threads. The easy way to do this is to give each thread a different seed, but then the RNG sequence will depend on how many threads you used and how you did the seed offset. It would be nice if the RNG sequence could be the output of a single logical sequence (i.e. the stream resulting from a single seed), and the number of threads could just be an implementation detail.
The key capability to enable this is cheap fast-forwarding of the underlying RNG. For example, if we want to generate N random numbers with 2 threads, we know the first thread will do N/2 calls to the RNG, thus advancing its internal state that many times. Therefore, we would like to start the second thread's RNG in a state that is already advanced N/2 times, but without actually making N/2 calls to get there.
This is known as fast-forwarding, or jump-ahead. NumPy's new RNG API (as of NumPy 1.17) uses the PCG RNG that supports this feature, and NumPy exposes an advance() method which implements it. The new API also exposes CFFI bindings to get PCG random values, so we can implement the core loop, including parallelism, in a Numba-compiled function that can call the RNG via a low-level function pointer.
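The fast-forwarding idea can be sketched with plain NumPy, without Numba or CFFI. This is not the package's implementation: `chunked_uniform` is a hypothetical helper, Python threads stand in for the Numba-parallel loop, and it assumes that each float64 uniform consumes exactly one 64-bit draw from PCG64 (which is what makes `advance(start)` land on the right state).

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def chunked_uniform(seed, n, nthread):
    """Fill n float64 uniforms from one logical stream using nthread workers.

    Hypothetical helper for illustration only.
    """
    out = np.empty(n, dtype=np.float64)
    # Worker t fills out[bounds[t]:bounds[t + 1]]
    bounds = np.linspace(0, n, nthread + 1).astype(np.int64)

    def work(t):
        start, stop = int(bounds[t]), int(bounds[t + 1])
        bitgen = np.random.PCG64(seed)
        # Jump ahead by the number of draws consumed by earlier chunks
        # (assumes one 64-bit draw per float64 value).
        bitgen.advance(start)
        out[start:stop] = np.random.Generator(bitgen).random(stop - start)

    with ThreadPoolExecutor(max_workers=nthread) as pool:
        list(pool.map(work, range(nthread)))
    return out


# The serial NumPy stream for the same seed serves as the reference
reference = np.random.default_rng(123).random(1000)
for nthread in (1, 2, 7):
    assert np.array_equal(chunked_uniform(123, 1000, nthread), reference)
```

Every worker starts from the same seed and differs only in how far its state is advanced, so the output matches the serial stream for any chunk count.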
An interesting consequence of using fast-forwarding is that each random value must be generated with a known number of calls to the underlying RNG, so that we know how many steps to advance the RNG state by. This means we can't use rejection sampling, which makes a variable number of calls. Fortunately, inverse-transform sampling can usually substitute, or more specific methods like Box-Muller. These can be slower than rejection sampling (or whatever NumPy uses) with one thread, but even just two threads more than make up for the difference.
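To make the fixed-call-count requirement concrete, here is a minimal Box-Muller sketch in plain NumPy. It is an illustration under stated assumptions, not the package's code: `box_muller_chunk` and `box_muller` are hypothetical names, and the sketch assumes each pair of normals consumes exactly two float64 uniforms (two 64-bit PCG64 draws), so the starting state of any chunk is known in advance.

```python
import numpy as np


def box_muller_chunk(seed, start_pair, npair):
    """Return 2 * npair normals for pairs [start_pair, start_pair + npair)."""
    bitgen = np.random.PCG64(seed)
    # Each pair uses exactly two uniforms, so the offset is known up front
    bitgen.advance(2 * start_pair)
    u = np.random.Generator(bitgen).random((npair, 2))
    # 1 - u lies in (0, 1], which keeps the logarithm finite
    r = np.sqrt(-2.0 * np.log(1.0 - u[:, 0]))
    theta = 2.0 * np.pi * u[:, 1]
    return np.column_stack((r * np.cos(theta), r * np.sin(theta))).ravel()


def box_muller(seed, npair, nchunk):
    """Generate 2 * npair normals, splitting the work into nchunk pieces."""
    bounds = np.linspace(0, npair, nchunk + 1).astype(np.int64)
    chunks = [
        box_muller_chunk(seed, int(a), int(b - a))
        for a, b in zip(bounds[:-1], bounds[1:])
    ]
    return np.concatenate(chunks)


serial = box_muller(123, 5000, 1)
split = box_muller(123, 5000, 4)
# The underlying uniforms are identical; a tight tolerance is used because
# vectorized math routines are not guaranteed to round identically for
# different array lengths.
assert np.allclose(serial, split, rtol=1e-12, atol=0.0)
```

A rejection-based sampler could not be split this way, because the number of draws consumed before a given output index would not be known without generating everything that precedes it.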
File details
Details for the file parallel_numpy_rng-0.2.0.tar.gz.
File metadata
- Download URL: parallel_numpy_rng-0.2.0.tar.gz
- Upload date:
- Size: 16.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 859f5ad437e4823c3ea769d8c31aec20c4def89a5a948ed0687821086ec11002 |
| MD5 | 84916a5db50a48c7cd33dbbab7e7cf16 |
| BLAKE2b-256 | 5bf9a4f071a17d00baa58fd01f939fb794483cef9b423324020773edf5798410 |
Provenance
The following attestation bundles were made for parallel_numpy_rng-0.2.0.tar.gz:
Publisher: python-publish.yml on lgarrison/parallel-numpy-rng
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: parallel_numpy_rng-0.2.0.tar.gz
  - Subject digest: 859f5ad437e4823c3ea769d8c31aec20c4def89a5a948ed0687821086ec11002
- Sigstore transparency entry: 676683389
- Sigstore integration time:
- Permalink: lgarrison/parallel-numpy-rng@3f7d7dd48004680d7af1ba73a29b4904277199a7
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/lgarrison
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@3f7d7dd48004680d7af1ba73a29b4904277199a7
- Trigger Event: release
File details
Details for the file parallel_numpy_rng-0.2.0-py3-none-any.whl.
File metadata
- Download URL: parallel_numpy_rng-0.2.0-py3-none-any.whl
- Upload date:
- Size: 11.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ba611aea482c823a128e1e9ce3ad1139ec8cc3f988e12a81fbca24a100e419be |
| MD5 | 02adc308679e6ccd61d78cf38df30b9c |
| BLAKE2b-256 | b21497f874c4b09bbba5418f358856770a0ad6487a43f0f5481833b37f69e9de |
Provenance
The following attestation bundles were made for parallel_numpy_rng-0.2.0-py3-none-any.whl:
Publisher: python-publish.yml on lgarrison/parallel-numpy-rng
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: parallel_numpy_rng-0.2.0-py3-none-any.whl
  - Subject digest: ba611aea482c823a128e1e9ce3ad1139ec8cc3f988e12a81fbca24a100e419be
- Sigstore transparency entry: 676683402
- Sigstore integration time:
- Permalink: lgarrison/parallel-numpy-rng@3f7d7dd48004680d7af1ba73a29b4904277199a7
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/lgarrison
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@3f7d7dd48004680d7af1ba73a29b4904277199a7
- Trigger Event: release