Reproduce R's native default RNG (set.seed/runif/sample) bit-for-bit in pure Python, no R at runtime
rrng — reproduce R's native RNG bit-for-bit in pure Python
Replay the random draws of an unmodified, already-published R script (set.seed() + sample() + runif() …)
inside Python and get identical numbers, with no R installed at runtime.
from rrng import RRNG
g = RRNG(100) # R: set.seed(100) (sample_kind="rounding" for R < 3.6)
g.unif_rand() # one R runif() draw -> 0.3077661
g.runif(5) # an R runif(5) block (numpy array)
g.rnorm(5) # rnorm(5) (Inversion, R's default)
g.rexp(5, rate=2); g.rpois(5, mu=3) # rexp / rpois
g.rbinom(5, size=20, prob=0.3) # rbinom
g.rgamma(5, shape=2.5, scale=2) # rgamma
g.sample(10, 5, replace=False) # sample(1:10, 5) -> 0-based indices
g.sample(10, 8, replace=True, prob=w) # weighted sample(prob=); w is a length-10 weight vector
idx = RRNG(100).sample_index(10, 10) # 0-based sample(1:10, 10, replace=TRUE) -> [9 6 5 2 8 9 6 5 5 3]
An RRNG is stateful, so a single seeded stream can be threaded across calls and code regions.
This is essential for reproducing R scripts that call set.seed(s) once and then map() or loop,
where the order in which draws are consumed must match.
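The consumption-order point can be illustrated without rrng installed. The following is a minimal sketch that uses the standard library's random.Random purely as a stand-in for a stateful generator; the function names are hypothetical and the numbers it produces are not R's.

```python
import random

def reseed_every_iteration(seed, n_iter, k):
    """Anti-pattern: a fresh generator per iteration replays the same k draws."""
    out = []
    for _ in range(n_iter):
        g = random.Random(seed)          # stand-in for constructing RRNG(seed) inside the loop
        out.append([g.random() for _ in range(k)])
    return out

def thread_one_stream(seed, n_iter, k):
    """Mirror of 'set.seed(s) once, then loop': one generator, consumed in order."""
    g = random.Random(seed)              # stand-in for a single RRNG(seed) created up front
    return [[g.random() for _ in range(k)] for _ in range(n_iter)]

reseeded = reseed_every_iteration(42, 3, 2)
threaded = thread_one_stream(42, 3, 2)

# Every reseeded iteration is identical; the threaded iterations keep advancing.
print(reseeded[0] == reseeded[1], threaded[0] == threaded[1])
```

Only the threaded form corresponds to an R script that seeds once at the top, so the generator object should be created once and passed into every function that draws from it.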
👉 See USAGE.md for a full guide (the R→Python cheat-sheet, threaded streams, 0- vs 1-based indexing, common pitfalls).
Install
Zero hard dependencies beyond NumPy.
pip install rrng # from PyPI (if published)
# or, from a checkout:
pip install -e . # editable install of this package
It is also importable in place: if the rrng/ package directory is on your PYTHONPATH
(e.g. it sits next to your script), from rrng import RRNG just works without installing.
Quick start
from rrng import RRNG
# --- Python (rrng) ---          # --- R ---
g = RRNG(42)                     # set.seed(42)
g.runif(3)                       # runif(3)
g.sample_index(n, k) + 1         # sample(1:n, k, replace = TRUE); +1 for R's 1-based indices
To resample a data array the way R's bootstrap does:
import numpy as np
data = np.array([...])
g = RRNG(123)
resample = data[g.sample_index(data.size, data.size)] # sample(data, replace=TRUE)
Validation — diff against real R
The library is only useful if its output is identical to R's, so the tests diff against golden vectors generated by real R and committed as a fixture:
# (re)generate fixtures — requires R (any version >= 3.6)
Rscript rrng/tests/generate_golden.R
# run the diff — requires Python + numpy
python rrng/tests/test_rrng.py # standalone runner
pytest rrng/tests # or under pytest
rrng/tests/fixtures/golden_vectors.json holds runif and sample (both sample.kinds)
across several seeds and n/size, including n large enough to exercise the multi-draw
rejection path. The end-to-end example in rrng/examples/ reproduces a published PNAS
snow-drought attribution bootstrap (risk ratios, return intervals, confidence bounds) to every
digit, from Python, with no R.
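The shape of such a golden-vector diff can be sketched as follows. This is an illustration only: the fixture layout, the field names and the diff_case helper are assumptions, the numbers are synthetic placeholders rather than R output, and the real golden_vectors.json schema may differ.

```python
import json

# Hypothetical fixture layout with made-up values (NOT real R output).
FIXTURE = json.loads("""
{"cases": [
  {"call": "runif", "seed": 1, "n": 3, "expected": [0.25, 0.5, 0.75]}
]}
""")

def diff_case(case, produce):
    """Return (position, got, expected) for every element that is not exactly equal."""
    got = list(produce(case))
    expected = case["expected"]
    if len(got) != len(expected):
        return [("length", len(got), len(expected))]
    # Bit-for-bit means exact float equality, so no tolerance is applied here.
    return [(i, g, e) for i, (g, e) in enumerate(zip(got, expected)) if g != e]

def fake_producer(case):
    """Stand-in for the code under test; a real run would call the library here."""
    return [0.25, 0.5, 0.75]

for case in FIXTURE["cases"]:
    mismatches = diff_case(case, fake_producer)
    print(case["call"], case["seed"], "OK" if not mismatches else mismatches)
```

Exact equality only works if the fixture stores doubles at full precision, which is an assumption about how the R script writes them.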
Why this exists (the gap)
Reproducing a published R analysis in Python is common (porting pipelines, checking results,
building Python tooling around an R method). The randomness is almost always the blocker: the
analysis used R's default generator via set.seed(s); sample(...), and naive Python ports
silently diverge.
| approach | matches R's default set.seed+sample? | pure Python (no R)? | reproduces an existing published R script unchanged? |
|---|---|---|---|
| rpy2 (embed/call R) | yes (it is R) | ❌ needs R installed | yes |
| SyncRNG (shared Tausworthe RNG in C) | ❌ (a different RNG; you must rewrite the R code) | ✅ | ❌ |
| numpy MT19937 / random | ❌ (same MT core but different seeding + different sample()) | ✅ | ❌ |
| rrng | ✅ | ✅ | ✅ |
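The divergence of a naive port is easy to see with the standard library alone. This small sketch seeds Python's own Mersenne-Twister with 100 and compares the first draw against the 0.3077661 that this README quotes for R's set.seed(100); runif(1); the variable names are illustrative.

```python
import random

# First R draw for seed 100, as quoted in the overview at the top of this README.
R_FIRST_RUNIF_SEED_100 = 0.3077661

# Python's random module uses the same MT19937 core, but it seeds the state
# differently and builds each float from two 32-bit outputs (53 bits).
naive = random.Random(100).random()

diverges = abs(naive - R_FIRST_RUNIF_SEED_100) > 1e-6
print(naive, diverges)
```

No exception is raised and the value looks like a perfectly plausible uniform draw, which is why such mismatches tend to go unnoticed.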
The widely-repeated claim "you can't make Python's Mersenne-Twister match R's set.seed" is true
for the naive approach but not fundamentally true. R's chain is fully specified; three pieces are
usually gotten wrong:
- R's seeding scramble: set.seed(s) is not the standard MT init_genrand. R scrambles the seed 50× by s = 69069*s + 1, then fills its 625-word seed vector (the position mti plus 624 state words) with the same LCG, and sets mti = 624.
- R's unif_rand scaling: fixup(MT_genrand()), where MT_genrand() multiplies the 32-bit output by 2.3283064365386963e-10 (that is, 2⁻³²) and fixup is R's specific guard that keeps the result strictly inside (0, 1).
- R's sample() index method: since R 3.6.0 the default is Rejection (R_unif_index via rbits(ceil(log2 n)), rejecting draws ≥ n), not the old floor(n*unif_rand()) Rounding method. Most ports use Rounding and mismatch on sample().
Get all three right and Python matches R exactly.
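The three pieces above can be sketched in a few lines of standard-library Python. This is an illustrative reading of R's src/main/RNG.c, not the package's own code: the helper names are hypothetical, CPython's random.Random is used only as a Mersenne-Twister core whose state is overwritten, and the multiplier 2.3283064365386963e-10 (2^-32) is the literal that R's MT_genrand uses.

```python
import random

MASK32 = 0xFFFFFFFF
MT_SCALE = 2.3283064365386963e-10    # 2^-32, the factor in R's MT_genrand()
I2_32M1 = 2.328306437080797e-10      # 1/(2^32 - 1), used by R's fixup()

def r_mt_words(seed):
    """The 624 MT state words that set.seed(seed) leaves behind (sketch)."""
    s = seed & MASK32
    for _ in range(50):                      # piece 1a: initial scrambling
        s = (69069 * s + 1) & MASK32
    words = []
    for _ in range(625):                     # piece 1b: slot 0 is mti, slots 1..624 the state
        s = (69069 * s + 1) & MASK32
        words.append(s)
    return words[1:]                         # slot 0 is overwritten with mti = 624

def fixup(x):
    """Piece 2: keep the result strictly inside (0, 1)."""
    if x <= 0.0:
        return 0.5 * I2_32M1
    if 1.0 - x <= 0.0:
        return 1.0 - 0.5 * I2_32M1
    return x

def make_unif_rand(seed):
    """Return a zero-argument unif_rand() backed by CPython's MT19937 core."""
    core = random.Random()
    # State tuple: (version, 624 words + position, gauss_next); position 624 forces a refill.
    core.setstate((3, tuple(r_mt_words(seed)) + (624,), None))
    return lambda: fixup(core.getrandbits(32) * MT_SCALE)

def unif_index(n, unif_rand):
    """Piece 3: 0-based index in [0, n) by the R >= 3.6 rejection method."""
    if n <= 0:
        return 0
    bits = (n - 1).bit_length()              # equals ceil(log2(n)) for n >= 1
    while True:
        v = 0
        for _ in range(0, bits + 1, 16):     # 16 random bits per uniform draw
            v = 65536 * v + int(unif_rand() * 65536)
        v &= (1 << bits) - 1                 # keep only the low `bits` bits
        if v < n:                            # reject draws >= n and try again
            return v

u = make_unif_rand(100)
first_uniform = u()
g = make_unif_rand(100)
first_indices = [unif_index(10, g) for _ in range(10)]
```

If the sketch is faithful to R, first_uniform and first_indices should reproduce the 0.3077661 and the 0-based index vector quoted for seed 100 in the overview at the top of this README.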
Scope
Covered (validated bit-for-bit against R 4.5):
- RNG kind: Mersenne-Twister (R's default), exact set_seed(s).
- Uniform: unif_rand(), runif(n).
- Normal: rnorm(n, mean, sd), using R's default Inversion (qnorm, Wichura AS 241), 2 draws/value.
- Exponential: rexp(n, rate), using Ahrens-Dieter exp_rand.
- Poisson: rpois(n, mu), using Ahrens-Dieter (both the small-mu inversion and big-mu rejection branches).
- Binomial: rbinom(n, size, prob), using inversion (np<30) and BTPE (np≥30).
- Gamma: rgamma(n, shape, rate=, scale=), using GD (shape≥1) and GS (shape<1).
- Sampling: sample(n, size, replace=, prob=) and sample_index(n, size): equal-probability with/without replacement, and weighted prob= (cumulative, Walker alias for >200 heavy cells, and sequential no-replacement). Both the R ≥ 3.6 Rejection index method (sample_kind="rejection", default) and the R < 3.6 Rounding method (sample_kind="rounding"). Returns 0-based indices.
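The "2 draws/value" note for rnorm refers to how R's Inversion method combines two uniforms before calling qnorm. The sketch below shows that recipe under stated assumptions: the function names are hypothetical, unif_rand is any zero-argument callable returning values in (0, 1), and statistics.NormalDist().inv_cdf is a stand-in for R's qnorm that is not guaranteed to be bit-identical to it.

```python
from statistics import NormalDist

BIG = 134217728  # 2^27, the constant R uses to splice two uniforms together

def norm_rand_inversion(unif_rand):
    """One standard-normal draw that consumes exactly two uniforms."""
    u = unif_rand()
    u = int(BIG * u) + unif_rand()       # high 27 bits from draw 1, the rest from draw 2
    return NormalDist().inv_cdf(u / BIG) # stand-in for qnorm(u / BIG, 0, 1)

def rnorm(n, unif_rand, mean=0.0, sd=1.0):
    """n normal draws; each one advances the uniform stream by two."""
    return [mean + sd * norm_rand_inversion(unif_rand) for _ in range(n)]

# Usage with a fixed two-value stream (rounded uniforms, for illustration only).
stream = iter([0.30776611, 0.25767250])
z = rnorm(1, lambda: next(stream))[0]
```

Because each normal draw consumes two uniforms, mixing rnorm with runif or sample in a different order than the R script shifts every later draw.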
Not (yet) covered:
- Other RNG kinds (Wichmann-Hill, Marsaglia-Multicarry, Super-Duper, Knuth-TAOCP, L'Ecuyer-CMRG) and non-default normal.kinds (Box-Muller, Kinderman-Ramage).
- rbeta, rt, rchisq, rf, rcauchy, rlogis, rweibull, rgeom, rnbinom, rhyper, …
- sample.int(useHash=TRUE) (only triggers for n > 1e7 unweighted no-replacement draws).
Honesty about scope is the point: advertise exactly what matches R, not "all of R".
Design
- Performance: drive the MT core with NumPy's MT19937 seeded to R's exact 624-word state (NumPy's MT == R's MT, so an identical uint32 stream), then apply R's unif_rand scaling and the rejection sample on top, vectorized. Bit-identical to the slow pure-Python reference but ~100× faster (a 100-member bootstrap resample runs in <1 s).
- A slow, transparent pure-Python _genrand reference is kept in the source for auditability.
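A rough sketch of the vectorized idea follows, assuming NumPy is installed. It is not the package's implementation: r_seed_key and r_runif are hypothetical names, the seeding loop is a reading of R's RNG_Init, and the state is injected through the public MT19937.state setter.

```python
import numpy as np
from numpy.random import MT19937

def r_seed_key(seed):
    """624 uint32 state words as R's set.seed(seed) would lay them out (sketch)."""
    s = seed & 0xFFFFFFFF
    for _ in range(50):                          # initial scrambling
        s = (69069 * s + 1) & 0xFFFFFFFF
    words = []
    for _ in range(625):                         # first word is the mti slot, then the state
        s = (69069 * s + 1) & 0xFFFFFFFF
        words.append(s)
    return np.array(words[1:], dtype=np.uint32)

def r_runif(seed, n):
    """n uniforms drawn in one vectorized call from an R-seeded MT19937."""
    bg = MT19937()
    bg.state = {"bit_generator": "MT19937",
                "state": {"key": r_seed_key(seed), "pos": 624}}  # pos 624 forces a refill
    raw = bg.random_raw(n).astype(np.float64)    # 32-bit outputs, exact in float64
    u = raw * 2.3283064365386963e-10             # 2^-32 scaling
    half = 0.5 * 2.328306437080797e-10           # edge guard, mirroring R's fixup()
    u = np.where(u <= 0.0, half, u)
    return np.where(1.0 - u <= 0.0, 1.0 - half, u)

draws = r_runif(100, 5)
```

The design choice is that the expensive part, generating and tempering the 32-bit words, stays inside NumPy's compiled generator, while the R-specific scaling is a cheap elementwise step on the resulting array.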
Roadmap
- MT + set_seed + runif + sample_index (replace=TRUE), both sample kinds. ✅ validated.
- rnorm via Inversion (qnorm). ✅ validated.
- sample without replacement; weighted sample(prob=) (incl. Walker alias). ✅ validated.
- rexp, rbinom, rpois, rgamma. ✅ validated.
- Next: more continuous families (rbeta, rchisq, rt, rf, rweibull, …) and discrete (rgeom, rnbinom, rhyper); optional alternate RNG kinds / normal.kinds behind flags.
Positioning
- Need R installed and want everything? → rpy2.
- Control both sides and just want a shared stream? → SyncRNG (different RNG; rewrite both).
- Need to reproduce an existing, unmodified, default-RNG R analysis in pure Python? → rrng.
Provenance & license
MIT licensed (see LICENSE). The implementation follows R's source
(src/main/RNG.c) and the "Random"
R help page.
Origin: extracted from a snow-drought attribution project, where it was built to reproduce a WUS-D3 / PNAS bootstrap bit-for-bit in Python without R.
File details
Details for the file rrng-0.1.0.tar.gz.
File metadata
- Download URL: rrng-0.1.0.tar.gz
- Size: 623.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0e7afc233fd614b449edee0218fea0b3a78bdcfb333eae1c93d1c60bf5e71043 |
| MD5 | 816ae98c1800bfdfa8b42a7ef0b8c37c |
| BLAKE2b-256 | de1a1c5a3b49b03ce95485988c1646a144976baafe03a6b7c07f9a19bd439eec |
Provenance
The following attestation bundles were made for rrng-0.1.0.tar.gz:
Publisher: publish.yml on fzhao70/rrng
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: rrng-0.1.0.tar.gz
- Subject digest: 0e7afc233fd614b449edee0218fea0b3a78bdcfb333eae1c93d1c60bf5e71043
- Sigstore transparency entry: 1888090122
- Permalink: fzhao70/rrng@7a43197ce08bacab5f4e263b33d66a5dea7b5ce9
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/fzhao70
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@7a43197ce08bacab5f4e263b33d66a5dea7b5ce9
- Trigger Event: release
File details
Details for the file rrng-0.1.0-py3-none-any.whl.
File metadata
- Download URL: rrng-0.1.0-py3-none-any.whl
- Size: 625.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3cc45a56d1980a37f209a1d03a21f806812bfd4501ab398ebfd67b82249b2dfd |
| MD5 | 32ea64e7ab04b2b611f44d83503a904b |
| BLAKE2b-256 | 6a7b6ba5504c64c164839420c28b699093d3ad08693e3f444f17506835864587 |
Provenance
The following attestation bundles were made for rrng-0.1.0-py3-none-any.whl:
Publisher: publish.yml on fzhao70/rrng
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: rrng-0.1.0-py3-none-any.whl
- Subject digest: 3cc45a56d1980a37f209a1d03a21f806812bfd4501ab398ebfd67b82249b2dfd
- Sigstore transparency entry: 1888090262
- Permalink: fzhao70/rrng@7a43197ce08bacab5f4e263b33d66a5dea7b5ce9
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/fzhao70
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@7a43197ce08bacab5f4e263b33d66a5dea7b5ce9
- Trigger Event: release