Skip to main content

Reproduce R's native default RNG (set.seed/runif/sample) bit-for-bit in pure Python, no R at runtime

Project description

rrng — reproduce R's native RNG bit-for-bit in pure Python

Run unmodified, already-published R randomness (set.seed() + sample() + runif() …) inside Python and get the identical numbers — with no R installed at runtime.

from rrng import RRNG

g = RRNG(100)                          # R: set.seed(100)   (sample_kind="rounding" for R < 3.6)
g.unif_rand()                          # one R runif() draw                  -> 0.3077661
g.runif(5)                             # an R runif(5) block (numpy array)
g.rnorm(5)                             # rnorm(5)   (Inversion, R's default)
g.rexp(5, rate=2); g.rpois(5, mu=3)    # rexp / rpois
g.rbinom(5, size=20, prob=0.3)         # rbinom
g.rgamma(5, shape=2.5, scale=2)        # rgamma
g.sample(10, 5, replace=False)         # sample(1:10, 5)            -> 0-based indices
g.sample(10, 8, replace=True, prob=w)  # weighted sample(prob=)
idx = RRNG(100).sample_index(10, 10)   # 0-based sample(1:10, 10, replace=TRUE) -> [9 6 5 2 8 9 6 5 5 3]

A RRNG is stateful, so a single seed stream can be threaded across calls/regions — essential for reproducing R scripts that set.seed(s) once and then map()/loop, where the order of consumption must match.

👉 See USAGE.md for a full guide (the R→Python cheat-sheet, threaded streams, 0- vs 1-based indexing, common pitfalls).

Install

Zero hard dependencies beyond NumPy.

pip install rrng                  # from PyPI (if published)
# or, from a checkout:
pip install -e .                  # editable install of this package

It is also importable in place: if the rrng/ package directory is on your PYTHONPATH (e.g. it sits next to your script), from rrng import RRNG just works without installing.

Quick start

from rrng import RRNG

# --- R ---                          # --- Python (rrng) ---
# set.seed(42)                       g = RRNG(42)
# runif(3)                           g.runif(3)
# sample(1:n, k, replace = TRUE)     g.sample_index(n, k) + 1   # +1 for R's 1-based indices

To resample a data array the way R's bootstrap does:

import numpy as np
data = np.array([...])
g = RRNG(123)
resample = data[g.sample_index(data.size, data.size)]   # sample(data, replace=TRUE)

Validation — diff against real R

The library is only worth anything if it is provably identical to R, so the tests diff against golden vectors generated by real R and committed as a fixture:

# (re)generate fixtures — requires R (any version >= 3.6)
Rscript rrng/tests/generate_golden.R

# run the diff — requires Python + numpy
python rrng/tests/test_rrng.py        # standalone runner
pytest rrng/tests                     # or under pytest

rrng/tests/fixtures/golden_vectors.json holds runif and sample (both sample.kinds) across several seeds and n/size, including n large enough to exercise the multi-draw rejection path. The end-to-end example in rrng/examples/ reproduces a published PNAS snow-drought attribution bootstrap (risk ratios, return intervals, confidence bounds) to every digit, from Python, with no R.


Why this exists (the gap)

Reproducing a published R analysis in Python is common (porting pipelines, checking results, building Python tooling around an R method). The randomness is almost always the blocker: the analysis used R's default generator via set.seed(s); sample(...), and naive Python ports silently diverge.

approach matches R's default set.seed+sample? pure Python (no R)? reproduces an existing published R script unchanged?
rpy2 (embed/call R) yes (it is R) ❌ needs R installed yes
SyncRNG (shared Tausworthe RNG in C) ❌ — a different RNG; you must rewrite the R code
numpy MT19937 / random ❌ — same MT core but different seeding + different sample()
rrng

The widely-repeated claim "you can't make Python's Mersenne-Twister match R's set.seed" is true for the naive approach but not fundamentally true. R's chain is fully specified; three pieces are usually gotten wrong:

  1. R's seeding scrambleset.seed(s) is not the standard MT init_genrand. R scrambles the seed 50× by s = 69069*s + 1, then fills its 625-word state with the same LCG, and sets mti = 624.
  2. R's unif_rand scalingfixup(MT_genrand() * 1/(2³²−1)), with R's specific edge fixup.
  3. R's sample() index method — since R 3.6.0 the default is Rejection (R_unif_index via rbits(ceil(log2 n)) with rejection of draws ≥ n), not the old floor(n*unif_rand()) Rounding method. Most ports use Rounding and mismatch on sample().

Get all three right and Python matches R exactly.

Scope

Covered (validated bit-for-bit against R 4.5):

  • RNG kind: Mersenne-Twister (R's default), exact set_seed(s).
  • Uniform: unif_rand(), runif(n).
  • Normal: rnorm(n, mean, sd) — R's default Inversion (qnorm, Wichura AS 241), 2 draws/value.
  • Exponential: rexp(n, rate) — Ahrens-Dieter exp_rand.
  • Poisson: rpois(n, mu) — Ahrens-Dieter (both the small-mu inversion and big-mu rejection branches).
  • Binomial: rbinom(n, size, prob) — inversion (np<30) and BTPE (np≥30).
  • Gamma: rgamma(n, shape, rate=, scale=)GD (shape≥1) and GS (shape<1).
  • Sampling: sample(n, size, replace=, prob=) and sample_index(n, size) — equal-probability with/without replacement, and weighted prob= (cumulative, Walker alias for >200 heavy cells, and sequential no-replacement). Both the R ≥ 3.6 Rejection index method (sample_kind="rejection", default) and the R < 3.6 Rounding method (sample_kind="rounding"). Returns 0-based indices.

Not (yet) covered:

  • Other RNG kinds (Wichmann-Hill, Marsaglia-Multicarry, Super-Duper, Knuth-TAOCP, L'Ecuyer-CMRG) and non-default normal.kinds (Box-Muller, Kinderman-Ramage).
  • rbeta, rt, rchisq, rf, rcauchy, rlogis, rweibull, rgeom, rnbinom, rhyper, …
  • sample.int(useHash=TRUE) (only triggers for n > 1e7 unweighted no-replacement draws).

Honesty about scope is the point: advertise exactly what matches R, not "all of R".

Design

  • Performance: drive the MT core with NumPy's MT19937 seeded to R's exact 624-word state (NumPy's MT == R's MT, so an identical uint32 stream), then apply R's unif_rand scaling and the rejection sample on top, vectorized. Bit-identical to the slow pure-Python reference but ~100× faster (a 100-member bootstrap resample runs in <1 s).
  • A slow, transparent pure-Python _genrand reference is kept in the source for auditability.

Roadmap

  1. MT + set_seed + runif + sample_index (replace=TRUE), both sample kinds. ✅ validated.
  2. rnorm via Inversion (qnorm). ✅ validated.
  3. sample without replacement; weighted sample(prob=) (incl. Walker alias). ✅ validated.
  4. rexp, rbinom, rpois, rgamma. ✅ validated.
  5. Next: more continuous families (rbeta, rchisq, rt, rf, rweibull, …) and discrete (rgeom, rnbinom, rhyper); optional alternate RNG kinds / normal.kinds behind flags.

Positioning

  • Need R installed and want everything? → rpy2.
  • Control both sides and just want a shared stream? → SyncRNG (different RNG; rewrite both).
  • Need to reproduce an existing, unmodified, default-RNG R analysis in pure Python? → rrng.

Provenance & license

MIT licensed (see LICENSE). The implementation follows R's source (src/main/RNG.c) and the "Random" R help page.

Origin: extracted from a snow-drought attribution project, where it was built to reproduce a WUS-D3 / PNAS bootstrap bit-for-bit in Python without R.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

rrng-0.1.0.tar.gz (623.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

rrng-0.1.0-py3-none-any.whl (625.7 kB view details)

Uploaded Python 3

File details

Details for the file rrng-0.1.0.tar.gz.

File metadata

  • Download URL: rrng-0.1.0.tar.gz
  • Upload date:
  • Size: 623.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for rrng-0.1.0.tar.gz
Algorithm Hash digest
SHA256 0e7afc233fd614b449edee0218fea0b3a78bdcfb333eae1c93d1c60bf5e71043
MD5 816ae98c1800bfdfa8b42a7ef0b8c37c
BLAKE2b-256 de1a1c5a3b49b03ce95485988c1646a144976baafe03a6b7c07f9a19bd439eec

See more details on using hashes here.

Provenance

The following attestation bundles were made for rrng-0.1.0.tar.gz:

Publisher: publish.yml on fzhao70/rrng

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file rrng-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: rrng-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 625.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for rrng-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3cc45a56d1980a37f209a1d03a21f806812bfd4501ab398ebfd67b82249b2dfd
MD5 32ea64e7ab04b2b611f44d83503a904b
BLAKE2b-256 6a7b6ba5504c64c164839420c28b699093d3ad08693e3f444f17506835864587

See more details on using hashes here.

Provenance

The following attestation bundles were made for rrng-0.1.0-py3-none-any.whl:

Publisher: publish.yml on fzhao70/rrng

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page