Project description

random-gen

Python application

A random number generator that will randomly generate numbers from a given input list and probability distribution. E.g.

>> from random_gen import RandomGen
>> 
>> r = RandomGen((1,2,3), (0.5,0.1,0.4)) # input number list, probability distribution
>> generated_numbers = [r.next_num() for _ in range(0, 11)] 
>>
>> import collections
>> collections.Counter(generated_numbers) # See distribution of numbers generated
Counter({1: 7, 2: 1, 3: 3})

As you can see, the distribution of numbers is roughly in line with the probability distribution passed to RandomGen. This implementation is taken from cpython random.choices() - https://github.com/python/cpython/blob/master/Lib/random.py#L397 and optimised for known cumulative weights and k.

This works by using the cumulative probabilities of the random number list. Randomly generated probabilities (using random.random) are multiplied by the total sum of the probabilities. This probability is then inserted into the list of cumulative probabilities using bisect.bisect and that insert position used to return the relative random number (from the input random_nums). As the probability has been proportioned along the probability distribution (because it's multiplied by the total sum of the probabilities) it is more likely to return numbers that have a larger probability.

Prerequisites

Python 3.6

Running

>> from random_gen import RandomGen
>> random_nums = (1,2,3)
>> probabilities = (0.5, 0.1, 0.4)
>> r = RandomGen(random_nums, probabilities)
>> r.next_num()
1

Tests

To run the tests: pytest tests/test_random_gen.py

How to make RandomGen more 'pythonic'

We could subclass the standard Python Generator class and use the dunder methods to provide next_num through __next__ or simply provide a generator function called random_gen which yields each result when called.
I would also advise using random.choices directly rather than wrapped in a class if the use cases are less than ~10000 inputs (otherwise performance suffers). This would be simpler and more readable.

Performance

# input numbers provided	# output numbers generated	speed (secs)	function calls	memory consumption for next_num (mb)
100	100	0.005	304	11
1000	1000	0.002	3004	11
10000	10000	0.018	30004	12.4
100000	100000	0.172	300004	33
100000	1000000	1.95	3000004	199

Further Optimisations

Memory consumption needs investigation. Is there an alternative efficient method for bisecting the cumulative probability array without using a list?
Arrays/efficient storage for large sets of numbers if needed.
Investigate performance of bisect.

Project details

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Release history Release notifications | RSS feed

This version

1.0.0

Apr 16, 2020

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

random-gen-1.0.0.tar.gz (3.2 kB view hashes)

Uploaded Apr 16, 2020 Source

Built Distribution

random_gen-1.0.0-py3-none-any.whl (4.1 kB view hashes)

Uploaded Apr 16, 2020 Python 3

Hashes for random-gen-1.0.0.tar.gz

Hashes for random-gen-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`add65bf702881bf9e9793dae11b7bf47305ee15b6e058087a69b1559aa826aaa`
MD5	`5f917ffb018760002934d8dac6979731`
BLAKE2b-256	`3a316404cd183f095791a5971c4d8905c39bcf953a78ca431a3cb979619b255e`

Hashes for random_gen-1.0.0-py3-none-any.whl

Hashes for random_gen-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6aa60c9558312137f59fdf4d56fb3794a2bdcabb71cdfb5eca618e8b9252c989`
MD5	`9b5c5ba3091768a2c9d4175fcaf099bd`
BLAKE2b-256	`e123341787e7619d598617ce0688dc793a4cb9421d7a22fd64205631daf3774d`