Fast on-demand sampling from categorical distributions
Categorical Sampler
-----
Install from pip: `pip install categorical-sampler`
Let’s generate a probability distribution to get us started. First, sample a bunch of random numbers to determine probability “scores”.
>>> from random import random
>>> k = 10**6
>>> scores = [random() for i in range(k)]
>>> total = sum(scores)
>>> probabilities = [s / total for s in scores]
We've normalized the scores so that they sum to 1, i.e. made them into proper probabilities, but the categorical sampler does that for us, so it's not necessary:
>>> from categorical import Categorical as C
>>> my_sampler = C(scores)
>>> print(my_sampler.sample())
487702
Compare against numpy, assuming we draw 1000 samples one at a time:
>>> from numpy.random import choice
>>> import time
>>>
>>> def time_numpy():
...     start = time.time()
...     for i in range(1000):
...         choice(k, p=probabilities)
...     print(time.time() - start)
...
>>> def time_my_alias():
...     start = time.time()
...     for i in range(1000):
...         my_sampler.sample()
...     print(time.time() - start)
...
>>> time_numpy()
31.0555009842
>>> time_my_alias()
0.0127031803131
Get the actual probability of a given outcome:
>>> my_sampler.get_probability(487702)
1.0911282101090306e-06
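Since the sampler normalizes the scores itself, the probability of an outcome is presumably just its score divided by the sum of all scores. With a million scores drawn uniformly from [0, 1), that sum is around 5 * 10**5, so values on the order of 10**-6 like the one above are what we would expect. A minimal sketch of that calculation, using a hypothetical helper `probability_of` that is not part of the package:

```python
def probability_of(scores, outcome):
    """Return the normalized probability of `outcome` given raw scores.

    Hypothetical helper for illustration; assumes non-negative scores
    with a positive sum.
    """
    total = float(sum(scores))
    return scores[outcome] / total


example_scores = [2.0, 6.0, 12.0]
print(probability_of(example_scores, 1))  # 6 / 20 = 0.3
```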