
Fast on-demand sampling from categorical distributions


Categorical Sampler

Install from pip: `pip install categorical-sampler`

Let’s generate a probability distribution to get us started. First, sample a bunch of random numbers to determine probability “scores”.

>>> from random import random
>>> k = 10**6
>>> scores = [random() for i in range(k)]
>>> total = sum(scores)
>>> probabilities = [s / total for s in scores]

We've normalized the scores so they sum to 1, i.e. made them into proper probabilities, but the categorical sampler does that for us, so it isn't necessary:

>>> from categorical import Categorical as C
>>> my_sampler = C(scores)
>>> print(my_sampler.sample())

Compare against numpy, drawing 1000 samples one at a time:

>>> from numpy.random import choice
>>> import time
>>> def time_numpy():
...     start = time.time()
...     for i in range(1000):
...         choice(k, p=probabilities)
...     print(time.time() - start)
...
>>> def time_my_alias():
...     start = time.time()
...     for i in range(1000):
...         my_sampler.sample()
...     print(time.time() - start)
...
>>> time_numpy()
>>> time_my_alias()
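
A likely reason repeated `choice` calls are slow is that each call has to process all `k` probabilities again before drawing. As a rough illustration of how doing the setup once helps, here is a minimal sketch that builds a table of cumulative scores up front and then draws with a binary search. `CumulativeSampler` is a hypothetical name, and this is not necessarily how `categorical` works internally; it only shows the general precompute-then-sample idea (Python 3, standard library only).

```python
import bisect
import itertools
import random


class CumulativeSampler(object):
    """Hypothetical illustration; not the categorical package's implementation."""

    def __init__(self, scores):
        # One-time O(k) setup: running totals of the (unnormalized) scores.
        self.cumulative = list(itertools.accumulate(scores))
        self.total = self.cumulative[-1]

    def sample(self):
        # O(log k) per draw: pick a point in [0, total) and find its bucket.
        point = random.random() * self.total
        index = bisect.bisect_right(self.cumulative, point)
        # Guard against floating-point rounding pushing point up to total.
        return min(index, len(self.cumulative) - 1)
```

Because the scores are only compared against their own running total, they don't need to be normalized first, which mirrors the behaviour described above for `C(scores)`.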

Get the actual probability of a given outcome:

>>> my_sampler.get_probability(487702)
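
Presumably the value returned is the outcome's score divided by the sum of all scores. Under that assumption, it can be cross-checked by hand. `expected_probability` below is a hypothetical helper, not part of the package.

```python
def expected_probability(scores, outcome):
    # Assumption: an outcome's probability is its share of the total score.
    return scores[outcome] / float(sum(scores))


example_scores = [2.0, 1.0, 1.0]
print(expected_probability(example_scores, 0))  # 0.5
```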


Files for categorical, version 0.1.4: categorical-0.1.4.tar.gz (4.4 kB, source distribution)
