Fast on-demand sampling from categorical distributions

Categorical Sampler
-----

Install from pip: `pip install categorical-sampler`

Let’s generate a probability distribution to get us started. First, sample a bunch of random numbers to determine probability “scores”.


>>> from random import random
>>> k = 10**6
>>> scores = [random() for i in range(k)]
>>> total = sum(scores)
>>> probabilities = [s / total for s in scores]


We've normalized the scores to sum to 1, which makes them proper probabilities. The categorical sampler does this for us, though, so the normalization step is not necessary:

>>> from categorical import Categorical as C
>>> my_sampler = C(scores)
>>> print(my_sampler.sample())
487702
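
One way a sampler can accept unnormalized scores and still draw in constant time is Walker's alias method. The timing helper `time_my_alias` in this README hints at that technique, but the package's internals are not shown here, so the following is only a minimal sketch under that assumption. `AliasSampler` and its attributes are hypothetical names, not the package's API, and the code is written for Python 3.

```python
import random


class AliasSampler:
    """Hypothetical sketch of Walker's alias method (not the package's code)."""

    def __init__(self, scores, rng=None):
        self.rng = rng if rng is not None else random.Random()
        n = len(scores)
        total = float(sum(scores))
        # Scale the scores so that the average entry equals 1.
        scaled = [s * n / total for s in scores]
        self.prob = [0.0] * n
        self.alias = [0] * n
        small = [i for i, p in enumerate(scaled) if p < 1.0]
        large = [i for i, p in enumerate(scaled) if p >= 1.0]
        while small and large:
            s = small.pop()
            l = large.pop()
            # Column s keeps its own mass; the rest is borrowed from l.
            self.prob[s] = scaled[s]
            self.alias[s] = l
            scaled[l] -= 1.0 - scaled[s]
            (small if scaled[l] < 1.0 else large).append(l)
        # Whatever is left is a full column (up to rounding error).
        for i in small + large:
            self.prob[i] = 1.0
            self.alias[i] = i

    def sample(self):
        # O(1) per draw: pick a column uniformly, then flip a biased coin.
        i = self.rng.randrange(len(self.prob))
        return i if self.rng.random() < self.prob[i] else self.alias[i]


sampler = AliasSampler([1, 2, 3, 4])
print(sampler.sample())  # an index in 0..3, drawn in proportion to its score
```

Setup costs O(k) time and memory for k outcomes, which pays off when many individual draws are needed from the same distribution.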

Comparing against numpy when drawing 1000 samples one at a time:


>>> from numpy.random import choice
>>> import time
>>>
>>> def time_numpy():
...     # Each call receives the full list of probabilities again.
...     start = time.time()
...     for i in range(1000):
...         choice(k, p=probabilities)
...     print(time.time() - start)
...
>>> def time_my_alias():
...     # The sampler was built once, above; only the draws are timed.
...     start = time.time()
...     for i in range(1000):
...         my_sampler.sample()
...     print(time.time() - start)
...
>>> time_numpy()
31.0555009842
>>> time_my_alias()
0.0127031803131
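
Much of this gap plausibly comes from per-call setup: every `choice(k, p=probabilities)` call has to process the full million-entry list before it returns a single index, whereas a sampler built once can make each draw cheap. The sketch below illustrates that "set up once, draw many times" pattern with only the standard library. It uses binary search over cumulative scores, which is a different technique from whatever the package does internally, and `CumulativeSampler` is a hypothetical name (Python 3).

```python
import bisect
import itertools
import random


class CumulativeSampler:
    """Hypothetical sketch: precompute cumulative scores, then bisect per draw."""

    def __init__(self, scores, rng=None):
        self.rng = rng if rng is not None else random.Random()
        # One-time O(k) setup: running totals of the unnormalized scores.
        self.cumulative = list(itertools.accumulate(scores))
        self.total = self.cumulative[-1]

    def sample(self):
        # O(log k) per draw: one uniform number plus a binary search.
        u = self.rng.random() * self.total
        index = bisect.bisect_right(self.cumulative, u)
        # Guard against floating-point rounding at the upper edge.
        return min(index, len(self.cumulative) - 1)


sampler = CumulativeSampler([1, 2, 3, 4])
draws = [sampler.sample() for _ in range(1000)]
print(draws[:10])
```

When all samples are needed at once rather than one at a time, asking numpy for them in a single call (`choice(k, size=1000, p=probabilities)`) also avoids repeating the setup work.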

Get the actual probability of a given outcome:

>>> my_sampler.get_probability(487702)
1.0911282101090306e-06
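
The value above is consistent with an outcome's score divided by the sum of all scores: with a million scores averaging about 0.5, individual probabilities land on the order of 1e-6. A minimal sketch of that relationship, using a hypothetical `probability_of` helper rather than the package's own API (Python 3):

```python
def probability_of(scores, outcome):
    """Return the normalized probability of index `outcome` (hypothetical helper)."""
    total = float(sum(scores))
    return scores[outcome] / total


example_scores = [1, 2, 3, 4]
print(probability_of(example_scores, 3))  # 0.4
```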


