Skip to main content

UniqueRandomizer: Incremental Sampling Without Replacement

Project description

UniqueRandomizer

Overview

UniqueRandomizer is a data structure for sampling outputs of a randomized program, such as a neural sequence model, incrementally and without replacement.

  • Incremental sampling: Instead of sampling a large batch of outputs all at once, as with beam search, UniqueRandomizer provides samples one at a time. This enables flexibility in stopping criteria, such as stopping the sampling process as soon as a satisfactory output is found.
  • Sampling without replacement: In many applications, a neural model is used to produce candidate solutions to some search or optimization problem. In such applications it is usually desirable to consider unique candidate solutions, since duplicates are typically not useful.

For more details, refer to our paper, Incremental Sampling Without Replacement for Sequence Models, published at ICML 2020.

BibTeX entry:

@article{shi2020uniquerandomizer,
  title = {Incremental Sampling Without Replacement for Sequence Models},
  author = {Kensen Shi and David Bieber and Charles Sutton},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  year = {2020}
}

Installation

python3 -m pip install --user unique-randomizer

This package requires Python 3. The above command automatically installs the following dependencies as well:

  • absl-py >= 0.6.1
  • numpy >= 1.15.4
  • scipy >= 1.1.0

Usage

To use UniqueRandomizer, first identify the program or function that you wish to draw unique samples from, such as the draw_sample function in the following example:

def draw_sample(sequence_model, state):
  """Draws a sample (a sequence of token indices) from the sequence model."""
  tokens = []
  token = BOS
  for i in range(MAX_LEN):
    probs, state = sequence_model(token, state)
    token = np.random.choice(np.arange(len(probs)), p=probs)
    if token == EOS:
      break
    tokens.append(token)
  return tokens

Note that draw_sample can take inputs and can use control flow such as loops, conditionals, and recursion. There are only two constraints on the draw_sample function:

  1. It must be deterministic given the inputs, except for random choices provided by np.random.choice (or some other method of selecting a random index given a discrete probability distribution).
  2. Two different sequences of random choices must lead to draw_sample returning different outputs.

Next, add a UniqueRandomizer object as an input to draw_sample, and use its sample_distribution function to replace np.random.choice:

- def draw_sample(sequence_model, state):
+ def draw_sample(sequence_model, state, randomizer):
    """Draws a sample (a sequence of token indices) from the sequence model."""
    tokens = []
    token = BOS
    for i in range(MAX_LEN):
      probs, state = sequence_model(token, state)
-     token = np.random.choice(np.arange(len(probs)), p=probs)
+     token = randomizer.sample_distribution(probs)
      if token == EOS:
        break
      tokens.append(token)
    return tokens

Finally, a simple loop around draw_sample can collect unique samples, as follows:

def draw_unique_samples(model, state, num_samples):
  """Draws multiple unique samples from the sequence model."""
  samples = []
  randomizer = unique_randomizer.UniqueRandomizer()
  for _ in range(num_samples):
    samples.append(draw_sample(model, state, randomizer))
    randomizer.mark_sequence_complete()
  return samples

Code Samples

We include a few code samples that demonstrate how to use UniqueRandomizer:

  • examples/weighted_coin_flips.py: This provides a very simple example of using UniqueRandomizer. The function flip_two_weighted_coins simulates flipping a pair of weighted coins. The sample_flips_without_replacement function then uses UniqueRandomizer to efficiently sample outputs of flip_two_weighted_coins without replacement.

  • examples/expand_grammar.py: This defines a Probabilistic Context-Free Grammar (PCFG), as well as methods to sample elements of the grammar without replacement by using UniqueRandomizer, rejection sampling, and Stochastic Beam Search (SBS). The script examples/expand_grammar_main.py enables easy comparison between the different sampling methods under different scenarios.

  • examples/sequence_example.py: This implements sampling without replacement from a sequence model, using UniqueRandomizer, Batched UniqueRandomizer, rejection sampling, and SBS. The script examples/sequence_example_main.py enables easy comparison between the different sampling methods under different scenarios.

Disclaimer

This is not an officially supported Google product.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

unique-randomizer-0.0.1.tar.gz (23.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

unique_randomizer-0.0.1-py3-none-any.whl (33.8 kB view details)

Uploaded Python 3

File details

Details for the file unique-randomizer-0.0.1.tar.gz.

File metadata

  • Download URL: unique-randomizer-0.0.1.tar.gz
  • Upload date:
  • Size: 23.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.47.0 CPython/3.8.5

File hashes

Hashes for unique-randomizer-0.0.1.tar.gz
Algorithm Hash digest
SHA256 8d0e737d05fdb530eb1f751d1f7786e32ff6af53e2c1792d1613119d735cdc44
MD5 faffc18b7cf78bc4f905424268ec680a
BLAKE2b-256 61722f14fdb43f05b70e8c383160849a9ed4a9b09c08bc65db79d9b8cd4ba13b

See more details on using hashes here.

File details

Details for the file unique_randomizer-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: unique_randomizer-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 33.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.47.0 CPython/3.8.5

File hashes

Hashes for unique_randomizer-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 58734804fd0a277f614692a6c9a751cca68c5d93a1e4b6c2113e94c945b2c163
MD5 451dd610a33b004f1ba104f28c8e66d1
BLAKE2b-256 5d75b0a9a977acee04db73fe89411f5a0b8bc98339ac60a0305439c82ac9c72a

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page