Package for consistent sampling with or without replacement.

consistent_sampler

The routine sampler provides 'consistent sampling': sampling that is consistent across subsets. Consistent sampling works by associating a random number with each element; a sample of the desired size consists of the elements with the smallest associated random numbers.

The sampling is consistent since it consistently favors elements with small associated random numbers; if two sets S and T have substantial overlap, then their samples of a given size will also have substantial overlap (for the same random seed).
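The idea can be illustrated with a minimal sketch. This is not the package's implementation: the helpers `ticket` and `consistent_sample` are hypothetical names, and deriving each element's number from a SHA-256 hash of the seed and the id is an assumption made only to keep the example self-contained.

```python
import hashlib

def ticket(seed, obj_id):
    """Pseudo-random number in the unit interval, fixed by (seed, obj_id).

    Hypothetical helper: hashing is an assumption of this sketch,
    not necessarily how the package generates its numbers.
    """
    digest = hashlib.sha256(f"{seed}:{obj_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) / 16**64

def consistent_sample(ids, seed, take):
    """Return the `take` ids with the smallest tickets, smallest first."""
    return sorted(ids, key=lambda obj_id: ticket(seed, obj_id))[:take]

# Two sets that share five of their six elements.
S = ['A-1', 'A-2', 'A-3', 'B-1', 'B-2', 'B-3']
T = ['A-2', 'A-3', 'B-1', 'B-2', 'B-3', 'C-1']

sample_S = consistent_sample(S, 314159, 4)
sample_T = consistent_sample(T, 314159, 4)
print(sample_S)
print(sample_T)
print(sorted(set(sample_S) & set(sample_T)))  # elements common to both samples
```

Because an element's number depends only on the seed and its id, S and T (which differ in one element each) must share at least three of their four sampled elements, and a smaller sample from the same set is always a prefix of a larger one.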

This routine is intended for use in election audits, where the objects being sampled are ballots, but this procedure is for general use. For a similar election audit sampling method, see Stark's election audit tools: https://www.stat.berkeley.edu/~stark/Vote/auditTools.htm

This routine takes as input a finite collection of distinct object ids, a random seed, and some other parameters. The sampling may be "with replacement" or "without replacement". One of the additional parameters to the routine is "take" -- the size of the desired sample.

It provides as output a "sampling order": an ordered list of object ids that determines the sample. For sampling without replacement, the output cannot be longer than the input, as no object may appear in the sample more than once. For sampling with replacement, the output may be infinite in length, as an object may appear in the sample an arbitrarily large number of times. The output of sampler is therefore always a Python generator, capable of producing an infinitely long stream of output object ids.

As a small example of sampling without replacement:

g = sampler(['A-1', 'A-2', 'A-3', 'B-1', 'B-2', 'B-3'], 
            with_replacement=False, take=4, seed=314159, output='id')

yields a generator g whose output can be printed:

print(list(g))

which produces:

['B-2', 'B-3', 'A-3', 'A-2']

Consistent sampling is not a new idea; see, for example, https://arxiv.org/abs/1612.01041 and the references to consistent sampling therein.

The routine here may (or may not) be novel in that it extends consistent sampling to sampling with replacement: when an item is sampled and then replaced in the set of items being sampled, it is given a new random number drawn uniformly from the set of numbers in (0, 1) larger than its previous associated number. To implement this efficiently and portably, we represent a number in (0, 1) as a variable-length decimal string of the form '0.dddddd...' .
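The replacement rule can be sketched as follows. This is an illustration under stated assumptions, not the package's code: `uniform` and `sample_with_replacement` are hypothetical names, the numbers come from a SHA-256 hash (an assumption), and ordinary floats stand in for the decimal strings described above.

```python
import hashlib
import heapq
import itertools

def uniform(seed, obj_id, generation):
    """Pseudo-random number in the unit interval, fixed by its arguments.

    Hypothetical helper; hashing is an assumption of this sketch.
    """
    text = f"{seed}:{obj_id}:{generation}"
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return int(digest, 16) / 16**64

def sample_with_replacement(ids, seed):
    """Yield ids forever, always emitting the smallest current number."""
    heap = [(uniform(seed, obj_id, 0), obj_id, 0) for obj_id in ids]
    heapq.heapify(heap)
    while heap:
        t, obj_id, generation = heapq.heappop(heap)
        yield obj_id
        generation += 1
        # Replace the item with a new number drawn uniformly from (t, 1).
        new_t = t + (1.0 - t) * uniform(seed, obj_id, generation)
        heapq.heappush(heap, (new_t, obj_id, generation))

ids = ['A-1', 'A-2', 'A-3']
first_ten = list(itertools.islice(sample_with_replacement(ids, 314159), 10))
print(first_ten)
```

Since each id's sequence of numbers depends only on the seed and that id, the stream for a subset equals the full stream with the other ids filtered out. The float arithmetic here loses precision once numbers crowd close to 1, so the sketch is only suitable for short streams.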

For our applications, one big advantage of consistent sampling is the following. If each county collects cast ballots separately, then they can order their own ballots for sampling and interpretation independently of what other counties are doing. An overall sample can be constructed from the individual county samples.
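A minimal sketch of how county orderings might be combined, assuming sampling without replacement, ballot ids that are unique across counties, and a shared seed. The helpers `ticket` and `county_order` are hypothetical, and the hash-based numbers are an assumption of the example rather than the package's method.

```python
import hashlib
import heapq

def ticket(seed, obj_id):
    """Pseudo-random number in the unit interval, fixed by (seed, obj_id).

    Hypothetical helper; hashing is an assumption of this sketch.
    """
    digest = hashlib.sha256(f"{seed}:{obj_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) / 16**64

def county_order(ids, seed):
    """Each county sorts its own ballots by ticket, independently."""
    return sorted((ticket(seed, obj_id), obj_id) for obj_id in ids)

seed = 314159
county_a = county_order(['A-1', 'A-2', 'A-3'], seed)
county_b = county_order(['B-1', 'B-2', 'B-3'], seed)

# Merging the already-sorted county lists gives the overall sampling order.
merged = [obj_id for _, obj_id in heapq.merge(county_a, county_b)]
overall_sample = merged[:4]
print(overall_sample)
```

The merged order is the same one obtained by ordering all six ballots together, so no county needs to know what ballots another county holds.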

Further documentation and examples are in the code.
