Python tools for random sampling under a "don't pick the closest `n` elements" constraint. Also includes a batch generator that applies the same constraint, sampling with replacement and with repeats if necessary.

Sampling Utils

Installation

Simply install using pip

pip install sampling_utils

Usage

Dont Pick Closest

from sampling_utils import sample_from_list
sample_from_list([1,2,3,4,5,6,7,8], dont_pick_closest=2)

You are guaranteed to get samples that are more than dont_pick_closest apart# (in value, not in index). Here, the absolute difference between any two samples is always greater than 2.

For example, if 2 is picked, no other item in the range [2 - dont_pick_closest, 2 + dont_pick_closest] will be picked.

Another example looped 5 times:

for _ in range(5):
    sample_from_list([1,2,3,4,5,6,8,9,10,12,14], dont_pick_closest=2)

# Output
# [5, 10, 2, 14]
# [9, 6, 14, 1]
# [3, 8, 12]
# [10, 3, 6, 14]
# [2, 5, 8, 12]

If 12 is sampled, sampling 10 or 14 is not allowed since dont_pick_closest is 2. In other words, if n is sampled, then nothing else from [n-dont_pick_closest, ..., n-1, n, n+1, ..., n+dont_pick_closest] can be sampled (if present in the list).

# This is referred to as the dont_pick_closest rule hereafter.
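
To make the rule concrete, here is a minimal sketch of how it could be checked and how a conforming sample could be drawn. The names obeys_rule and greedy_sample are hypothetical helpers written for illustration; they are not part of sampling_utils, and the library's actual algorithm may differ.

```python
import random


def obeys_rule(sample, dont_pick_closest):
    """Return True if every pair of values differs by more than dont_pick_closest."""
    ordered = sorted(sample)
    # After sorting, it is enough to compare each value with its neighbour.
    return all(b - a > dont_pick_closest for a, b in zip(ordered, ordered[1:]))


def greedy_sample(values, dont_pick_closest, rng=None):
    """Hypothetical sketch: visit values in random order and keep each one
    that is far enough (in value) from everything kept so far."""
    rng = rng or random.Random()
    shuffled = list(values)
    rng.shuffle(shuffled)
    picked = []
    for v in shuffled:
        if all(abs(v - p) > dont_pick_closest for p in picked):
            picked.append(v)
    return picked


data = [1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 14]
sample = greedy_sample(data, dont_pick_closest=2)
print(sample, obeys_rule(sample, 2))
```

Because every value is considered once, nothing more can be added to the result without breaking the rule, which is why the sample size varies from run to run.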

Number of samples

You can also specify how many samples you want from the list using the num_samples parameter. By default, you get the maximum possible number of samples (without replacement).

for _ in range(5):
    sample_from_list([1,2,3,4,5,6,8,9,10,12,14], dont_pick_closest=2, num_samples=2)

# Output
# [8, 2]
# [6, 3]
# [12, 1]
# [4, 10]
# [9, 1]

If you try to sample more than what's possible, you will get an error saying that it's not possible.

Min and max samples

You may just want to know how many samples you can draw from a given list while obeying the dont_pick_closest rule:

from sampling_utils import get_min_samples, get_max_samples
print(get_min_samples([1,2,3,4,5,6,8,9,10,12,14], dont_pick_closest=2))
print(get_max_samples([1,2,3,4,5,6,8,9,10,12,14], dont_pick_closest=2))

# Output
# Min 3
# Max 4
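
One plausible reading of these two numbers, sketched below, is that the maximum is the largest sample that obeys the rule and the minimum is the smallest sample to which no further element can be added. This interpretation is an assumption, and max_samples_sketch and min_samples_sketch are hypothetical names, not the library's implementation. The minimum uses brute force, so it is only practical for short lists.

```python
from itertools import combinations


def max_samples_sketch(values, dont_pick_closest):
    """Largest rule-obeying sample: greedily take the smallest value that is
    more than dont_pick_closest above the previously taken one."""
    count, last = 0, None
    for v in sorted(set(values)):
        if last is None or v - last > dont_pick_closest:
            count += 1
            last = v
    return count


def min_samples_sketch(values, dont_pick_closest):
    """Smallest rule-obeying sample that cannot be extended (brute force)."""
    unique = sorted(set(values))
    d = dont_pick_closest
    for size in range(1, len(unique) + 1):
        for combo in combinations(unique, size):  # combos come out sorted
            valid = all(b - a > d for a, b in zip(combo, combo[1:]))
            # Not extendable: every value lies within d of some picked value.
            blocked = all(any(abs(v - p) <= d for p in combo) for v in unique)
            if valid and blocked:
                return size
    return 0  # empty input


data = [1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 14]
print(min_samples_sketch(data, 2), max_samples_sketch(data, 2))
```

Under this reading the sketch agrees with the output above: [3, 8, 12] cannot be extended, while [1, 4, 8, 12] is as large as a sample can get.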

Sampling without replacement successively / Generating batches of samples for one epoch

If you want to successively sample without replacement i.e. sample as many samples from the list without repeating, you can use batch_rand_generator as shown below. This is particularly useful to generate batches of data until no more batches can be generated (equivalent to one epoch).

from sampling_utils import batch_rand_generator 
from sampling_utils import get_batch_generator_elements

batch_size = 2
brg = batch_rand_generator([1,2,3,4,5,6,8,9,10,12,14], batch_size=batch_size, dont_pick_closest=2)
print(get_batch_generator_elements(brg, batch_size=batch_size))
# Output
# [[1, 4], [8, 5], [14, 3], [2, 6]]

Notice that the elements

  • within each batch obey the dont_pick_closest rule (e.g. 1 and 4 from batch 1)
  • from different batches need not obey the rule (e.g. 4 and 5 from batch 1 and 2 respectively).
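
The behaviour described above can be sketched as a generator that keeps drawing rule-obeying batches from the elements not yet used. The function batch_generator_sketch is a hypothetical illustration, not the library's batch_rand_generator, and it assumes a simple greedy pass per batch.

```python
import random


def batch_generator_sketch(values, batch_size, dont_pick_closest, rng=None):
    """Yield batches of batch_size values; the rule holds within each batch,
    and no element is reused across batches."""
    rng = rng or random.Random()
    remaining = list(values)
    while True:
        rng.shuffle(remaining)
        batch = []
        for v in remaining:
            if all(abs(v - p) > dont_pick_closest for p in batch):
                batch.append(v)
                if len(batch) == batch_size:
                    break
        if len(batch) < batch_size:
            # The greedy pass could not fill a batch: end of the "epoch".
            return
        for v in batch:
            remaining.remove(v)
        yield batch


data = [1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 14]
for batch in batch_generator_sketch(data, batch_size=2, dont_pick_closest=2):
    print(batch)
```

The rule is only checked against the current batch, so values such as 4 and 5 can still appear in different batches. Elements that cannot form a full batch at the end are simply left over.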

Contributing

Pull requests are very welcome.

  1. Fork the repo
  2. Create a new branch, using the feature name as the branch name
  3. Check that things work using a Jupyter notebook
  4. Raise a pull request

Licence

Please see attached Licence
