
reservoir sampling with or without weight from a stream of data

Project description

This module provides the reservoir sampling method, with or without weights. Reservoir sampling is used when you have a very large dataset of unknown size N and want to sample a subset of k of those N items in a single pass over a stream or a file.
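To illustrate the single-pass idea, here is a minimal sketch of the classic uniform reservoir algorithm (often called Algorithm R). It is not taken from this package's source; the function name `uniform_reservoir` and its parameters are assumptions made for the example.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def uniform_reservoir(stream: Iterable[T], k: int,
                      rng: Optional[random.Random] = None) -> List[T]:
    """Keep a uniform random sample of up to k items from a stream of unknown length.

    Hypothetical helper for illustration only; not part of weightreservoir.
    """
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # The first k items fill the reservoir directly.
            reservoir.append(item)
        else:
            # Item number i+1 replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# One pass over a generator; the full stream is never held in memory.
sample = uniform_reservoir((n * n for n in range(100_000)), k=10)
```

Only the k-item reservoir is stored, so memory use does not depend on N.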

If no weights are given, every item has an equal chance of being selected into the final subset; if weights are given, each item is selected according to its weight.
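The description does not say which weighting scheme the package implements. One common choice is the Efraimidis-Spirakis "A-Res" scheme, sketched below under that assumption: each item gets a random key `u ** (1 / weight)` and the k largest keys are kept, so heavier items are more likely to survive. The name `weighted_reservoir` and its signature are hypothetical.

```python
import heapq
import random
from typing import Iterable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def weighted_reservoir(pairs: Iterable[Tuple[T, float]], k: int,
                       rng: Optional[random.Random] = None) -> List[T]:
    """Weighted sampling without replacement over (item, weight) pairs.

    Illustrative A-Res sketch; an assumption, not weightreservoir's own code.
    """
    if k <= 0:
        return []
    rng = rng or random.Random()
    # Min-heap of (key, arrival index, item); the index breaks ties so that
    # items themselves are never compared.
    heap: List[Tuple[float, int, T]] = []
    for index, (item, weight) in enumerate(pairs):
        if weight <= 0:
            continue  # items with non-positive weight are never sampled
        key = rng.random() ** (1.0 / weight)
        entry = (key, index, item)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif key > heap[0][0]:
            # The new key beats the smallest kept key, so swap it in.
            heapq.heapreplace(heap, entry)
    return [item for _, _, item in heap]


picked = weighted_reservoir([("a", 1.0), ("b", 2.0), ("c", 10.0)], k=2)
```

The heap keeps each update at O(log k), and the stream is still read only once.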

# to install

pip install weightreservoir

# to use as a module in python

from weightreservoir import reservoir

# to use uniform sampling

uniform = reservoir.UniformSampling(size = 10)

# to add one item into the stream and decide to sample it or not
uniform.addOne(itemValue)

# to add a list of items into the stream and decide to sample each of them or not
uniform.addAll(itemValueList)

# to get all the current items of the sampled dataset, returned as a list
uniform.get()

# to use weighted sampling

weight_sample = reservoir.WeightSampling(size = 10)

# to add one item into the stream and decide to sample it or not by its weight
weight_sample.addOne(itemValue, itemWeight)

# to add a list of items into the stream and decide to sample each of them or not by their weight
weight_sample.addAll(itemValueList, itemWeightList)

# to get all the current items of the sampled dataset, returned as a list
weight_sample.get()


This version

1.0


Source Distribution

weightreservoir-1.0.tar.gz (2.0 kB)


Hashes for weightreservoir-1.0.tar.gz
Algorithm Hash digest
SHA256 22ccce1186ad0f85dc66a5a0515e6147076951c165640b9e29ddce4d8105dbd5
MD5 f2f45141af4d8831e48bf1f4b98c5eea
BLAKE2b-256 9f925d6b7b3820a99b06b93ce2d490a56e30f9ac512cca059208d7735f3b3a1f
