Skip to main content

reservoir sampling with or without weight from a stream of data

Project description

This module brings the effective reservoir sampling method, with or without weight. The reservoir sampling is used when you have a very large and unknown dataset of size N, and you want to sampling a subset of k of these N samples, with one stream or one file reading.

If the weight is not present, each sample will have equal chance to be selected in the final subset; if weight is used, each sample will be selected according to their weights.

# to install
pip install weightreservoir
# to use as a module in python
from weightreservoir import reservoir
# to use uniform sampling

uniform = reservoir.UniformSampling(size = 10)

# to add one item into the stream and decide to sample it or not uniform.addOne(itemValue)

# to add a list of items into the stream and decide to sample each of them or not uniform.addAll(itemValueList)

# to get all the current items of the sampled dataset, returned as a list uniform.get()

# to use weighted sampling

weight_sample = reservoir.WeightSampling(size = 10)

# to add one item into the stream and decide to sample it or not by its weight weight_sample.addOne(itemValue, itemWeight)

# to add a list of items into the stream and decide to sample each of them or not by their weight weight_sample.addAll(itemValueList, itemWeightList)

# to get all the current items of the sampled dataset, returned as a list weight_sample.get()

Project details


Release history Release notifications

This version

1.0

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for weightreservoir, version 1.0
Filename, size File type Python version Upload date Hashes
Filename, size weightreservoir-1.0.tar.gz (2.0 kB) File type Source Python version None Upload date Hashes View

Supported by

Pingdom Pingdom Monitoring Google Google Object Storage and Download Analytics Sentry Sentry Error logging AWS AWS Cloud computing DataDog DataDog Monitoring Fastly Fastly CDN DigiCert DigiCert EV certificate StatusPage StatusPage Status page