Reservoir sampling, with or without weights, from a stream of data
Project description
This module implements reservoir sampling, with or without weights. Reservoir sampling is used when you have a very large dataset of unknown size N and want to sample a subset of k of these N items in a single pass over the stream or file.
If no weights are given, every item has an equal chance of being selected into the final subset; if weights are used, each item is selected according to its weight.
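To make the uniform case concrete, here is a minimal sketch of the classic single-pass approach (often called Algorithm R). It is an illustration only and is not taken from this package's source; the function name `reservoir_sample` and its parameters are hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Draw up to k items uniformly at random from a stream of unknown length.

    Hypothetical helper for illustration; not part of weightreservoir.
    """
    rng = rng or random.Random()
    sample: List[T] = []
    for n, item in enumerate(stream, start=1):
        if n <= k:
            # The first k items fill the reservoir directly.
            sample.append(item)
        else:
            # Pick a uniform integer in [0, n); it is below k with probability k / n.
            j = rng.randrange(n)
            if j < k:
                # Replace a uniformly chosen resident of the reservoir.
                sample[j] = item
    return sample


# Example: keep 10 items from a stream that is read only once.
subset = reservoir_sample(iter(range(1_000_000)), 10)
```

Only the k sampled items are held in memory, which is why the total size N never has to be known in advance.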
- # to install
pip install weightreservoir
- # to use as a module in python
from weightreservoir import reservoir
- # to use uniform sampling
uniform = reservoir.UniformSampling(size=10)
# to add one item into the stream and decide to sample it or not
uniform.addOne(itemValue)
# to add a list of items into the stream and decide to sample each of them or not
uniform.addAll(itemValueList)
# to get all the current items of the sampled dataset, returned as a list
uniform.get()
- # to use weighted sampling
weight_sample = reservoir.WeightSampling(size=10)
# to add one item into the stream and decide to sample it or not by its weight
weight_sample.addOne(itemValue, itemWeight)
# to add a list of items into the stream and decide to sample each of them or not by their weight
weight_sample.addAll(itemValueList, itemWeightList)
# to get all the current items of the sampled dataset, returned as a list
weight_sample.get()
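For readers curious how weighted sampling from a stream can work in one pass, the sketch below uses the Efraimidis-Spirakis scheme (A-Res): each item gets a random key `u ** (1 / weight)` and the k largest keys are kept. This is an assumption chosen for illustration; the package may use a different scheme internally, and the name `weighted_reservoir_sample` is hypothetical.

```python
import heapq
import random
from typing import Iterable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def weighted_reservoir_sample(
    stream: Iterable[Tuple[T, float]], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Keep up to k items from (item, weight) pairs, favouring larger weights.

    Hypothetical helper for illustration; not part of weightreservoir.
    """
    if k <= 0:
        return []
    rng = rng or random.Random()
    # Min-heap of (key, index, item); the index breaks ties between equal keys.
    heap: List[Tuple[float, int, T]] = []
    for i, (item, weight) in enumerate(stream):
        if weight <= 0:
            # Assumption: items with non-positive weight are never sampled.
            continue
        key = rng.random() ** (1.0 / weight)
        if len(heap) < k:
            heapq.heappush(heap, (key, i, item))
        elif key > heap[0][0]:
            # The new key beats the smallest kept key, so swap it in.
            heapq.heapreplace(heap, (key, i, item))
    return [item for _, _, item in heap]


# Example: "d" carries the largest weight, so it is the most likely to be kept.
pairs = [("a", 1.0), ("b", 2.0), ("c", 0.5), ("d", 5.0)]
picked = weighted_reservoir_sample(pairs, 2)
```

The result is a sample without replacement: an item appears at most once, and heavier items are more likely to survive to the end of the stream.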
Source Distribution
File details
Details for the file weightreservoir-1.0.tar.gz.
File metadata
- Download URL: weightreservoir-1.0.tar.gz
- Upload date:
- Size: 2.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 22ccce1186ad0f85dc66a5a0515e6147076951c165640b9e29ddce4d8105dbd5
MD5 | f2f45141af4d8831e48bf1f4b98c5eea
BLAKE2b-256 | 9f925d6b7b3820a99b06b93ce2d490a56e30f9ac512cca059208d7735f3b3a1f