Data sampler for streaming data
The StreamSampler package lets you sample a fixed number of elements from a stream of data whose length is very large or unknown.
StreamSampler is provided both as an executable command and as a library. It uses the reservoir sampling algorithm [Vitter85].
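To illustrate the idea behind reservoir sampling, here is a minimal sketch of the basic variant (often called Algorithm R). It is not StreamSampler's own code or API: the function name `reservoir_sample` and its parameters are hypothetical and chosen only for this example, and the package itself may use a different variant of the algorithm.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return up to k items chosen uniformly at random from an iterable.

    Hypothetical helper for illustration only; not part of StreamSampler.
    The stream is read once and only k items are kept in memory.
    """
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick a slot in [0, i]; the new item replaces an old one
            # only when the slot falls inside the reservoir, which
            # happens with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Sample 5 numbers from a generator without knowing its length upfront.
    numbers = (n * n for n in range(100000))
    print(reservoir_sample(numbers, 5))
```

Passing a seeded `random.Random` instance as `rng` makes the result reproducible. If the stream holds fewer than `k` items, the sketch simply returns all of them.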
- sample-cli by Paul Butler is a command-line tool that provides almost the same feature. StreamSampler also has a command-line interface, but it is primarily intended as a library so that it can be used as part of other packages, including my future projects.
- Tested with Python 2.6, 2.7, 3.1, 3.2, and 3.3
First public version