reservoir-sampling-cli
======================

A command line tool to randomly sample k items from an input S containing n items.

> Reservoir sampling is a family of randomized algorithms for randomly choosing a sample of k items from a list S containing n items, where n is either a very large or unknown number.
> --<cite><http://en.wikipedia.org/wiki/Reservoir_sampling></cite>
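
The definition above can be sketched in a few lines of Python. This is a minimal illustration of the classic "Algorithm R" variant of reservoir sampling, not necessarily the code `resamp` itself uses; the function name `reservoir_sample` and its parameters are hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    items: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Return up to k items chosen uniformly at random from items.

    Hypothetical helper for illustration only. It makes a single pass
    and never needs to know the total number of items in advance.
    """
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick a slot in [0, i]; the new item replaces a reservoir
            # entry only if the slot falls inside the reservoir, which
            # happens with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir
```

Because only the reservoir is held in memory, the same function works on a stream such as standard input, where n is unknown until the input ends.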

Installation
------------

    pip install -e git+ssh://git@github.com/RyanBalfanz/reservoir-sampling-cli.git#egg=resamp

Usage
-----

Show the help message:

    $ resamp -h
    usage: Randomly sample k items from an input S containing n items.
           [-h] [-k NUM_ITEMS] [--preserve-order]
           [infile] [outfile]

    positional arguments:
      infile
      outfile

    optional arguments:
      -h, --help            show this help message and exit
      -k NUM_ITEMS, --num-items NUM_ITEMS
                            An integer number giving the size of the reservoir
      --preserve-order      Preserve input ordering

Sample 10 words from /usr/share/dict/words, preserving the original order:

    $ cat /usr/share/dict/words | resamp -k10 --preserve-order
    Paralipomenon
    frankalmoign
    hauntingly
    hellion
    laniiform
    lithify
    semicollapsible
    sniveled
    stolkjaerre
    unaloud
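
One way an option like `--preserve-order` could work is to remember each sampled item's input position and sort by it at the end. The sketch below is an assumption about the approach, not a description of how `resamp` is implemented; `ordered_reservoir_sample` is a hypothetical name.

```python
import random
from typing import Iterable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def ordered_reservoir_sample(
    items: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Sample up to k items and return them in their input order.

    Hypothetical helper for illustration only.
    """
    rng = rng or random.Random()
    # Each reservoir entry is (input position, item).
    reservoir: List[Tuple[int, T]] = []
    for i, item in enumerate(items):
        if i < k:
            reservoir.append((i, item))
        else:
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = (i, item)
    # Restore the original ordering by sorting on the recorded position.
    reservoir.sort(key=lambda pair: pair[0])
    return [item for _, item in reservoir]
```

Sorting costs O(k log k) after the single pass over the input, which is negligible when k is small relative to n.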
