Skip to main content

A command line tool to randomly sample k items from an input S containing n items.

Project description

reservoir-sampling-cli
======================

A command line tool to randomly sample k items from an input S containing n items.

> Reservoir sampling is a family of randomized algorithms for randomly choosing a sample of k items from a list S containing n items, where n is either a very large or unknown number.
> --<cite><http://en.wikipedia.org/wiki/Reservoir_sampling></cite>

Installation
------------

pip install -e git+ssh://git@github.com/RyanBalfanz/preservoir-sampling-cli.git#egg=resamp

Usage
-----

Show help message

$ resamp -h
usage: Randomly sample k items from an input S containing n items.
[-h] [-k NUM_ITEMS] [--preserve-order]
[infile] [outfile]

positional arguments:
infile
outfile

optional arguments:
-h, --help show this help message and exit
-k NUM_ITEMS, --num-items NUM_ITEMS
An integer number giving the size of the reservoir
--preserve-order Preserve input ordering

Sample 10 words from /usr/share/dict/words preserving the original order

$ cat /usr/share/dict/words | resamp -k10 --preserve-order
Paralipomenon
frankalmoign
hauntingly
hellion
laniiform
lithify
semicollapsible
sniveled
stolkjaerre
unaloud

Project details


Release history Release notifications | RSS feed

This version

0.1

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for reservoir-sampling-cli, version 0.1
Filename, size File type Python version Upload date Hashes
Filename, size reservoir_sampling_cli-0.1-py27-none-any.whl (3.7 kB) File type Wheel Python version 2.7 Upload date Hashes View
Filename, size reservoir-sampling-cli-0.1.tar.gz (1.8 kB) File type Source Python version None Upload date Hashes View

Supported by

AWS AWS Cloud computing Datadog Datadog Monitoring DigiCert DigiCert EV certificate Facebook / Instagram Facebook / Instagram PSF Sponsor Fastly Fastly CDN Google Google Object Storage and Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Salesforce Salesforce PSF Sponsor Sentry Sentry Error logging StatusPage StatusPage Status page