reservoir-sampling-cli
======================

A command line tool to randomly sample k items from an input S containing n items.

> Reservoir sampling is a family of randomized algorithms for randomly choosing a sample of k items from a list S containing n items, where n is either a very large or unknown number.
> --<cite><http://en.wikipedia.org/wiki/Reservoir_sampling></cite>
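
The quoted definition can be illustrated with a minimal sketch of one well-known member of that family, often called Algorithm R. This is not necessarily how `resamp` is implemented; the function name `reservoir_sample` and its `rng` parameter are hypothetical and chosen only for this example.

```python
import random


def reservoir_sample(items, k, rng=None):
    """Return up to k items chosen uniformly at random from an iterable.

    Hypothetical helper for illustration; not part of resamp's API.
    The input is consumed in a single pass, so n need not be known.
    """
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick a slot in [0, i]; the new item is kept with
            # probability k / (i + 1), replacing a current member.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    print(reservoir_sample(range(1000), 10))
```

Only the k-item reservoir is held in memory, which is why the approach suits inputs that are very large or of unknown length.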

Installation
------------

    pip install -e git+ssh://git@github.com/RyanBalfanz/reservoir-sampling-cli.git#egg=resamp

Usage
-----

Show help message

    $ resamp -h
    usage: Randomly sample k items from an input S containing n items.
           [-h] [-k NUM_ITEMS] [--preserve-order]
           [infile] [outfile]

    positional arguments:
      infile
      outfile

    optional arguments:
      -h, --help            show this help message and exit
      -k NUM_ITEMS, --num-items NUM_ITEMS
                            An integer number giving the size of the reservoir
      --preserve-order      Preserve input ordering

Sample 10 words from /usr/share/dict/words preserving the original order

    $ cat /usr/share/dict/words | resamp -k10 --preserve-order
    Paralipomenon
    frankalmoign
    hauntingly
    hellion
    laniiform
    lithify
    semicollapsible
    sniveled
    stolkjaerre
    unaloud
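
One way an order-preserving sample like the one above could be produced is to remember each item's input position alongside it and sort the reservoir by position at the end. The sketch below assumes that approach; it is not taken from `resamp`'s source, and the name `reservoir_sample_ordered` is hypothetical.

```python
import random


def reservoir_sample_ordered(items, k, rng=None):
    """Sample up to k items uniformly, returned in their input order.

    Hypothetical helper for illustration; not part of resamp's API.
    """
    rng = rng or random.Random()
    reservoir = []  # (input position, item) pairs
    for i, item in enumerate(items):
        if i < k:
            reservoir.append((i, item))
        else:
            # Same replacement rule as plain reservoir sampling.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = (i, item)
    # Restore the original ordering using the stored positions.
    reservoir.sort(key=lambda pair: pair[0])
    return [item for _, item in reservoir]


if __name__ == "__main__":
    words = ["alpha", "bravo", "charlie", "delta", "echo", "foxtrot"]
    print(reservoir_sample_ordered(words, 3))
```

Sorting happens once on the k retained pairs, so the extra cost is independent of the input length n.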
