
A set of higher-order map-reduce functions to use with parallel or xargs


SHMR

A set of higher-order map-reduce functions



Installation

From PyPI: pip install shmr

Features

This library is designed to work with xargs or parallel to make parallel processing of large data as simple as possible. Its main goal is to reduce the time spent writing code while still gaining a reasonable speedup from parallelization (i.e., it does not try to be as fast as possible, only faster than a sequential approach). It is better suited to research environments than to production environments, where existing parallel computing frameworks are a better fit.

Its API is heavily influenced by the Spark API.

Below are some examples:

  1. Split one file (partition) into multiple files (partitions)
python -m shmr -i <file_path> partitions.coalesce --outfile <output_files> --num_partitions=128
  2. Apply a mapping function to the partitions in parallel
ls <input_files> | xargs -n 1 -I {} -P <n_threads> python -m shmr \
    -i {} partition.map --fn <func> --outfile <output_file>

If you pass the -v flag, it shows a progress bar that estimates how long one partition will take to process.
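The split-then-map pattern behind the two commands above can be sketched with the Python standard library alone. This is only an illustration of the idea, not shmr's implementation or API: the names split_records, map_partition, and parallel_map are hypothetical, the round-robin split is an assumption about how records might be distributed, and threads stand in for the separate processes that xargs -P would launch.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def split_records(records: List[str], num_partitions: int) -> List[List[str]]:
    """Distribute records round-robin into num_partitions lists (assumed strategy)."""
    partitions: List[List[str]] = [[] for _ in range(num_partitions)]
    for index, record in enumerate(records):
        partitions[index % num_partitions].append(record)
    return partitions


def map_partition(fn: Callable[[str], str], partition: List[str]) -> List[str]:
    """Apply fn to every record of a single partition."""
    return [fn(record) for record in partition]


def parallel_map(
    fn: Callable[[str], str], partitions: List[List[str]], n_workers: int
) -> List[List[str]]:
    """Map fn over each partition concurrently, keeping partition order."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(lambda part: map_partition(fn, part), partitions))


if __name__ == "__main__":
    parts = split_records(["a", "b", "c", "d", "e"], num_partitions=2)
    print(parts)  # [['a', 'c', 'e'], ['b', 'd']]
    print(parallel_map(str.upper, parts, n_workers=2))  # [['A', 'C', 'E'], ['B', 'D']]
```

Each partition is processed independently, which is why one shmr process per file can be handed to xargs or parallel without any coordination between workers.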


Source Distribution

shmr-1.0.9.tar.gz (8.5 kB)
