A set of higher-order map-reduce functions for use with parallel or xargs
SHMR
A set of higher-order map-reduce functions
Installation
From PyPI: pip install shmr
Features
This library is designed to work with xargs or parallel to make processing large data in parallel as simple as possible. Its main goal is to reduce the time spent writing code while still getting a reasonable speed-up from parallelization (i.e., it does not try to be as fast as possible, only faster than a sequential program). Unlike existing parallel computing frameworks, it is aimed at research environments rather than production environments.
Its API is heavily influenced by the Spark API.
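To make the map-reduce idea concrete, here is a minimal in-memory sketch in plain Python. It does not use shmr, and the names map_partitions, reduce_results and count_words are hypothetical. Each partition is mapped independently and the per-partition results are then folded into a single value. In the xargs/parallel workflow described here, each map call would instead run as a separate process handling one partition.

```python
from functools import reduce
from typing import Callable, Iterable, List, TypeVar

R = TypeVar("R")


def map_partitions(fn: Callable[[List[str]], R], partitions: Iterable[List[str]]) -> List[R]:
    # Map step (hypothetical helper): fn sees one whole partition at a time.
    # With xargs -P or parallel, each of these calls would be its own process.
    return [fn(part) for part in partitions]


def reduce_results(fn: Callable[[R, R], R], results: List[R]) -> R:
    # Reduce step (hypothetical helper): fold per-partition results into one value.
    return reduce(fn, results)


def count_words(partition: List[str]) -> int:
    # Example per-partition function: count whitespace-separated words.
    return sum(len(line.split()) for line in partition)


partitions = [["a b c", "d e"], ["f g"], ["h"]]
total = reduce_results(lambda a, b: a + b, map_partitions(count_words, partitions))
print(total)  # 8
```

Only count_words is task-specific; the partitioning, mapping and reducing scaffolding is what a library like this aims to take off your hands.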
Below are some examples:
- Split one file (partition) into multiple files (partitions)
python -m shmr -i <file_path> partitions.coalesce --outfile <output_files> --num_partitions=128
- Apply a mapping function in parallel
ls <input_files> | xargs -I {} -P <n_threads> python -m shmr \
-i {} partition.map --fn <func> --outfile <output_file>
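The first example splits one file into many partitions. Below is a minimal sketch of that step in plain Python, assuming a line-oriented text file and a round-robin assignment of lines to partitions. shmr's actual splitting strategy is not described here, and split_file and the part-NNNNN file naming are hypothetical.

```python
import os
from contextlib import ExitStack
from typing import List


def split_file(path: str, out_dir: str, num_partitions: int) -> List[str]:
    """Hypothetical helper: distribute the lines of `path` round-robin over
    `num_partitions` files in `out_dir` and return their paths."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be >= 1")
    os.makedirs(out_dir, exist_ok=True)
    out_paths = [os.path.join(out_dir, f"part-{i:05d}") for i in range(num_partitions)]
    # ExitStack closes every partition file, even if an error occurs mid-way.
    with ExitStack() as stack:
        outs = [stack.enter_context(open(p, "w", encoding="utf-8")) for p in out_paths]
        with open(path, encoding="utf-8") as src:
            for i, line in enumerate(src):
                # Line i goes to partition i mod num_partitions.
                outs[i % num_partitions].write(line)
    return out_paths
```

Round-robin needs a single pass and no prior line count, at the cost of not keeping consecutive lines together. Splitting into contiguous chunks would preserve order within each partition but requires counting the lines first.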
If you provide the -v flag, shmr shows a progress bar estimating how long it will take to process one partition.
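In the parallel map example, each process launched by xargs handles exactly one partition. Below is a rough sketch of such a per-partition worker in plain Python, assuming one record per line. map_partition is hypothetical and is not shmr's API. Its verbose line counter on stderr is only a crude stand-in for the progress bar described above.

```python
import sys
from typing import Callable


def map_partition(infile: str, outfile: str, fn: Callable[[str], str],
                  verbose: bool = False) -> int:
    """Hypothetical worker: apply fn to every line of one partition,
    write the results to outfile, and return the number of lines processed."""
    n = 0
    with open(infile, encoding="utf-8") as src, open(outfile, "w", encoding="utf-8") as dst:
        for line in src:
            # Strip the newline before calling fn, then restore it on output.
            dst.write(fn(line.rstrip("\n")) + "\n")
            n += 1
            if verbose and n % 100_000 == 0:
                # Crude progress report on stderr (not shmr's progress bar).
                print(f"{infile}: {n} lines", file=sys.stderr)
    return n
```

Because each worker reads and writes only its own files, the processes need no coordination. Giving every partition its own output file, for example the input name plus a suffix, also avoids several processes writing to the same file at once.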