mpyll
mpyll is a package for easy task parallelization across CPU threads.
Installation
pip install mpyll
Usage
The mpyll workflow is as follows:
- Identify the data on which to parallelize computation. The data should be stored in a list.
- Define the task: a Python function that takes as input a list of data elements and performs the desired work. Instances of this function run in parallel across CPU threads.
- Optionally, define a post-processing function that takes as input the list of results returned by the threads and produces the final result, if any.
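The same three-step pattern can be sketched with the standard library alone. This is a minimal illustration of what mpyll automates (splitting the data, running the task in threads, combining results), not mpyll's actual implementation:

```python
# Illustrative stand-in for the mpyll workflow using only the standard
# library; mpyll's internals may differ.
from concurrent.futures import ThreadPoolExecutor

# 1. Data to parallelize over, stored in a list.
data = list(range(100))

# 2. The task: takes a list of data elements, returns a partial result.
def task(chunk):
    return sum(x * x for x in chunk)

# 3. Optional post-processing: combines the per-thread results.
def post_processor(results):
    return sum(results)

n_jobs = 4
chunks = [data[i::n_jobs] for i in range(n_jobs)]  # split data among threads
with ThreadPoolExecutor(max_workers=n_jobs) as pool:
    results = list(pool.map(task, chunks))  # one partial sum per thread
total = post_processor(results)  # sum of squares of 0..99
```

mpyll packages these three pieces behind a single `parallelize` call, as the example below shows.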
Example
Let's take as an example the estimation of Pi through Monte Carlo:
import numpy as np
from mpyll import parallelize
# First, we define the data on which we would like to parallelize computation.
r = 1.
m = 10 ** 6
X = np.random.uniform(-r, r, size = m)
Y = np.random.uniform(-r, r, size = m)
data = [(X[i], Y[i]) for i in range(m)]
# Second, we define the task to be parallelized.
# It takes as input the data (a list) as well as other arguments, if any,
# and it returns a result. If it is a procedure, then it does not return.
def count_in_circle_points(data, r, m):
    a = np.array(data)  # matrix; each row contains a point's coordinates
    d = np.sqrt(np.sum(a ** 2, axis = 1))  # distance of each point to the origin
    in_circle = d <= r  # boolean array: True if distance <= radius
    return np.sum(in_circle)
# Finally, we define a post processor.
def estimate_pi(data, m):
    pi_estimation = 4 * np.sum(data) / m
    return pi_estimation
pi_estimation = parallelize(task = count_in_circle_points,
                            data = data, shuffle_data = False,
                            post_processor = estimate_pi,
                            n_jobs = -1,
                            # task arguments
                            count_in_circle_points_r = r,
                            count_in_circle_points_m = m,
                            # post processor arguments
                            estimate_pi_m = m)
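As a quick sanity check, the same computation can be run serially without mpyll; with a million points the estimate lands close to π (the seeded generator below is only for reproducibility of this check):

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded for reproducibility
r = 1.
m = 10 ** 6
X = rng.uniform(-r, r, size=m)
Y = rng.uniform(-r, r, size=m)

# Same logic as the parallel task, applied to the whole data set at once.
d = np.sqrt(X ** 2 + Y ** 2)        # distance of each point to the origin
in_circle = np.sum(d <= r)          # count of points inside the circle
pi_estimation = 4 * in_circle / m   # ratio of areas, times 4
```

The parallel version should agree with this serial result up to Monte Carlo noise.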
API
parallelize(task,
            data,
            shuffle_data = False,
            post_processor = None,
            n_jobs = -1,
            *args,
            **kwargs)
Parallelize a task that returns a value
Parameters
----------
task: function
    The task to be parallelized.
data: list
    The data on which the parallelization is performed.
shuffle_data: boolean
    Whether to shuffle the data before processing. Sometimes the data are not
    identically distributed, which could cause some threads to be overloaded
    compared to others.
post_processor: function
    A function that runs after all threads terminate.
n_jobs: int
    The number of threads to be used. Specify -1 to use all CPU threads.
Other Parameters
----------------
Additional parameters can be passed to `task` and `post_processor`. The
argument name must start with the name of the task or post processor
function, followed by an underscore and the argument's name.
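As an illustration of this prefixing convention, the routing could be done as below. This is a hypothetical helper (`route_kwargs` and `my_task` are made-up names), not mpyll's actual code:

```python
# Hypothetical sketch of how prefixed kwargs could be routed to each
# function; mpyll's real implementation may differ.
def route_kwargs(func, **kwargs):
    """Keep only kwargs prefixed with `func.__name__ + '_'`, strip the prefix."""
    prefix = func.__name__ + "_"
    return {k[len(prefix):]: v for k, v in kwargs.items() if k.startswith(prefix)}

def my_task(data, r, m):
    return len(data) * r / m

kwargs = {"my_task_r": 2.0, "my_task_m": 10, "other_func_x": 1}
task_args = route_kwargs(my_task, **kwargs)  # {'r': 2.0, 'm': 10}
result = my_task([1, 2, 3], **task_args)     # 3 * 2.0 / 10 = 0.6
```

Note that arguments prefixed with another function's name (`other_func_x` above) are simply ignored by the task.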
Returns
-------
If a post processor is specified, this function returns the post processor's
return value; otherwise, it returns a list of the objects returned by each
thread.
License
GNU General Public License v3