
map and starmap implementations passing additional arguments and parallelizing if possible


parmap
======

.. image:: https://travis-ci.org/zeehio/parmap.svg?branch=master
    :target: https://travis-ci.org/zeehio/parmap

.. image:: https://readthedocs.org/projects/parmap/badge/?version=latest
    :target: https://readthedocs.org/projects/parmap/?badge=latest
    :alt: Documentation Status

.. image:: https://codecov.io/github/zeehio/parmap/coverage.svg?branch=master
    :target: https://codecov.io/github/zeehio/parmap?branch=master

.. image:: https://codeclimate.com/github/zeehio/parmap/badges/gpa.svg
    :target: https://codeclimate.com/github/zeehio/parmap
    :alt: Code Climate

.. image:: https://img.shields.io/pypi/dm/parmap.svg
    :target: https://pypi.python.org/pypi/parmap
    :alt: Pypi downloads per month

This small Python module implements two functions: ``map`` and
``starmap``.

What does parmap offer?
-----------------------

- Provide an easy-to-use syntax for both ``map`` and ``starmap``.
- Parallelize transparently whenever possible.
- Handle multiple arguments, even keyword arguments!
- Show a progress bar (requires the optional ``tqdm`` package).

Installation:
-------------

::

    pip install tqdm # for progress bar support
    pip install parmap


Usage:
------

The examples below show ordinary sequential code next to its parallel
equivalent written with parmap:

Simple parallelization example:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

    import parmap
    # You want to do:
    mylist = [1,2,3]
    argument1 = 3.14
    argument2 = True
    y = [myfunction(x, argument1, mykeyword=argument2) for x in mylist]
    # In parallel:
    y = parmap.map(myfunction, mylist, argument1, mykeyword=argument2)
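
The two forms above are meant to produce the same list. As a minimal sketch of
that equivalence, the hypothetical helper ``map_reference`` below is a
sequential stand-in written for illustration (it is not parmap's
implementation), and ``myfunction`` is a made-up example function:

```python
def myfunction(x, scale, mykeyword=False):
    """Hypothetical example function: scale x, optionally negating the result."""
    result = x * scale
    return -result if mykeyword else result


def map_reference(function, iterable, *args, **kwargs):
    """Sequential stand-in: call function(x, *args, **kwargs) for each x."""
    return [function(x, *args, **kwargs) for x in iterable]


mylist = [1, 2, 3]
# The list comprehension and the map-style call should agree.
expected = [myfunction(x, 2, mykeyword=True) for x in mylist]
result = map_reference(myfunction, mylist, 2, mykeyword=True)
```

Every extra positional or keyword argument after the iterable is passed
unchanged to each call, which is the calling convention the ``parmap.map``
example relies on.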


Show a progress bar:
~~~~~~~~~~~~~~~~~~~~~

Requires ``pip install tqdm``

::

    # You want to do:
    y = [myfunction(x) for x in mylist]
    # In parallel, with a progress bar
    y = parmap.map(myfunction, mylist, pm_pbar=True)


Passing multiple arguments:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

    # You want to do:
    z = [myfunction(x, y, argument1, argument2, mykey=argument3) for (x,y) in mylist]
    # In parallel:
    z = parmap.starmap(myfunction, mylist, argument1, argument2, mykey=argument3)

    # You want to do:
    listx = [1, 2, 3, 4, 5, 6]
    listy = [2, 3, 4, 5, 6, 7]
    param1 = 3.14
    param2 = 42
    listz = []
    for (x, y) in zip(listx, listy):
        listz.append(myfunction(x, y, param1, param2))
    # In parallel:
    listz = parmap.starmap(myfunction, zip(listx, listy), param1, param2)
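
The difference from ``map`` is that each item is unpacked into several
positional arguments before the shared arguments are appended. A minimal
sequential sketch of that convention follows; ``starmap_reference`` is a
hypothetical helper written for illustration, not parmap's implementation, and
``myfunction`` is a made-up example function:

```python
def myfunction(x, y, offset, factor):
    """Hypothetical example function combining two items and two constants."""
    return (x + y + offset) * factor


def starmap_reference(function, iterable, *args, **kwargs):
    """Sequential stand-in: unpack each item, then append the shared arguments."""
    return [function(*item, *args, **kwargs) for item in iterable]


listx = [1, 2, 3]
listy = [2, 3, 4]
# Each (x, y) pair is unpacked; 10 and 2 are passed to every call.
listz = starmap_reference(myfunction, zip(listx, listy), 10, 2)
```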


Advanced: Multiple parallel tasks running in parallel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In this example, Task1 uses 5 cores, while Task2 uses 3 cores. Both tasks start
to compute simultaneously, and we print a message as soon as either task
finishes, retrieving its result.

::

    import parmap
    def task1(item):
        return 2*item

    def task2(item):
        return 2*item + 1

    items1 = range(500000)
    items2 = range(500)

    with parmap.map_async(task1, items1, pm_processes=5) as result1:
        with parmap.map_async(task2, items2, pm_processes=3) as result2:
            data_task1 = None
            data_task2 = None
            task1_working = True
            task2_working = True
            while task1_working or task2_working:
                result1.wait(0.1)
                if task1_working and result1.ready():
                    print("Task 1 has finished!")
                    data_task1 = result1.get()
                    task1_working = False
                result2.wait(0.1)
                if task2_working and result2.ready():
                    print("Task 2 has finished!")
                    data_task2 = result2.get()
                    task2_working = False
    #Further work with data_task1 or data_task2
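
The same polling pattern extends to any number of tasks. The sketch below uses
only the standard library: the thread-backed ``multiprocessing.pool.ThreadPool``
stands in for parmap so that the snippet runs without extra packages, and
``run_both`` is a hypothetical helper name. It relies on the same ``wait``,
``ready`` and ``get`` methods used above:

```python
from multiprocessing.pool import ThreadPool


def task1(item):
    return 2 * item


def task2(item):
    return 2 * item + 1


def run_both(items1, items2):
    """Start both tasks at once and collect each result when it is ready."""
    results = {}
    with ThreadPool(2) as pool1, ThreadPool(2) as pool2:
        pending = {
            "task1": pool1.map_async(task1, items1),
            "task2": pool2.map_async(task2, items2),
        }
        while pending:
            # Iterate over a copy so finished entries can be removed.
            for name, async_result in list(pending.items()):
                async_result.wait(0.1)  # block for at most 0.1 seconds
                if async_result.ready():
                    results[name] = async_result.get()
                    del pending[name]
    return results


results = run_both(range(5), range(3))
```

Keeping the pending results in a dictionary avoids one boolean flag per task,
at the cost of a slightly less explicit loop.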


map and starmap already exist. Why reinvent the wheel?
---------------------------------------------------------

The existing functions have some usability limitations:

- The built-in Python function ``map`` [#builtin-map]_
  is not able to parallelize.
- ``multiprocessing.Pool().starmap`` [#multiproc-starmap]_
  is only available in Python 3.3 and later versions.
- ``multiprocessing.Pool().map`` [#multiproc-map]_
  does not allow any additional argument to the mapped function.
- ``multiprocessing.Pool().starmap`` allows passing multiple arguments,
  but in order to pass a constant argument to the mapped function you
  will need to convert it to an iterator using
  ``itertools.repeat(your_parameter)`` [#itertools-repeat]_.

``parmap`` aims to overcome these limitations in the simplest possible way.
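
For comparison, here is a minimal sketch of the standard-library workaround
described in the list above. It makes a few assumptions: ``myfunction`` is a
made-up example function, a thread-backed ``ThreadPool`` is used instead of
``multiprocessing.Pool`` so the snippet runs as a plain script on any platform,
and ``functools.partial`` (not mentioned above) is shown as one common way to
bind keyword arguments:

```python
import itertools
from functools import partial
from multiprocessing.pool import ThreadPool


def myfunction(x, argument1, mykeyword=False):
    """Hypothetical example function."""
    return (x * argument1, mykeyword)


mylist = [1, 2, 3]

with ThreadPool(2) as pool:
    # Constant positional argument: repeat it so that it can be zipped
    # with the items and unpacked by starmap.
    with_repeat = pool.starmap(myfunction, zip(mylist, itertools.repeat(10)))
    # Keyword argument: starmap only unpacks positionally, so bind it
    # beforehand with functools.partial.
    bound = partial(myfunction, argument1=10, mykeyword=True)
    with_partial = pool.map(bound, mylist)
```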

Additional features in parmap:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- Create a pool for parallel computation automatically if possible.
- ``parmap.map(..., ..., pm_parallel=False)``: disable parallelization.
- ``parmap.map(..., ..., pm_processes=4)``: use 4 parallel processes.
- ``parmap.map(..., ..., pm_pbar=True)``: show a progress bar (requires tqdm).
- ``parmap.map(..., ..., pm_pool=multiprocessing.Pool())``: use an existing
  pool. In this case parmap will not close the pool.
- ``parmap.map(..., ..., pm_chunksize=3)``: size of chunks (see
  ``multiprocessing.Pool().map``).

Limitations:
-------------

``parmap.map()`` and ``parmap.starmap()`` (and their async versions) have their own
arguments (``pm_parallel``, ``pm_pbar``...). Those arguments are never passed
to the underlying function. In the following example, ``myfun`` will receive
``myargument``, but not ``pm_parallel``. Do not write functions that require
keyword arguments starting with ``pm_``, as ``parmap`` may need them in the future.

::

    parmap.map(myfun, mylist, pm_parallel=True, myargument=False)

Additionally, some other keyword arguments should be avoided in the functions
you write, for backwards-compatibility reasons in parmap. The list of
conflicting arguments is: ``parallel``, ``chunksize``, ``pool``,
``processes``, ``callback``, ``error_callback`` and ``parmap_progress``.
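
To illustrate why ``pm_``-prefixed names never reach the mapped function, here
is a minimal sketch of how a wrapper could separate its own options from the
keyword arguments it forwards. ``split_pm_kwargs`` and ``sequential_map`` are
hypothetical names written for this example; this is not parmap's actual code:

```python
def split_pm_kwargs(kwargs):
    """Separate 'pm_'-prefixed options from the keyword arguments to forward."""
    options = {k: v for k, v in kwargs.items() if k.startswith("pm_")}
    forwarded = {k: v for k, v in kwargs.items() if not k.startswith("pm_")}
    return options, forwarded


def sequential_map(function, iterable, *args, **kwargs):
    """Hypothetical wrapper: strip the options, forward everything else."""
    options, forwarded = split_pm_kwargs(kwargs)
    # 'options' would configure the run; this sketch ignores it
    # and simply runs sequentially.
    return [function(x, *args, **forwarded) for x in iterable]


def myfun(x, myargument=True):
    """Made-up example function that reports the keyword it received."""
    return (x, myargument)


options, forwarded = split_pm_kwargs({"pm_parallel": True, "myargument": False})
result = sequential_map(myfun, [1, 2], pm_parallel=True, myargument=False)
```

Under this scheme a function that itself expects a ``pm_``-prefixed keyword
would never receive it, which is why such names are best avoided.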



Acknowledgments:
----------------

This package started after `this question <https://stackoverflow.com/q/5442910/446149>`_,
when I offered this `answer <http://stackoverflow.com/a/21292849/446149>`_,
taking the suggestions of J.F. Sebastian for his `answer <http://stackoverflow.com/a/5443941/446149>`_.

Known works using parmap
---------------------------

- Davide Gerosa, Michael Kesden, "PRECESSION. Dynamics of spinning black-hole
  binaries with python." `arXiv:1605.01067 <https://arxiv.org/abs/1605.01067>`_, 2016

References
-----------

.. [#builtin-map] http://docs.python.org/dev/library/functions.html#map
.. [#multiproc-starmap] http://docs.python.org/dev/library/multiprocessing.html#multiprocessing.pool.Pool.starmap
.. [#multiproc-map] http://docs.python.org/dev/library/multiprocessing.html#multiprocessing.pool.Pool.map
.. [#itertools-repeat] http://docs.python.org/2/library/itertools.html#itertools.repeat
