
map and starmap implementations passing additional arguments and parallelizing if possible

Project description


This small Python module implements four functions: map and starmap, and their async versions map_async and starmap_async.

What does parmap offer?

  • Provide an easy to use syntax for both map and starmap.

  • Parallelize transparently whenever possible.

  • Pass additional positional and keyword arguments to parallelized functions.

  • Show a progress bar (requires tqdm as optional package)

Installation:

pip install tqdm # for progress bar support
pip install parmap

Usage:

Each example below shows unparallelized code followed by its parmap equivalent:

Simple parallelization example:

import parmap
# You want to do:
mylist = [1,2,3]
argument1 = 3.14
argument2 = True
y = [myfunction(x, argument1, mykeyword=argument2) for x in mylist]
# In parallel:
y = parmap.map(myfunction, mylist, argument1, mykeyword=argument2)

Show a progress bar:

Requires pip install tqdm

# You want to do:
y = [myfunction(x) for x in mylist]
# In parallel, with a progress bar
y = parmap.map(myfunction, mylist, pm_pbar=True)
# Passing extra options to the tqdm progress bar
y = parmap.map(myfunction, mylist, pm_pbar={"desc": "Example"})

Passing multiple arguments:

# You want to do:
z = [myfunction(x, y, argument1, argument2, mykey=argument3) for (x,y) in mylist]
# In parallel:
z = parmap.starmap(myfunction, mylist, argument1, argument2, mykey=argument3)

# You want to do:
listx = [1, 2, 3, 4, 5, 6]
listy = [2, 3, 4, 5, 6, 7]
param1 = 3.14
param2 = 42
listz = []
for (x, y) in zip(listx, listy):
    listz.append(myfunction(x, y, param1, param2))
# In parallel:
listz = parmap.starmap(myfunction, zip(listx, listy), param1, param2)

Advanced: Multiple parallel tasks running in parallel

In this example, task1 uses 5 processes, while task2 uses 3. Both tasks start computing simultaneously, and a message is printed as soon as either task finishes, retrieving its result.

import parmap
def task1(item):
    return 2*item

def task2(item):
    return 2*item + 1

items1 = range(500000)
items2 = range(500)

with parmap.map_async(task1, items1, pm_processes=5) as result1:
    with parmap.map_async(task2, items2, pm_processes=3) as result2:
        data_task1 = None
        data_task2 = None
        task1_working = True
        task2_working = True
        while task1_working or task2_working:
            result1.wait(0.1)
            if task1_working and result1.ready():
                print("Task 1 has finished!")
                data_task1 = result1.get()
                task1_working = False
            result2.wait(0.1)
            if task2_working and result2.ready():
                print("Task 2 has finished!")
                data_task2 = result2.get()
                task2_working = False
# Further work with data_task1 or data_task2

map and starmap already exist. Why reinvent the wheel?

The existing functions have some usability limitations:

  • Python's built-in map function [1] does not parallelize.

  • multiprocessing.Pool().map [3] does not accept additional arguments for the mapped function.

  • multiprocessing.Pool().starmap allows passing multiple arguments, but to pass a constant argument to the mapped function you need to convert it to an iterator using itertools.repeat(your_parameter) [4].

parmap aims to overcome these limitations in the simplest possible way.
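For comparison, the standard-library workarounds can be sketched as follows. scale_and_shift is a hypothetical function, and functools.partial is one common way to bind constants that this README does not itself mention. multiprocessing.pool.ThreadPool stands in for multiprocessing.Pool here only so that the snippet runs without a `__main__` guard; it exposes the same map and starmap methods.

```python
from functools import partial
from itertools import repeat
from multiprocessing.pool import ThreadPool


def scale_and_shift(x, factor, shift=0):
    """Hypothetical function taking one varying and two constant arguments."""
    return x * factor + shift


values = [1, 2, 3]

with ThreadPool(2) as pool:
    # Pool.map passes exactly one argument per call, so the constants must be
    # bound beforehand, for instance with functools.partial.
    via_partial = pool.map(partial(scale_and_shift, factor=10, shift=1), values)

    # Pool.starmap unpacks tuples, so each constant has to be repeated
    # alongside the varying argument.
    via_repeat = pool.starmap(scale_and_shift, zip(values, repeat(10), repeat(1)))

print(via_partial)
print(via_repeat)
```

With parmap, the same call would be written as parmap.map(scale_and_shift, values, 10, shift=1), without partial or repeat.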

Additional features in parmap:

  • Create a pool for parallel computation automatically if possible.

  • parmap.map(..., ..., pm_parallel=False) # disables parallelization

  • parmap.map(..., ..., pm_processes=4) # use 4 parallel processes

  • parmap.map(..., ..., pm_pbar=True) # show a progress bar (requires tqdm)

  • parmap.map(..., ..., pm_pool=multiprocessing.Pool()) # use an existing pool; in this case parmap will not close the pool

  • parmap.map(..., ..., pm_chunksize=3) # size of chunks (see multiprocessing.Pool().map)

Limitations:

parmap.map() and parmap.starmap() (and their async versions) have their own arguments (pm_parallel, pm_pbar…). Those arguments are never passed to the underlying function. In the following example, myfun will receive myargument, but not pm_parallel. Do not write functions that require keyword arguments starting with pm_, as parmap may need them in the future.

parmap.map(myfun, mylist, pm_parallel=True, myargument=False)

Additionally, the functions you write should avoid some other keyword arguments, which parmap reserves for backwards compatibility. The conflicting arguments are: parallel, chunksize, pool, processes, callback, error_callback and parmap_progress.

Acknowledgments:

This package started after this question, when I offered this answer, taking the suggestions of J.F. Sebastian for his answer

Known works using parmap

References
