A Python package for easy multiprocessing, but faster than multiprocessing

MPIRE, short for MultiProcessing Is Really Easy, is a Python package for multiprocessing that is faster and more user-friendly than the default multiprocessing package. It combines the convenient map-like functions of multiprocessing.Pool with the benefits of copy-on-write shared objects of multiprocessing.Process, together with easy-to-use worker state, worker insights, and progress bar functionality.

Full documentation is available at https://slimmer-ai.github.io/mpire/.

Features

  • Faster execution than other multiprocessing libraries. See benchmarks.

  • Intuitive, Pythonic syntax

  • Multiprocessing with map/map_unordered/imap/imap_unordered functions

  • Easy use of copy-on-write shared objects with a pool of workers (copy-on-write is only available for start method fork)

  • Each worker can have its own state and with convenient worker init and exit functionality this state can be easily manipulated (e.g., to load a memory-intensive model only once for each worker without the need of sending it through a queue)

  • Progress bar support using tqdm

  • Progress dashboard support

  • Worker insights to provide insight into your multiprocessing efficiency

  • Graceful and user-friendly exception handling

  • Automatic task chunking for all available map functions to speed up processing of small task queues (including numpy arrays)

  • Adjustable maximum number of active tasks to avoid memory problems

  • Automatic restarting of workers after a specified number of tasks to reduce memory footprint

  • Nested pools of workers are allowed when setting the daemon option

  • Child processes can be pinned to specific CPUs or a range of CPUs

  • Optionally utilizes dill as serialization backend through multiprocess, making it possible to parallelize more exotic objects, lambdas, and functions in IPython and Jupyter notebooks

MPIRE has been tested on both Linux and Windows. There are a few minor known caveats for Windows users, which can be found here.

Installation

Through pip (PyPI):

pip install mpire

Getting started

Suppose you have a time-consuming function that receives some input and returns its results. Problems like these are known as embarrassingly parallel: they require little to no effort to turn into a parallel task. Parallelizing a function like this can be as easy as importing multiprocessing and using the multiprocessing.Pool class:

import time
from multiprocessing import Pool

def time_consuming_function(x):
    time.sleep(1)  # Simulate that this function takes long to complete
    return ...

with Pool(processes=5) as pool:
    results = pool.map(time_consuming_function, range(10))

MPIRE can be used almost as a drop-in replacement for multiprocessing. We use the mpire.WorkerPool class and call one of the available map functions:

from mpire import WorkerPool

with WorkerPool(n_jobs=5) as pool:
    results = pool.map(time_consuming_function, range(10))

The differences in code are small: there’s no need to learn a completely new multiprocessing syntax if you’re used to vanilla multiprocessing. The additional available functionality, though, is what sets MPIRE apart.

Progress bar

Suppose we want to know the status of the current task: how many tasks are completed, how long before the work is ready? It’s as simple as setting the progress_bar parameter to True:

with WorkerPool(n_jobs=5) as pool:
    results = pool.map(time_consuming_function, range(10), progress_bar=True)

And it will output a nicely formatted tqdm progress bar. In case you’re running your code inside a notebook, it will automatically switch to a widget.

MPIRE also offers a dashboard, for which you need to install additional dependencies. See Dashboard for more information.

Shared objects

Note: Copy-on-write shared objects are only available for start method fork. For threading, the objects are shared as-is. For other start methods, the shared objects are copied once for each worker.

If you have one or more objects that you want to share between all workers, you can make use of MPIRE’s copy-on-write shared_objects option. MPIRE passes these objects to each worker only once, without copying or serialization. Only when you alter the object inside the worker function will it be copied for that worker.

def time_consuming_function(some_object, x):
    time.sleep(1)  # Simulate that this function takes long to complete
    return ...

def main():
    some_object = ...
    with WorkerPool(n_jobs=5, shared_objects=some_object) as pool:
        results = pool.map(time_consuming_function, range(10), progress_bar=True)

See shared_objects for more details.

Worker initialization

Workers can be initialized using the worker_init feature. Together with worker_state, you can load a model, set up a database connection, etc.:

def init(worker_state):
    # Load a big dataset or model and store it in a worker specific worker_state
    worker_state['dataset'] = ...
    worker_state['model'] = ...

def task(worker_state, idx):
    # Let the model predict a specific instance of the dataset
    return worker_state['model'].predict(worker_state['dataset'][idx])

with WorkerPool(n_jobs=5, use_worker_state=True) as pool:
    results = pool.map(task, range(10), worker_init=init)

Similarly, you can use the worker_exit feature to let MPIRE call a function whenever a worker terminates. You can even let this exit function return results, which can be obtained later on. See the worker_init and worker_exit section for more information.

Worker insights

When your multiprocessing setup isn’t performing as you want it to and you have no clue what’s causing it, there’s the worker insights functionality. This gives you insight into your setup, but it does not profile the function you’re running (there are other libraries for that). Instead, it profiles the worker start-up time, waiting time, and working time. When worker init and exit functions are provided, it times those as well.

Perhaps you’re sending a lot of data over the task queue, which makes the waiting time go up. Whatever the case, you can enable and grab the insights using the enable_insights flag and mpire.WorkerPool.get_insights function, respectively:

with WorkerPool(n_jobs=5) as pool:
    results = pool.map(time_consuming_function, range(10), enable_insights=True)
    insights = pool.get_insights()

See worker insights for a more detailed example and expected output.

Benchmarks

MPIRE has been benchmarked on three different benchmarks: numerical computation, stateful computation, and expensive initialization. More details on these benchmarks can be found in this blog post. All code for these benchmarks can be found in this project.

The following graph shows the average normalized results of all three benchmarks. Results for individual benchmarks can be found in the blog post. The benchmarks were run on a Linux machine with 20 cores, with disabled hyperthreading and 200GB of RAM. For each task, experiments were run with different numbers of processes/workers and results were averaged over 5 runs.

Average normalized benchmark results

Documentation

See the full documentation at https://slimmer-ai.github.io/mpire/ for information on all the other features of MPIRE.

If you want to build the documentation yourself, please install the documentation dependencies by executing:

pip install mpire[docs]

or

pip install .[docs]

Documentation can then be built by using Python <= 3.9 and executing:

python setup.py build_docs

Documentation can also be built from the docs folder directly. In that case, MPIRE should be installed and available in your current working environment. Then execute:

make html

in the docs folder.
